Inverted index: Difference between revisions

m (→‎{{header|Perl 6}}: Updating deprecated 'uniq' to 'unique')
m (→‎{{header|REXX}}: added/changed whitespace and comments, removed OVERFLOW from PRE html STYLE tag.)
Line 2,065:
=={{header|REXX}}==
Note: In this algorithm, word indices start at 1.
 
Note: &nbsp; the Burma Shave signs were created from 1930 ──► 1951 &nbsp; and were common among the rural byways of America.
<lang rexx>/*REXX program illustrates building a simple inverted index & word find.*/
@.='' /*dictionary of words (so far).*/
!='' /*a list of found words (so far).*/
call invertI 0, 'BURMA0.TXT'   /*read the file:  BURMA0.TXT  ...*/
call invertI 1, 'BURMA1.TXT'   /*  "   "    ~    BURMA1.TXT  ...*/
call invertI 2, 'BURMA2.TXT'   /*  "   "    ~    BURMA2.TXT  ...*/
call invertI 3, 'BURMA3.TXT'   /*  "   "    ~    BURMA3.TXT  ...*/
call invertI 4, 'BURMA4.TXT'   /*  "   "    ~    BURMA4.TXT  ...*/
call invertI 5, 'BURMA5.TXT'   /*  "   "    ~    BURMA5.TXT  ...*/
call invertI 6, 'BURMA6.TXT'   /*  "   "    ~    BURMA6.TXT  ...*/
call invertI 7, 'BURMA7.TXT'   /*  "   "    ~    BURMA7.TXT  ...*/
call invertI 8, 'BURMA8.TXT'   /*  "   "    ~    BURMA8.TXT  ...*/
call invertI 9, 'BURMA9.TXT'   /*  "   "    ~    BURMA9.TXT  ...*/
 
call findAword 'does' /*find a word. */
call findAword '60' /*find another word. */
Line 2,087 ⟶ 2,086:
exit /*stick a fork in it, we're done.*/
/*──────────────────────────────────FINDAWORD subroutine────────────────*/
findAword: procedure expose @.; arg x /*get an uppercase version of X. */
parse arg ox /*get original (as-is) value of X*/
_=@.x; oxo='───'ox"───"
if _=='' then do
say 'word' oxo "not found."
return 0
end
_@=_ /*save _, pass it back to invoker*/
say 'word' oxo "found in:"
do until _==''; parse var _ f w _
say ' file='f ' word='w
end /*until ··· */
return _@
/*─────────────────────────────────────INVERTI subroutine───────────────*/
invertI: procedure expose @. !; parse arg #,fn /*file#, filename*/
call lineout fn /*close the file, just in case. */
w=0 /*number of words found (so far). */
do while lines(fn)\==0 /* [↓] process the entire file.*/
_=space(linein(fn)) /*read a line, elide extra blanks*/
if _=='' then iterate /*if blank record, then ignore it*/
say 'file' #", record:" _ /*echo a record (to be verbose).*/

do until _=='' /*pick off words until done. */
parse upper var _ ? _ /*pick off a word (uppercased). */
?=stripper(?) /*strip any trailing punctuation.*/
if ?='' then iterate /*is the word now blank (null) ? */
w=w+1 /*bump the word counter (index). */
@.?=@.? # w /*append the new word to a list. */
if wordpos(?,!)==0 then !=! ? /*add to the list of words found.*/
end /*until ··· */
end /*while ··· */

say; call lineout fn /*close the file, just to be neat*/
return w /*return the index of the word. */
/*─────────────────────────────────────STRIPPER subroutine──────────────*/
stripper: procedure; parse arg q /*remove punctuation at word-end.*/
@punctuation='.,:;?¿!¡∙·'; do j=1 for length(@punctuation)
q=strip(q,'T',substr(@punctuation,j,1))
end /*j*/
return q</lang>
'''output'''
<pre style="height:50ex">
file 0, record: Rip a fender
file 0, record: Off your Car
file 0, record: Send it in
file 0, record: For a half-pound jar
file 0, record: Burma-Shave
 
file 1, record: A peach
file 1, record: Looks good
file 1, record: With lots of fuzz
file 1, record: Man's no peach
file 1, record: And never was
file 1, record: Burma-Shave
 
file 2, record: Does your husband
file 2, record: Misbehave
file 2, record: Grunt and grumble
file 2, record: Rant and rave ?
file 2, record: Shoot the brute some
file 2, record: Burma-Shave
 
file 3, record: Don't take a curve
file 3, record: At 60 per
file 3, record: We hate to lose
file 3, record: A customer
file 3, record: Burma-Shave
 
file 4, record: Every shaver
file 4, record: Now can snore
file 4, record: Six more minutes
file 4, record: Than before
file 4, record: By using
file 4, record: Burma-Shave
 
file 5, record: He played
file 5, record: a sax
file 5, record: Had no B.O.
file 5, record: But his whiskers scratched
file 5, record: So she let him go
file 5, record: Burma-Shave
 
file 6, record: Henry the Eighth
file 6, record: Prince of Friskers
file 6, record: Lost five wives
file 6, record: But kept his whiskers
file 6, record: Burma-Shave
 
file 7, record: Listen, birds
file 7, record: These signs cost
file 7, record: Money
file 7, record: So roost a while
file 7, record: But don't get funny
file 7, record: Burma-Shave
 
file 8, record: My man
file 8, record: Won't shave
file 8, record: Sez Hazel Huz
file 8, record: But I should worry
file 8, record: Dora's does
file 8, record: Burma-Shave
 
file 9, record: Past
file 9, record: Schoolhouses
file 9, record: Take it slow
file 9, record: Let the little
file 9, record: Shavers grow
file 9, record: Burma-Shave
 
word ───does─── found in:
file=2 word=1
file=8 word=13
 
word ───60─── found in:
file=3 word=6
 
word ───don't─── found in:
file=3 word=1
file=7 word=12
 
word ───burma-shave─── found in:
file=0 word=14
file=1 word=15
file=2 word=15
file=3 word=14
file=4 word=13
file=5 word=17
file=6 word=14
file=7 word=15
file=8 word=14
file=9 word=11
</pre>
 