Word frequency: Difference between revisions

m →‎version 1: optimized the REXX program.
Thundergnat (talk | contribs)
m →‎{{header|Perl 6}}: More concise, better output formatting, demonstrate usage as a one-liner
Line 142: Line 142:


<lang perl6>sub MAIN ($filename, $top = 10) {
<lang perl6>sub MAIN ($filename, $top = 10) {
.say for ($filename.IO.slurp.lc ~~ m:g/[<[\w]-[_]>]+/)».Str.Bag.sort(-*.value)[^$top]
.put for $filename.IO.slurp.lc.comb( /[<[\w]-[_]>]+/ ).Bag.sort(-*.value)[^$top]
}</lang>
}</lang>


{{out}}
{{out}}
Passing in the file name and 10:
Passing in the file name and 10:
<pre>the => 41088
<pre>the 41088
of => 19949
of 19949
and => 14942
and 14942
a => 14596
a 14596
to => 13951
to 13951
in => 11214
in 11214
he => 9648
he 9648
was => 8621
was 8621
that => 7924
that 7924
it => 6661</pre>
it 6661</pre>

Or, as a one-liner at the command prompt:

<code>perl6 -e'lines.lc.comb( /[<[\w]-[_]>]+/ ).Bag.sort(-*.value)[^10].join("\n").say' < ./lemiz.txt</code>

Same output.


This satisfies the task requirements as they are written, but leaves a lot to be desired. For my own amusement, here is a version that recognizes contractions with embedded apostrophes, hyphenated words, and hyphenated words broken across lines. It returns the top N words and their counts, sorted by length with a secondary sort on frequency, just to be different (and to demonstrate that it really does what is claimed).
This satisfies the task requirements as they are written, but leaves a lot to be desired. For my own amusement, here is a version that recognizes contractions with embedded apostrophes, hyphenated words, and hyphenated words broken across lines. It returns the top N words and their counts, sorted by length with a secondary sort on frequency, just to be different (and to demonstrate that it really does what is claimed).


<lang perl6>sub MAIN ($filename, $top = 10) {
<lang perl6>sub MAIN ($filename, $top = 10) {
.say for ($filename.IO.slurp.lc.subst(/ (\w '-') \n ( \w ) /, {$0 ~ $1}, :g )
.say for $filename.IO.slurp.lc.subst(/ (<[\w]-[_]>'-')\n(<[\w]-[_]>) /, {$0 ~ $1}, :g )\
~~ m:g/ <[\w]-[_]>+[["'"|'-'|"'-"]<[\w]-[_]>+]* /)».Str.Bag.sort( {-$^a.key.chars, -$a.value} )[^$top];
.comb( / <[\w]-[_]>+[["'"|'-'|"'-"]<[\w]-[_]>+]* / ).Bag.sort( {-$^a.key.chars, -$a.value} )[^$top];
}</lang>
}</lang>


{{out}}
{{out}}
Again, passing in the same file name and 10:
Again, passing in the same file name and 10:
<pre>police-agent-ja-vert-was-found-drowned-un-der-a-boat-of-the-pont-au-change => 1
<pre>police-agent-ja-vert-was-found-drowned-un-der-a-boat-of-the-pont-au-change 1
jésus-mon-dieu-bancroche-à-bas-la-lune => 1
jésus-mon-dieu-bancroche-à-bas-la-lune 1
die-of-hunger-if-you-have-a-fire => 1
die-of-hunger-if-you-have-a-fire 1
guimard-guimardini-guimardinette => 1
guimard-guimardini-guimardinette 1
monsieur-i-don't-know-your-name => 1
monsieur-i-don't-know-your-name 1
sainte-croix-de-la-bretonnerie => 2
sainte-croix-de-la-bretonnerie 2
die-of-cold-if-you-have-bread => 1
die-of-cold-if-you-have-bread 1
petit-picpus-sainte-antoine => 1
petit-picpus-sainte-antoine 1
saint-jacques-du-haut-pas => 7
saint-jacques-du-haut-pas 7
chemin-vert-saint-antoine => 3</pre>
chemin-vert-saint-antoine 3</pre>
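
To see the line-break handling in isolation, the same two steps can be tried on a short string rather than the whole book. This is only a minimal sketch: the sample text and the variable names (<code>$text</code>, <code>$joined</code>, <code>@words</code>) are made up for illustration and are not part of the entry above; the regexes are copied from the revised version.

<lang perl6># Hypothetical sample text: "well-" / "known" is one word split across a line break.
my $text = "The well-\nknown agent didn't stop.\nThe well-known agent won.";

# Rejoin a word that was hyphenated across a line break, keeping the hyphen.
my $joined = $text.lc.subst(/ (<[\w]-[_]> '-') \n (<[\w]-[_]>) /, { $0 ~ $1 }, :g);

# Tokenize: runs of word characters, optionally linked by ', - or '-.
my @words = $joined.comb(/ <[\w]-[_]>+ [ ["'" | '-' | "'-"] <[\w]-[_]>+ ]* /);

# Longest words first, ties broken by descending count.
.put for @words.Bag.sort({ -$^a.key.chars, -$a.value });</lang>

With this input, both occurrences of "well-known" should be counted as the same word, and "didn't" should survive as a single token instead of being split at the apostrophe.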


=={{header|Python}}==
=={{header|Python}}==