Word frequency: Difference between revisions
m →version 1: optimized the REXX program. |
Thundergnat (talk | contribs) m →{{header|Perl 6}}: More concise, better output formatting, demonstrate usage as a one-liner |
||
Line 142: | Line 142: | ||
<lang perl6>sub MAIN ($filename, $top = 10) { |
<lang perl6>sub MAIN ($filename, $top = 10) { |
||
. |
.put for $filename.IO.slurp.lc.comb( /[<[\w]-[_]>]+/ ).Bag.sort(-*.value)[^$top] |
||
}</lang> |
}</lang> |
||
{{out}} |
{{out}} |
||
Passing in the file name and 10: |
Passing in the file name and 10: |
||
<pre>the |
<pre>the 41088 |
||
of |
of 19949 |
||
and |
and 14942 |
||
a |
a 14596 |
||
to |
to 13951 |
||
in |
in 11214 |
||
he |
he 9648 |
||
was |
was 8621 |
||
that |
that 7924 |
||
it |
it 6661</pre> |
||
Or, as a one-liner at the command prompt: |
|||
<code>perl6 -e'lines.lc.comb( /[<[\w]-[_]>]+/ ).Bag.sort(-*.value)[^10].join("\n").say' < ./lemiz.txt</code> |
|||
This produces the same output. |
|||
This satisfies the task requirements as written, but leaves a lot to be desired. For my own amusement, here is a version that recognizes contractions with embedded apostrophes, hyphenated words, and hyphenated words broken across lines. It returns the top N words and their counts, sorted by length with a secondary sort on frequency, just to be different (and to demonstrate that it really does what is claimed). |
This satisfies the task requirements as written, but leaves a lot to be desired. For my own amusement, here is a version that recognizes contractions with embedded apostrophes, hyphenated words, and hyphenated words broken across lines. It returns the top N words and their counts, sorted by length with a secondary sort on frequency, just to be different (and to demonstrate that it really does what is claimed). |
||
<lang perl6>sub MAIN ($filename, $top = 10) { |
<lang perl6>sub MAIN ($filename, $top = 10) { |
||
.say for |
.say for $filename.IO.slurp.lc.subst(/ (<[\w]-[_]>'-')\n(<[\w]-[_]>) /, {$0 ~ $1}, :g )\ |
||
.comb( / <[\w]-[_]>+[["'"|'-'|"'-"]<[\w]-[_]>+]* / ).Bag.sort( {-$^a.key.chars, -$a.value} )[^$top]; |
|||
}</lang> |
}</lang> |
||
{{out}} |
{{out}} |
||
Again, passing in the same file name and 10: |
Again, passing in the same file name and 10: |
||
<pre>police-agent-ja-vert-was-found-drowned-un-der-a-boat-of-the-pont-au-change |
<pre>police-agent-ja-vert-was-found-drowned-un-der-a-boat-of-the-pont-au-change 1 |
||
jésus-mon-dieu-bancroche-à-bas-la-lune |
jésus-mon-dieu-bancroche-à-bas-la-lune 1 |
||
die-of-hunger-if-you-have-a-fire |
die-of-hunger-if-you-have-a-fire 1 |
||
guimard-guimardini-guimardinette |
guimard-guimardini-guimardinette 1 |
||
monsieur-i-don't-know-your-name |
monsieur-i-don't-know-your-name 1 |
||
sainte-croix-de-la-bretonnerie |
sainte-croix-de-la-bretonnerie 2 |
||
die-of-cold-if-you-have-bread |
die-of-cold-if-you-have-bread 1 |
||
petit-picpus-sainte-antoine |
petit-picpus-sainte-antoine 1 |
||
saint-jacques-du-haut-pas |
saint-jacques-du-haut-pas 7 |
||
chemin-vert-saint-antoine |
chemin-vert-saint-antoine 3</pre> |
||
=={{header|Python}}== |
=={{header|Python}}== |