Word frequency: Difference between revisions

Line 142:

<lang perl6>sub MAIN ($filename, $top = 10) {

.say for ($filename.IO.slurp.lc ~~ m:g/[<[\w]-[_]>]+/)».Str.Bag.sort(-*.value)[^$top]

.put for $filename.IO.slurp.lc.comb( /[<[\w]-[_]>]+/ ).Bag.sort(-*.value)[^$top]

}</lang>

Passing in the file name and 10:

<pre>the => 41088

<pre>the 41088

of => 19949

of 19949

and => 14942

and 14942

a => 14596

a 14596

to => 13951

to 13951

in => 11214

in 11214

he => 9648

he 9648

was => 8621

was 8621

that => 7924

that 7924

it => 6661</pre>

it 6661</pre>

Or, as a one-liner at the command prompt:

<code>perl6 -e'lines.lc.comb( /[<[\w]-[_]>]+/ ).Bag.sort(-*.value)[^10].join("\n").say' < ./lemiz.txt</code>

Same output.

This satisfies the task requirements as written, but leaves a lot to be desired. For my own amusement, here is a version that recognizes contractions with embedded apostrophes, hyphenated words, and hyphenated words broken across lines. It returns the top N words and their counts sorted by length (longest first), with a secondary sort on frequency, just to be different (and to demonstrate that it really does what is claimed).

<lang perl6>sub MAIN ($filename, $top = 10) {

.say for ($filename.IO.slurp.lc.subst(/ (\w '-') \n ( \w ) /, {$0 ~ $1}, :g )

.say for $filename.IO.slurp.lc.subst(/ (<[\w]-[_]>'-')\n(<[\w]-[_]>) /, {$0 ~ $1}, :g )\

~~ m:g/ <[\w]-[_]>+[["'"|'-'|"'-"]<[\w]-[_]>+]* /)».Str.Bag.sort( {-$^a.key.chars, -$a.value} )[^$top];

.comb( / <[\w]-[_]>+[["'"|'-'|"'-"]<[\w]-[_]>+]* / ).Bag.sort( {-$^a.key.chars, -$a.value} )[^$top];

}</lang>

Again, passing in the same file name and 10:

<pre>police-agent-ja-vert-was-found-drowned-un-der-a-boat-of-the-pont-au-change => 1

<pre>police-agent-ja-vert-was-found-drowned-un-der-a-boat-of-the-pont-au-change 1

jésus-mon-dieu-bancroche-à-bas-la-lune => 1

jésus-mon-dieu-bancroche-à-bas-la-lune 1

die-of-hunger-if-you-have-a-fire => 1

die-of-hunger-if-you-have-a-fire 1

guimard-guimardini-guimardinette => 1

guimard-guimardini-guimardinette 1

monsieur-i-don't-know-your-name => 1

monsieur-i-don't-know-your-name 1

sainte-croix-de-la-bretonnerie => 2

sainte-croix-de-la-bretonnerie 2

die-of-cold-if-you-have-bread => 1

die-of-cold-if-you-have-bread 1

petit-picpus-sainte-antoine => 1

petit-picpus-sainte-antoine 1

saint-jacques-du-haut-pas => 7

saint-jacques-du-haut-pas 7

chemin-vert-saint-antoine => 3</pre>

chemin-vert-saint-antoine 3</pre>
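
To see the two regex steps in isolation, here is a minimal sketch that applies the same <code>subst</code> and <code>comb</code> patterns to a short string instead of a file. The sample text and the variable names <code>$text</code> and <code>$joined</code> are made up for illustration and are not part of the task.

<lang perl6># Hypothetical sample text, not taken from the task's input file.
my $text = "The well-\nknown man-of-war didn't sink.";

# Step 1: rejoin a word hyphenated across a line break by dropping the newline.
my $joined = $text.lc.subst(/ (<[\w]-[_]>'-')\n(<[\w]-[_]>) /, {$0 ~ $1}, :g);

# Step 2: extract words, keeping embedded apostrophes and hyphens.
.put for $joined.comb(/ <[\w]-[_]>+[["'"|'-'|"'-"]<[\w]-[_]>+]* /);</lang>

This should print five tokens, one per line: <code>the</code>, <code>well-known</code>, <code>man-of-war</code>, <code>didn't</code> and <code>sink</code>. The first step keeps the hyphen and removes only the newline, so a word that is legitimately hyphenated at a line end stays hyphenated, while a word merely split for layout also gains a hyphen. That would explain entries such as <code>ja-vert</code> and <code>un-der</code> in the output above.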

=={{header|Python}}==