Word frequency: Difference between revisions

→‎{{header|Perl 6}}: Add a second, much more capable example that goes way above and beyond.
that => 7924
it => 6661</pre>
 
This satisfies the task requirements as written, but leaves a lot to be desired. For my own amusement, here is a version that recognizes contractions with embedded apostrophes, hyphenated words, and hyphenated words broken across lines. It returns the top N words and their counts, sorted by word length (longest first) with a secondary sort on frequency, just to be different (and to demonstrate that it really does what is claimed).
 
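<lang perl6>sub MAIN ($filename, $top = 10) {
# Rejoin words hyphenated across a line break (the hyphen is kept), then match
# words that may contain embedded apostrophes or hyphens; underscores are excluded.
# Sort longest word first, then by descending count, and print the top $top entries.
.say for ($filename.IO.slurp.lc.subst(/ (\w '-') \n ( \w ) /, {$0 ~ $1}, :g )
~~ m:g/ <[\w]-[_]>+[["'"|'-'|"'-"]<[\w]-[_]>+]* /)».Str.Bag.sort( {-$^a.key.chars, -$a.value} )[^$top];
}</lang>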
 
{{out}}
Again, passing in the same file name and 10:
<pre>police-agent-ja-vert-was-found-drowned-un-der-a-boat-of-the-pont-au-change => 1
jésus-mon-dieu-bancroche-à-bas-la-lune => 1
die-of-hunger-if-you-have-a-fire => 1
guimard-guimardini-guimardinette => 1
monsieur-i-don't-know-your-name => 1
sainte-croix-de-la-bretonnerie => 2
die-of-cold-if-you-have-bread => 1
petit-picpus-sainte-antoine => 1
saint-jacques-du-haut-pas => 7
chemin-vert-saint-antoine => 3</pre>
 
=={{header|Python}}==