Word frequency: Difference between revisions
Thundergnat (talk | contribs) m (→{{header|Perl 6}}: Add a second, much more capable example that goes way above and beyond.)
Line 109:
that => 7924
it => 6661</pre>
This satisfies the task requirements as written, but leaves a lot to be desired. For my own amusement, here is a version that recognizes contractions with embedded apostrophes, hyphenated words, and hyphenated words broken across lines. It returns the top N words and their counts, sorted by length with a secondary sort on frequency, just to be different (and to demonstrate that it really does what is claimed).
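<lang perl6>sub MAIN ($filename, $top = 10) {
    # Rejoin words hyphenated across a line break (the hyphen is kept)
    .say for ($filename.IO.slurp.lc.subst(/ (\w '-') \n ( \w ) /, {$0 ~ $1}, :g )
    # Match words (', - and '- allowed inside), count them in a Bag, sort by length then count, descending
    ~~ m:g/ <[\w]-[_]>+[["'"|'-'|"'-"]<[\w]-[_]>+]* /)».Str.Bag.sort( {-$^a.key.chars, -$a.value} )[^$top];
}</lang>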
{{out}}
Again, passing in the same file name and 10:
<pre>police-agent-ja-vert-was-found-drowned-un-der-a-boat-of-the-pont-au-change => 1
jésus-mon-dieu-bancroche-à-bas-la-lune => 1
die-of-hunger-if-you-have-a-fire => 1
guimard-guimardini-guimardinette => 1
monsieur-i-don't-know-your-name => 1
sainte-croix-de-la-bretonnerie => 2
die-of-cold-if-you-have-bread => 1
petit-picpus-sainte-antoine => 1
saint-jacques-du-haut-pas => 7
chemin-vert-saint-antoine => 3</pre>
=={{header|Python}}==
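For comparison, the extended approach described in the second Perl 6 example (contractions, hyphenated words, words hyphenated across a line break, results sorted by length and then by frequency) can be sketched in Python. This is only an illustration under stated assumptions, not part of the original entry: the function name <code>top_words</code> and the two pattern names are hypothetical, and the regular expressions are a hand translation of the Perl 6 ones rather than an exact equivalent.

```python
import re
from collections import Counter

# Assumption: a hyphen directly before a newline marks a word split across
# lines. As in the Perl 6 version, the hyphen is kept when the halves are joined.
LINE_BREAK_HYPHEN = re.compile(r"(\w-)\n(\w)")

# Runs of word characters other than underscore, optionally joined
# by an apostrophe, a hyphen, or an apostrophe followed by a hyphen.
WORD = re.compile(r"[^\W_]+(?:(?:'-|'|-)[^\W_]+)*")


def top_words(text, top=10):
    """Return the `top` (word, count) pairs, longest words first,
    with ties on length broken by descending count."""
    text = LINE_BREAK_HYPHEN.sub(r"\1\2", text.lower())
    counts = Counter(WORD.findall(text))
    ranked = sorted(counts.items(), key=lambda kv: (-len(kv[0]), -kv[1]))
    return ranked[:top]
```

To run it on a file, pass the file contents in, for example <code>top_words(open(filename, encoding="utf-8").read(), 10)</code>. Because Python's sort is stable, words that tie on both length and count stay in order of first appearance.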