Word frequency: Difference between revisions
Line 33:
;References:
*[http://franklinchen.com/blog/2011/12/08/revisiting-knuth-and-mcilroys-word-count-programs/ McIlroy's program]
{{Template:Strings}}
<br><br>
=={{header|11l}}==
<syntaxhighlight lang="11l">DefaultDict[String, Int] cnt
L(word) re:‘\w+’.find_strings(File(‘135-0.txt’).read().lowercase())
cnt[word]++
print(sorted(cnt.items(), key' wordc -> wordc[1], reverse' 1B)[0.<10])</syntaxhighlight>
{{out}}
<pre>
[(the, 41045), (of, 19953), (and, 14939), (a, 14527), (to, 13942), (in, 11210), (he, 9646), (was, 8620), (that, 7922), (it, 6659)]
</pre>
=={{header|Ada}}==
Line 43 ⟶ 56:
{{works with|Ada|Ada|2012}}
<syntaxhighlight lang="ada">
with Ada.Text_IO;
with Ada.Integer_Text_IO;
Line 130 ⟶ 143:
end loop;
end Word_Frequency;
</syntaxhighlight>
{{out}}
<pre>
Line 149 ⟶ 162:
{{works with|ALGOL 68G|Any - tested with release 2.8.3.win32}}
Uses the associative array implementations in [[ALGOL_68/prelude]].
<syntaxhighlight lang="algol68">
# use the associative array in the Associate array/iteration task #
# but with integer values #
Line 273 ⟶ 286:
print( ( whole( top counts[ i ], -6 ), ": ", top words[ i ], newline ) )
OD
FI</syntaxhighlight>
{{out}}
<pre>
Line 291 ⟶ 304:
6491: IT
</pre>
=={{header|APL}}==
{{works with|GNU APL}}
<syntaxhighlight lang="apl">
⍝⍝ NOTE: input text is assumed to be encoded in ISO-8859-1
⍝⍝ (The suggested example '135-0.txt' of Les Miserables on
⍝⍝ Project Gutenberg is in UTF-8.)
⍝⍝
⍝⍝ Use Unix 'iconv' if required
⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝
∇r ← lowerAndStrip s;stripped;mixedCase
⍝⍝ Convert text to lowercase, punctuation and newlines to spaces
stripped ← ' abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz*'
mixedCase ← ⎕av[11],' ,.?!;:"''()[]-ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
r ← stripped[mixedCase ⍳ s]
∇
⍝⍝ Return the _n_ most frequent words and a count of their occurrences
⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝⍝
∇r ← n wordCount fname ;D;wl;sidx;swv;pv;wc;uw;sortOrder
D ← lowerAndStrip (⎕fio['read_file'] fname) ⍝ raw text with newlines
wl ← (~ D ∊ ' ') ⊂ D
sidx ← ⍒wl
swv ← wl[sidx]
pv ← +\ 1,~2 ≡/ swv
wc ← ∊ ⍴¨ pv ⊂ pv
uw ← 1 ⊃¨ pv ⊂ swv
sortOrder ← ⍒wc
r ← n↑[2] uw[sortOrder],[0.5]wc[sortOrder]
∇
5 wordCount '135-0.txt'
the of and a to
41042 19952 14938 14526 13942
</syntaxhighlight>
=={{header|AppleScript}}==
For simplicity here, words are considered to be uninterrupted sequences of letters and/or digits.
The set text is too messy to warrant faffing around with anything more sophisticated.
<syntaxhighlight lang="applescript">
Line 375 ⟶ 424:
set filePath to POSIX path of ((path to desktop as text) & "www.rosettacode.org:Word frequency:135-0.txt")
set n to 10
return wordFrequency(filePath, n)</syntaxhighlight>
{{output}}
<syntaxhighlight lang="applescript">"
The: 41092
Of: 19954
Line 388 ⟶ 437:
Was: 8622
That: 7924
It: 6661"</syntaxhighlight>
=={{header|Arturo}}==
<syntaxhighlight lang="rebol">findFrequency: function [file, count][
freqs: #[]
r: {/[[:alpha:]]+/}
loop flatten map split.lines read file 'l -> match lower l r 'word [
if not? key? freqs word -> freqs\[word]: 0
freqs\[word]: freqs\[word] + 1
]
freqs: sort.values.descending freqs
result: new []
loop 0..dec count 'x [
'result ++ @[@[get keys freqs x, get values freqs x]]
]
return result
]
loop findFrequency "https://www.gutenberg.org/files/135/135-0.txt" 10 'pair [
print pair
]</syntaxhighlight>
{{out}}
<pre>the 41096
of 19955
and 14939
a 14558
to 13954
in 11218
he 9649
was 8622
that 7924
it 6661</pre>
=={{header|AutoHotkey}}==
<syntaxhighlight lang="autohotkey">
FileRead, H, % A_temp "\tempfile.txt"
FileDelete, % A_temp "\tempfile.txt"
Line 407 ⟶ 490:
}
MsgBox % "Freq`tWord`n" result
return</syntaxhighlight>
Outputs:<pre>Freq Word
41036 The
Line 421 ⟶ 504:
=={{header|AWK}}==
<syntaxhighlight lang="awk">
# syntax: GAWK -f WORD_FREQUENCY.AWK [-v show=x] LES_MISERABLES.TXT
#
Line 450 ⟶ 533:
exit(0)
}
</syntaxhighlight>
{{out}}
<pre>
Line 469 ⟶ 552:
==={{header|QB64}}===
This is rather long code; I fulfilled the requirement with QB64. It "cleans" each word, taking as a word anything that begins and ends with a letter, and it works with arrays. QB64 is amazingly fast at this job, even with a file as big as Les Miserables.txt.
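For readers who just want the gist of that cleaning rule, here is a minimal Python sketch (the file name and the regular expression are assumptions made for illustration; this is not a translation of the QB64 program below):
<syntaxhighlight lang="python">
import re
from collections import Counter

def clean(token):
    # Keep the span from the first letter to the last letter, lowercased.
    m = re.search(r"[A-Za-z].*[A-Za-z]|[A-Za-z]", token)
    return m.group(0).lower() if m else ""

text = open("135-0.txt", encoding="utf-8").read()
counts = Counter(w for w in (clean(t) for t in text.split()) if w)
print(counts.most_common(10))
</syntaxhighlight>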
<syntaxhighlight lang="qbasic">
OPTION _EXPLICIT
Line 1,037 ⟶ 1,120:
END SUB
</syntaxhighlight>
{{output}}
Line 1,081 ⟶ 1,164:
==={{header|BaCon}}===
Removing all punctuation, digits, tabs and carriage returns. So "This", "this" and "this." are the same. Full support for UTF8 characters in words. The code itself could be smaller, but for sake of clarity all has been written explicitly.
<syntaxhighlight lang="bacon">
OPTION COLLAPSE TRUE
Line 1,104 ⟶ 1,187:
FOR i = 0 TO 9
PRINT term$[i], " : ", frequency(term$[i])
NEXT</syntaxhighlight>
{{output}}
<pre>
Line 1,125 ⟶ 1,208:
You could cut the length of this down drastically if you didn't need to be able to recall the word at nth position and wished only to display the top 10 words.
<syntaxhighlight lang="dos">
@echo off
Line 1,171 ⟶ 1,254:
goto:eof
</syntaxhighlight>
Line 1,204 ⟶ 1,287:
<syntaxhighlight lang="bracmat">
= MergeSort { Local variable declarations. }
types
Line 1,247 ⟶ 1,330:
& !most-frequent-words { Return the last 10 terms. }
)
& out$(10-most-frequent-words$"135-0.txt") { Call 10-most-frequent-words with name of input file and print result to screen. }</syntaxhighlight>
'''Output'''
<pre> (6661.it)
Line 1,263 ⟶ 1,346:
{{libheader|GLib}}
Words are defined by the regular expression "\w+".
<syntaxhighlight lang="c">
#include <stdio.h>
#include <glib.h>
Line 1,354 ⟶ 1,437:
return EXIT_FAILURE;
return EXIT_SUCCESS;
}</syntaxhighlight>
{{out}}
Line 1,374 ⟶ 1,457:
=={{header|C sharp|C#}}==
{{trans|D}}
<syntaxhighlight lang="csharp">
using System.Collections.Generic;
using System.IO;
Line 1,406 ⟶ 1,489:
}
}
}</syntaxhighlight>
{{out}}
<pre>Rank Word Frequency
Line 1,422 ⟶ 1,505:
=={{header|C++}}==
<syntaxhighlight lang="cpp">#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>
int main(int ac, char** av) {
std::ios::sync_with_stdio(false);
int head = (ac > 1) ? std::atoi(av[1]) : 10;
std::istreambuf_iterator<char> it(std::cin), eof;
std::filebuf file;
if (ac > 2) {
if (file.open(av[2], std::ios::in), file.is_open()) {
it = std::istreambuf_iterator<char>(&file);
} else return std::cerr << "file " << av[2] << " open failed\n", 1;
}
auto alpha = [](unsigned c) { return c-'A' < 26 || c-'a' < 26; };
auto lower = [](unsigned c) { return c | '\x20'; };
std::unordered_map<std::string, int> counts;
std::string word;
for (; it != eof; ++it) {
if (alpha(*it)) {
word.push_back(lower(*it));
} else if (!word.empty()) {
++counts[word];
word.clear();
}
}
if (!word.empty()) ++counts[word]; // if file ends w/o ws
std::vector<std::pair<const std::string,int> const*> out;
for (auto& count : counts) out.push_back(&count);
std::partial_sort(out.begin(),
out.size() < head ? out.end() : out.begin() + head,
out.end(), [](auto const* a, auto const* b) {
return a->second > b->second;
});
if (out.size() > head) out.resize(head);
for (auto const& count : out) {
std::cout << count->first << ' ' << count->second << '\n';
}
return 0;
}
</syntaxhighlight>
{{out}}
<pre>
$ ./a.out 10 135-0.txt
the 41093
of 19954
and 14943
a 14558
to 13953
in 11219
he 9649
was 8622
that 7924
it 6661
</pre>
===Alternative===
{{trans|C#}}
<syntaxhighlight lang="cpp">
#include <iostream>
#include <fstream>
Line 1,432 ⟶ 1,577:
int main() {
std::regex wordRgx("\\w+");
std::map<std::string, int> freq;
std::ifstream in("135-0.txt");
if (!in.is_open()) {
std::cerr << "Failed to open file\n";
return 1;
}
std::string line;
while (std::getline(in, line)) {
auto words_itr = std::sregex_iterator(line.begin(), line.end(), wordRgx);
auto words_end = std::sregex_iterator();
while (words_itr != words_end) {
auto match = *words_itr;
auto word = match.str();
if (word.size() > 0) {
transform (word.begin(), word.end(), word.begin(), ::tolower);
auto entry = freq.find(word);
if (entry != freq.end()) {
entry->second++;
} else {
freq.insert(std::make_pair(word, 1));
}
}
words_itr = std::next(words_itr);
}
}
in.close();
std::vector<std::pair<std::string, int>> pairs;
for (auto iter = freq.cbegin(); iter != freq.cend(); ++iter) {
pairs.push_back(*iter);
}
std::sort(pairs.begin(), pairs.end(), [](const auto& a, const auto& b) {
return a.second > b.second;
});
std::cout << "Rank Word Frequency\n" "==== ==== =========\n";
int rank = 1;
for (auto iter = pairs.cbegin(); iter != pairs.cend() && rank <= 10; ++iter) {
std::printf("%2d %4s %5d\n", rank++, iter->first.c_str(), iter->second);
}
return 0;
}</syntaxhighlight>
{{out}}
<pre>Rank Word Frequency
Line 1,491 ⟶ 1,638:
9 he 6814
10 had 6139</pre>
===C++20===
{{trans|C#}}
<syntaxhighlight lang="cpp">#include <algorithm>
#include <iostream>
#include <format>
#include <fstream>
#include <map>
#include <ranges>
#include <regex>
#include <string>
#include <vector>
int main() {
std::ifstream in("135-0.txt");
std::string text{
std::istreambuf_iterator<char>{in}, std::istreambuf_iterator<char>{}
};
in.close();
std::regex word_rx("\\w+");
std::map<std::string, int> freq;
for (const auto& a : std::ranges::subrange(
std::sregex_iterator{ text.cbegin(),text.cend(), word_rx }, std::sregex_iterator{}
))
{
auto word = a.str();
transform(word.begin(), word.end(), word.begin(), ::tolower);
freq[word]++;
}
std::vector<std::pair<std::string, int>> pairs;
for (const auto& elem : freq)
{
pairs.push_back(elem);
}
std::ranges::sort(pairs, std::ranges::greater{}, &std::pair<std::string, int>::second);
std::cout << "Rank Word Frequency\n"
"==== ==== =========\n";
for (int rank=1; const auto& [word, count] : pairs | std::views::take(10))
{
std::cout << std::format("{:2} {:>4} {:5}\n", rank++, word, count);
}
}</syntaxhighlight>
{{out}}
<pre>Rank Word Frequency
==== ==== =========
0 the 41043
1 of 19952
2 and 14938
3 a 14539
4 to 13942
5 in 11208
6 he 9646
7 was 8620
8 that 7922
9 it 6659</pre>
=={{header|Clojure}}==
<syntaxhighlight lang="clojure">
(->> file
slurp
Line 1,500 ⟶ 1,706:
frequencies
(sort-by val >)
(take n)))</syntaxhighlight>
{{Out}}
Line 1,510 ⟶ 1,716:
=={{header|COBOL}}==
<syntaxhighlight lang="cobol">
IDENTIFICATION DIVISION.
PROGRAM-ID. WordFrequency.
Line 1,724 ⟶ 1,930:
CLOSE Word-File Output-File.
END-PROGRAM.
</syntaxhighlight>
{{Out}}
Line 1,747 ⟶ 1,953:
=={{header|Common Lisp}}==
<syntaxhighlight lang="lisp">
(defun count-word (n pathname)
(with-open-file (s pathname :direction :input)
Line 1,768 ⟶ 1,974:
(dolist (word words) (incf (gethash word hash 0)))
(maphash #'(lambda (e n) (push `(,e . ,n) ac)) hash) ac)
</syntaxhighlight>
{{Out}}
Line 1,778 ⟶ 1,984:
=={{header|Crystal}}==
<syntaxhighlight lang="crystal">
require "regex"
Line 1,796 ⟶ 2,002:
.sort { |a, b| b[1] <=> a[1] }[0..9] # sort and get the first 10 elements
.each_with_index(1) { |(word, n), i| puts "#{i} \t #{word} \t #{n}" } # print the result
</syntaxhighlight>
{{out}}
Line 1,813 ⟶ 2,019:
=={{header|D}}==
<syntaxhighlight lang="d">
import std.array : appender, split;
import std.range : take;
Line 1,848 ⟶ 2,054:
writefln("%4s %-10s %9s", rank++, word.k, word.v);
}
}</syntaxhighlight>
{{out}}
Line 1,869 ⟶ 2,075:
{{libheader| System.RegularExpressions}}
{{Trans|C#}}
<syntaxhighlight lang="delphi">
program Word_frequency;
Line 1,942 ⟶ 2,148:
readln;
end.
</syntaxhighlight>
{{out}}
<pre>
Line 1,959 ⟶ 2,165:
</pre>
=={{header|F Sharp}}==
<syntaxhighlight lang="fsharp">
open System.IO
open System.Text.RegularExpressions
let g=Regex("[A-Za-zÀ-ÿ]+").Matches(File.ReadAllText "135-0.txt")
[for n in g do yield n.Value.ToLower()]|>List.countBy(id)|>List.sortBy(fun n->(-(snd n)))|>List.take 10|>List.iter(fun n->printfn "%A" n)
</syntaxhighlight>
{{out}}
<pre>
Line 1,981 ⟶ 2,187:
=={{header|Factor}}==
This program expects stdin to read from a file via the command line. ( e.g. invoking the program in Windows: <tt>>factor word-count.factor < input.txt</tt> ) The definition of a word here is simply any string surrounded by some combination of spaces, punctuation, or newlines.
<syntaxhighlight lang="factor">
USING: ascii io math.statistics prettyprint sequences
splitting ;
Line 1,988 ⟶ 2,194:
lines " " join " .,?!:;()\"-" split harvest [ >lower ] map
sorted-histogram <reversed> 10 head .
</syntaxhighlight>
{{out}}
<pre>
Line 2,003 ⟶ 2,209:
{ "it" 6532 }
}
</pre>
=={{header|FreeBASIC}}==
<syntaxhighlight lang="freebasic">
#Include "file.bi"
type tally
as string s
as long l
end type
Sub quicksort(array() As String,begin As Long,Finish As Long)
Dim As Long i=begin,j=finish
Dim As String x =array(((I+J)\2))
While I <= J
While array(I) < X :I+=1:Wend
While array(J) > X :J-=1:Wend
If I<=J Then Swap array(I),array(J): I+=1:J-=1
Wend
If J >begin Then quicksort(array(),begin,J)
If I <Finish Then quicksort(array(),I,Finish)
End Sub
Sub tallysort(array() As tally,begin As Long,Finish As long)
Dim As Long i=begin,j=finish
Dim As tally x =array(((I+J)\2))
While I <= J
While array(I).l > X .l:I+=1:Wend
While array(J).l < X .l:J-=1:Wend
If I<=J Then Swap array(I),array(J): I+=1:J-=1
Wend
If J >begin Then tallysort(array(),begin,J)
If I <Finish Then tallysort(array(),I,Finish)
End Sub
Function loadfile(file As String) As String
If Fileexists(file)=0 Then Print file;" not found":Sleep:End
Dim As Long f=Freefile
Open file For Binary Access Read As #f
Dim As String text
If Lof(f) > 0 Then
text = String(Lof(f), 0)
Get #f, , text
End If
Close #f
Return text
End Function
Function String_Split(s_in As String,chars As String,result() As String) As Long
Dim As Long ctr,ctr2,k,n,LC=Len(chars)
Dim As boolean tally(Len(s_in))
#macro check_instring()
n=0
While n<Lc
If chars[n]=s_in[k] Then
tally(k)=true
If (ctr2-1) Then ctr+=1
ctr2=0
Exit While
End If
n+=1
Wend
#endmacro
#macro splice()
If tally(k) Then
If (ctr2-1) Then ctr+=1:result(ctr)=Mid(s_in,k+2-ctr2,ctr2-1)
ctr2=0
End If
#endmacro
'================== LOOP TWICE =======================
For k =0 To Len(s_in)-1
ctr2+=1:check_instring()
Next k
If ctr=0 Then
If Len(s_in) Andalso Instr(chars,Chr(s_in[0])) Then ctr=1':
End If
If ctr Then Redim result(1 To ctr): ctr=0:ctr2=0 Else Return 0
For k =0 To Len(s_in)-1
ctr2+=1:splice()
Next k
'===================== Last one ========================
If ctr2>0 Then
Redim Preserve result(1 To ctr+1)
result(ctr+1)=Mid(s_in,k+1-ctr2,ctr2)
End If
Return Ubound(result)
End Function
Redim As String s()
redim as tally t()
dim as string p1,p2,deliminators
dim as long count,jmp
dim as double tm=timer
Var L=loadfile("rosettalesmiserables.txt")
L=lcase(L)
'get deliminators
for n as long=1 to 96
p1+=chr(n)
next
for n as long=123 to 255
p2+=chr(n)
next
deliminators=p1+p2
string_split(L,deliminators,s())
quicksort(s(),lbound(s),ubound(s))
For n As Long=lbound(s) To ubound(s)-1
if s(n+1)=s(n) then jmp+=1
if s(n+1)<>s(n) then
count+=1
redim preserve t(1 to count)
t(count).s=s(n)
t(count).l=jmp
jmp=0
end if
Next
tallysort(t(),lbound(t),ubound(t))'sort by frequency
print "frequency","word"
print
for n as long=lbound(t) to lbound(t)+9
print t(n).l,t(n).s
next
Print
print "time for operation ";timer-tm;" seconds"
sleep
</syntaxhighlight>
{{out}}
<pre>
I saved and reloaded the file as ascii text.
frequency word
41098 the
19955 of
14939 and
14557 a
13953 to
11219 in
9648 he
8621 was
7923 that
6660 it
time for operation 1.099869600031525 seconds
</pre>
Line 2,008 ⟶ 2,366:
This example shows some of the subtle and non-obvious power of Frink in processing text files in a language-aware and Unicode-aware fashion:
* Frink has a Unicode-aware function, <CODE>wordList[''str'']</CODE>, which intelligently enumerates through the words in a string (and correctly handles compound words, hyphenated words, accented characters, etc.) It returns words, spaces, and punctuation marks separately. For the purposes of this program, "words" that do not contain any alphanumeric characters (as decided by the Unicode standard) are filtered out. These are likely punctuation and spaces. There is also a two-argument function, <CODE>wordList[''str'', ''lang'']</CODE> which allows you to specify a language code ''e.g.'' <CODE>"fr"</CODE> to use the rules of French (or many other human languages) to perform correct word-breaking according to the rules of that language.
* The file fetched from Project Gutenberg is supposed to be encoded in UTF-8 character encoding, but their servers incorrectly send either that it is Windows-1252 encoded or send no character encoding at all, so this program fixes that. (A rough Python analogue of this approach is sketched below.)
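As a cross-language illustration only (not part of the Frink entry), the same ideas (forcing UTF-8, normalizing the Unicode text, and breaking on runs of letters) might be sketched in Python as follows; the file name and the use of the standard <CODE>re</CODE> and <CODE>unicodedata</CODE> modules are assumptions:
<syntaxhighlight lang="python">
import re
import unicodedata
from collections import Counter

with open("135-0.txt", "rb") as f:
    text = f.read().decode("utf-8")                # force UTF-8, whatever the server claims
text = unicodedata.normalize("NFC", text).lower()  # normalize combining characters
words = re.findall(r"[^\W\d_]+", text)             # runs of Unicode letters only
print(Counter(words).most_common(10))
</syntaxhighlight>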
Line 2,021 ⟶ 2,379:
There are two sample programs below. First, a simple but powerful method that works in old versions of Frink:
<syntaxhighlight lang="frink">
for w = select[wordList[read[normalizeUnicode["https://www.gutenberg.org/files/135/135-0.txt", "UTF-8"]]], %r/[[:alnum:]]/ ]
d.increment[lc[w], 1]
println[join["\n", first[reverse[sort[array[d], {|a,b| a@1 <=> b@1}]], 10]]]</syntaxhighlight>
{{out}}
Line 2,043 ⟶ 2,401:
Next, a "showing off" one-liner that works in recent versions of Frink that uses the <CODE>countToArray</CODE> function which easily creates sorted frequency lists and the <CODE>formatTable</CODE> function that formats into a nice table with columns lined up, and still performs full Unicode-aware normalization, capitalization, and word-breaking:
<
{{out}}
Line 2,057 ⟶ 2,415:
he 6812
had 6133
</pre>
=={{header|FutureBasic}}==
Task said: "Feel free to explicitly state the thoughts behind the program decisions." Thus the heavy comments.
<syntaxhighlight lang="futurebasic">
include "NSLog.incl"
local fn WordFrequency( textStr as CFStringRef, caseSensitive as Boolean, ascendingOrder as Boolean ) as CFStringRef
'~'1
CFStringRef wrd
CFDictionaryRef dict
// Depending on the value of the caseSensitive Boolean function parameter above, lowercase incoming text
if caseSensitive == NO then textStr = fn StringLowercaseString( textStr )
// Trim non-alphabetic characters from string, and separate individual words with a space
CFStringRef tempStr = fn ArrayComponentsJoinedByString( fn StringComponentsSeparatedByCharactersInSet( textStr, fn CharacterSetInvertedSet( fn CharacterSetLetterSet ) ), @" " )
// Prepare separators to parse string into array
CFMutableCharacterSetRef separators = fn MutableCharacterSetInit
// Informally, this set is the set of all non-whitespace characters used to separate linguistic units in scripts, such as periods, dashes, parentheses, and so on.
MutableCharacterSetFormUnionWithCharacterSet( separators, fn CharacterSetPunctuationSet )
// A character set containing all the whitespace and newline characters including characters in Unicode General Category Z*, U+000A U+000D, and U+0085.
MutableCharacterSetFormUnionWithCharacterSet( separators, fn CharacterSetWhitespaceAndNewlineSet )
// Create array of separated words
CFArrayRef tempArr = fn StringComponentsSeparatedByCharactersInSet( tempStr, separators )
// Create a counted set with each word and its frequency
CountedSetRef freqencies = fn CountedSetWithArray( tempArr )
// Enumerate each word-frequency pair in the counted set...
EnumeratorRef enumRef = fn CountedSetObjectEnumerator( freqencies )
// .. and use it to create array of words in counted set
CFArrayRef array = fn EnumeratorAllObjects( enumRef )
// Create an empty mutable array
CFMutableArrayRef wordArr = fn MutableArrayWithCapacity( 0 )
// Create word counter
NSInteger totalWords = 0
// Enumerate each unique word, get its frequency, create its own key/value pair dictionary, add each dictionary into master array
for wrd in array
totalWords++
// Create dictionary with frequency and matching word
dict = @{ @"count":fn NumberWithUnsignedInteger( fn CountedSetCountForObject( freqencies, wrd ) ), @"object":wrd }
// Add each dictionary to the master mutable array, checking for a valid word by length
if ( fn StringLength( wrd ) != 0 )
MutableArrayAddObject( wordArr, dict )
end if
next
// Store the total words as a global application property
AppSetProperty( @"totalWords", fn StringWithFormat( @"%d", totalWords - 1 ) )
// Sort the array in ascending or descending order as determined by the ascendingOrder Boolean function input parameter
SortDescriptorRef descriptors = fn SortDescriptorWithKey( @"count", ascendingOrder )
CFArrayRef sortedArray = fn ArraySortedArrayUsingDescriptors( wordArr, @[descriptors] )
// Create an empty mutable string
CFMutableStringRef mutStr = fn MutableStringWithCapacity( 0 )
// Use each dictionary in sorted array to build the formatted output string
NSInteger count = 1
for dict in sortedArray
MutableStringAppendString( mutStr, fn StringWithFormat( @"%-7d %-7lu %@\n", count, fn StringIntegerValue( fn DictionaryValueForKey( dict, @"count" ) ), fn DictionaryValueForKey( dict, @"object" ) ) )
count++
next
// Create an immutable output string from mutable the string
CFStringRef resultStr = fn StringWithFormat( @"%@", mutStr )
end fn = resultStr
local fn ParseTextFromWebsite( webSite as CFStringRef )
// Convert incoming string to URL
CFURLRef textURL = fn URLWithString( webSite )
// Read contents of URL into a string
CFStringRef textStr = fn StringWithContentsOfURL( textURL, NSUTF8StringEncoding, NULL )
// Start timer
CFAbsoluteTime startTime = fn CFAbsoluteTimeGetCurrent
// Calculate frequency of words in text and sort by occurrence
CFStringRef frequencyStr = fn WordFrequency( textStr, NO, NO )
// Log results and post post processing time
NSLogClear
NSLog( @"%@", frequencyStr )
NSLog( @"Total unique words in document: %@", fn AppProperty( @"totalWords" ) )
// Stop timer and log elapsed processing time
NSLog( @"Elapsed time: %f milliseconds.", ( fn CFAbsoluteTimeGetCurrent - startTime ) * 1000.0 )
end fn
dispatchglobal
// Pass url for Les Misérables on Project Gutenberg and parse in background
fn ParseTextFromWebsite( @"https://www.gutenberg.org/files/135/135-0.txt" )
dispatchend
HandleEvents
</syntaxhighlight>
{{output}}
<pre>
1 41095 the
2 19955 of
3 14939 and
4 14546 a
5 13954 to
6 11218 in
7 9649 he
8 8622 was
9 7924 that
10 6661 it
11 6470 his
12 6193 is
//-------------------
22900 1 millstones
22901 1 fumbles
22902 1 shunned
22903 1 avoids
22904 1 poitevin
22905 1 muleteer
22906 1 idolizes
22907 1 lapsed
22908 1 reptitalmus
22909 1 bled
22910 1 isabella
Total unique words in document: 22910
Elapsed time: 595.407963 milliseconds.
</pre>
=={{header|Go}}==
{{trans|Kotlin}}
<syntaxhighlight lang="go">
import (
Line 2,103 ⟶ 2,594:
fmt.Printf("%2d %-4s %5d\n", rank, word, freq)
}
}</syntaxhighlight>
{{out}}
Line 2,123 ⟶ 2,614:
=={{header|Groovy}}==
Solution:
<syntaxhighlight lang="groovy">
def mapCounts = [:]
content.toLowerCase().split(/\W+/).each {
Line 2,131 ⟶ 2,622:
println "Rank Word Frequency\n==== ==== ========="
(0..<n).each { printf ("%4d %-4s %9d\n", it+1, top[it].key, top[it].value) }
}</syntaxhighlight>
Test:
<syntaxhighlight lang="groovy">
topWordCounts(rawText, 10)</syntaxhighlight>
Output:
Line 2,152 ⟶ 2,643:
=={{header|Haskell}}==
===Lazy IO with pure Map, arrows===
{{trans|Clojure}}
<syntaxhighlight lang="haskell">
import Control.Category -- (>>>)
import Data.Char -- toLower, isSpace
import Data.List -- sortBy, (Foldable(foldl')), filter -- '
import Data.Ord -- Down
import System.IO -- stdin, ReadMode, openFile, hClose
Line 2,173 ⟶ 2,665:
frequencies :: Ord a => [a] -> Map a Integer
frequencies = foldl' (\m k -> M.insertWith (+) k 1 m) M.empty -- '
{-# SPECIALIZE frequencies :: [Text] -> Map Text Integer #-}
Line 2,193 ⟶ 2,685:
>>> take n
>>> print)
when filep (hClose hand)</syntaxhighlight>
{{Out}}
<pre>
Line 2,200 ⟶ 2,692:
</pre>
===Lazy IO, map of IORefs===
Using IORefs as values in the map seems to give a ~2x speedup on large files. The below code is based on https://github.com/composewell/streamly-examples/blob/master/examples/WordFrequency.hs , but still using lazy IO to avoid the extra library dependency (in production you should [https://stackoverflow.com/questions/5892653/whats-so-bad-about-lazy-i-o use a streaming library] like streamly/conduit/io-streams):
<syntaxhighlight lang="haskell">
module Main where
import Control.Monad (foldM, when)
import Data.Char (isSpace, toLower)
import Data.List (sortOn, filter)
import Data.Ord (Down(..))
import System.IO (stdin, IOMode(..), openFile, hClose)
import System.Environment (getArgs)
import Data.IORef (IORef(..), newIORef, readIORef, modifyIORef') -- '
-- containers
import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as M
-- text
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as T
frequencies :: [Text] -> IO (HashMap Text (IORef Int))
frequencies = foldM (flip (M.alterF alter)) M.empty
where
alter Nothing = Just <$> newIORef (1 :: Int)
alter (Just ref) = modifyIORef' ref (+ 1) >> return (Just ref) -- '
main :: IO ()
main = do
args <- getArgs
when (length args /= 1) (error "expecting 1 arg (number of words to print)")
let maxw = read $ head args -- no error handling, to simplify the example
T.hGetContents stdin >>= \contents -> do
freqtable <- frequencies $ filter (not . T.null) $ T.split isSpace $ T.map toLower contents
counts <-
let readRef (w, ref) = do
cnt <- readIORef ref
return (w, cnt)
in mapM readRef $ M.toList freqtable
print $ take maxw $ sortOn (Down . snd) counts
</syntaxhighlight>
{{Out}}
<pre>
$ ./word_count 10 < ~/doc/les_miserables*
[("the",40378),("of",19869),("and",14468),("a",14278),("to",13590),("in",11025),("he",9213),("was",8347),("that",7249),("his",6414)]
</pre>
===Lazy IO, short code, but not streaming===
Or, perhaps a little more simply, though not streaming (will read everything into memory, don't use on big files):
<syntaxhighlight lang="haskell">import qualified Data.Text.IO as T
import qualified Data.Text as T
Line 2,214 ⟶ 2,754:
main :: IO ()
main = T.readFile "miserables.txt" >>= (mapM_ print . take 10 . frequentWords)</syntaxhighlight>
{{Out}}
<pre>(40370,"the")
Line 2,272 ⟶ 2,812:
=={{header|Java}}==
This is relatively simple in Java.<br />
I used a ''URL'' class to download the content, a ''BufferedReader'' class to examine the text line by line, a ''Pattern'' and ''Matcher'' to identify words, and a ''Map'' to hold the values.
<syntaxhighlight lang="java">
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
</syntaxhighlight>
<syntaxhighlight lang="java">
void printWordFrequency() throws URISyntaxException, IOException {
URL url = new URI("https://www.gutenberg.org/files/135/135-0.txt").toURL();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
Pattern pattern = Pattern.compile("(\\w+)");
Matcher matcher;
String line;
String word;
Map<String, Integer> map = new HashMap<>();
while ((line = reader.readLine()) != null) {
matcher = pattern.matcher(line);
while (matcher.find()) {
word = matcher.group().toLowerCase();
if (map.containsKey(word)) {
map.put(word, map.get(word) + 1);
} else {
map.put(word, 1);
}
}
}
/* print out top 10 */
List<Map.Entry<String, Integer>> list = new ArrayList<>(map.entrySet());
list.sort(Map.Entry.comparingByValue());
Collections.reverse(list);
int count = 1;
for (Map.Entry<String, Integer> value : list) {
System.out.printf("%-20s%,7d%n", value.getKey(), value.getValue());
if (count++ == 10) break;
}
}
}
</syntaxhighlight>
<pre>
the 41,043
of 19,952
and 14,938
a 14,539
to 13,942
in 11,208
he 9,646
was 8,620
that 7,922
it 6,659
</pre>
<br />
An alternate demonstration
{{trans|Kotlin}}
<syntaxhighlight lang="java">
import java.nio.file.Files;
import java.nio.file.Path;
Line 2,315 ⟶ 2,919:
}
}
}</syntaxhighlight>
{{out}}
<pre>Rank Word Frequency
Line 2,329 ⟶ 2,933:
9 that 7924
10 it 6661</pre>
=={{header|jq}}==
The following solution uses the concept of a "bag of words" (bow), here realized as a JSON object
with the words as keys and the frequency of a word as the corresponding value.
To avoid issues with case folding, the "letters" here are just the ASCII alphabet and the hyphen, but a "word"
may not begin with a hyphen. Thus "the-the" would count as one word, and "-the" would be excluded.
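For readers less familiar with jq, a rough Python rendering of that word rule follows (hyphens are allowed inside a word, but a token that starts with a hyphen is dropped); the sample sentence is invented and this is not part of the jq entry:
<syntaxhighlight lang="python">
import re
from collections import Counter

def bag_of_words(text):
    # Replace everything except letters and hyphens with spaces, then keep
    # only tokens that start with a letter (so "-the" is dropped entirely).
    tokens = re.sub(r"[^-a-zA-Z]", " ", text).lower().split()
    return Counter(t for t in tokens if re.fullmatch(r"[a-z][-a-z]*", t))

print(bag_of_words("The-the cat -the cat"))   # Counter({'cat': 2, 'the-the': 1})
</syntaxhighlight>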
<syntaxhighlight lang="jq">
< 135-0.txt jq -nR --argjson n 10 '
def bow(stream):
reduce stream as $word ({}; .[($word|tostring)] += 1);
bow(inputs | gsub("[^-a-zA-Z]"; " ") | splits(" *") | ascii_downcase | select(test("^[a-z][-a-z]*$")))
| to_entries
| sort_by(.value)
| .[- $n :]
| reverse
| from_entries
'
</syntaxhighlight>
====Output====
<syntaxhighlight lang="jq">
{
"the": 41087,
"of": 19937,
"and": 14932,
"a": 14552,
"to": 13738,
"in": 11209,
"he": 9649,
"was": 8621,
"that": 7923,
"it": 6661
}
</syntaxhighlight>
=={{header|Julia}}==
{{works with|Julia|1.0}}
<syntaxhighlight lang="julia">
using FreqTables
Line 2,338 ⟶ 2,978:
words = split(replace(txt, r"\P{L}"i => " "))
table = sort(freqtable(words); rev=true)
println(table[1:10])</syntaxhighlight>
{{out}}
Line 2,353 ⟶ 2,993:
"he" │ 6816
"had" │ 6140</pre>
=={{header|K}}==
{{works with|ngn/k}}<syntaxhighlight lang=K>common:{+((!d)o)!n@o:x#>n:#'.d:=("&"\`c$"&"|_,/0:y)^,""}
{(,'!x),'.x}common[10;"135-0.txt"]
(("the";41019)
("of";19898)
("and";14658)
(,"a";14517)
("to";13695)
("in";11134)
("he";9405)
("was";8361)
("that";7592)
("his";6446))</syntaxhighlight>
(The relatively easy to read output format here is arguably less useful than the table produced by <code>common</code> but it would have been more concise to have <code>common</code> generate it directly.)
=={{header|KAP}}==
The below program defines the function 'stats' which accepts a filename containing the text.
<syntaxhighlight lang="kap">∇ stats (file) {
content ← "[\\h,.\"'\n-]+" regex:split unicode:toLower io:readFile file
sorted ← (⍋⊇⊢) content
selection ← 1,2≢/sorted
words ← selection / sorted
{⍵[10↑⍒⍵[;1];]} words ,[0.5] ≢¨ sorted ⊂⍨ +\selection
}</syntaxhighlight>
{{out}}
<pre>┏━━━━━━━━━━━━┓
┃ "the" 40387┃
┃ "of" 19913┃
┃ "and" 14742┃
┃ "a" 14289┃
┃ "to" 13819┃
┃ "in" 11088┃
┃ "he" 9430┃
┃ "was" 8597┃
┃"that" 7516┃
┃ "his" 6435┃
┗━━━━━━━━━━━━┛</pre>
=={{header|Kotlin}}==
Line 2,360 ⟶ 3,040:
There is no change in the results if the numerals 0-9 are also regarded as letters.
<syntaxhighlight lang="kotlin">
import java.io.File
Line 2,378 ⟶ 3,058:
for ((word, freq) in wordGroups)
System.out.printf("%2d %-4s %5d\n", rank++, word, freq)
}</syntaxhighlight>
{{out}}
Line 2,397 ⟶ 3,077:
=={{header|Liberty BASIC}}==
<syntaxhighlight lang="lb">
dim lines$(150000)
open "135-0.txt" for input as #txt
Line 2,463 ⟶ 3,143:
close #txt
end
</syntaxhighlight>
{{out}}
<pre>Count Word
Line 2,484 ⟶ 3,164:
=={{header|Lua}}==
{{works with|lua|5.3}}
<syntaxhighlight lang="lua">
-- This program takes two optional command line arguments. The first (arg[1])
-- specifies the input file
--
-- in freq, each key is a word and each value is its count
local freq = {}
for line in io.lines(arg[1]) do
-- %a stands for any letter
for word in string.gmatch(
if not freq[word] then
freq[word] = 1
Line 2,502 ⟶ 3,182:
end
-- in array, each entry is an array whose first value is the count and whose
-- second value is the word
local array = {}
for word, count in pairs(freq) do
table.insert(array, {
end
table.sort(array, function (a, b) return a[1] > b[1] end)
for i = 1, arg[2] or 10 do
io.write(string.format('%7d %s\n', array[i][
end
</syntaxhighlight>
{{Out}}
Line 2,527 ⟶ 3,208:
7924 that
6661 it
</pre>
Relevant documentation:
[https://www.lua.org/manual/5.3/manual.html#pdf-io.lines io.lines]
[https://www.lua.org/manual/5.3/manual.html#pdf-string.gmatch gmatch]
[https://www.lua.org/manual/5.3/manual.html#6.4.1 patterns like %a]
=={{header|Mathematica}} / {{header|Wolfram Language}}==
<syntaxhighlight lang="mathematica">TakeLargest[10]@WordCounts[Import["https://www.gutenberg.org/files/135/135-0.txt"], IgnoreCase->True]//Dataset</syntaxhighlight>
{{out}}
<pre>
the 41088
of 19936
and 14931
a 14536
to 13738
in 11208
he 9607
was 8621
that 7825
it 6535
</pre>
=={{header|MATLAB}} / {{header|Octave}}==
<syntaxhighlight lang="matlab">
function [result,count] = word_frequency()
URL='https://www.gutenberg.org/files/135/135-0.txt';
text=webread(URL);
DELIMITER={' ', ',', ';', ':', '.', '/', '*', '!', '?', '<', '>', '(', ')', '[', ']','{', '}', '&','$','§','"','”','“','-','—','‘','\t','\n','\r'};
words = sort(strsplit(lower(text),DELIMITER));
flag = [find(~strcmp(words(1:end-1),words(2:end))),length(words)];
dwords = words(flag); % get distinct words, and ...
count = diff([0,flag]); % ... the corresponding occurance frequency
[tmp,idx] = sort(-count); % sort according to occurance
result = dwords(idx);
count = count(idx);
for k = 1:10,
fprintf(1,'%d\t%s\n',count(k),result{k})
end
</syntaxhighlight>
{{out}}
<pre>
41039 the
19950 of
14942 and
14523 a
13941 to
11208 in
9605 he
8620 was
7824 that
6533 it
</pre>
=={{header|Nim}}==
<syntaxhighlight lang="nim">
proc take[T](s: openArray[T], n: int): seq[T] = s[0 ..< min(n, s.len)]
Line 2,540 ⟶ 3,274:
wordFrequencies.sort
for (word, count) in toSeq(wordFrequencies.pairs).take(10):
echo alignLeft($count, 8), word</syntaxhighlight>
{{out}}
<pre>
14278 a
9213 he
8347 was
6414 his</pre>
=={{header|Objeck}}==
<syntaxhighlight lang="objeck">
use Collection;
use RegEx;
Line 2,606 ⟶ 3,340:
};
}
}</syntaxhighlight>
Output:
Line 2,626 ⟶ 3,360:
=={{header|OCaml}}==
<syntaxhighlight lang="ocaml">
let n =
try int_of_string Sys.argv.(1)
Line 2,652 ⟶ 3,386:
List.iter (fun (word, count) ->
Printf.printf "%d %s\n" count word
) r</syntaxhighlight>
{{out}}
Line 2,667 ⟶ 3,401:
7924 that
6661 it
</pre>
=={{header|PascalABC.NET}}==
<syntaxhighlight lang="delphi">
##
ReadAllText('135-0.txt').ToLower.MatchValues('\w+').EachCount
.OrderByDescending(w -> w.Value).Take(10).PrintLines
</syntaxhighlight>
{{out}}
<pre>
(the,41042)
(of,19952)
(and,14938)
(a,14527)
(to,13942)
(in,11208)
(he,9646)
(was,8620)
(that,7922)
(it,6659)
</pre>
=={{header|Perl}}==
{{trans|Raku}}
<syntaxhighlight lang="perl">
use warnings;
use utf8;
my $top = 10;
open my $fh, '<', 'ref/word-count.txt';
(my $text = join '', <$fh>) =~ tr/A-Z/a-z/;
my @matcher = (
qr/[a-z]+/, # simple 7-bit ASCII
qr/\w+/, # word characters with underscore
Line 2,682 ⟶ 3,440:
);
for my $reg (@matcher) {
print "\nTop $top using regex: " . $reg . "\n";
my @matches = $text =~ /$reg/g;
my %words;
for my $w (@matches) { $words{$w}++ };
my $c = 0;
for my $w ( sort { $words{$b} <=> $words{$a} } keys %words ) {
printf "%-7s %6d\n", $w, $words{$w};
last if ++$c >= $top;
}
}</syntaxhighlight>
{{out}}
<pre>
Top 10 using regex: (?^:[a-z]+)
the 41089
of 19949
Line 2,729 ⟶ 3,488:
was 8621
that 7924
it 6661
</pre>
=={{header|Phix}}==
<!--<syntaxhighlight lang="phix">(notonline)-->
<span style="color: #008080;">without</span> <span style="color: #008080;">javascript_semantics</span>
<span style="color: #0000FF;">?</span><span style="color: #008000;">"loading..."</span>
<span style="color: #008080;">constant</span> <span style="color: #000000;">subs</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">'\t'</span><span style="color: #0000FF;">&</span><span style="color: #008000;">"\r\n_.,\"\'!;:?][()|=<>#/*{}+@%&$"</span><span style="color: #0000FF;">,</span>
<span style="color: #000000;">reps</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #008000;">' '</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">subs</span><span style="color: #0000FF;">)),</span>
<span style="color: #000000;">fn</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">open</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"135-0.txt"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"r"</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">text</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">lower</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">substitute_all</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">get_text</span><span style="color: #0000FF;">(</span><span style="color: #000000;">fn</span><span style="color: #0000FF;">),</span><span style="color: #000000;">subs</span><span style="color: #0000FF;">,</span><span style="color: #000000;">reps</span><span style="color: #0000FF;">))</span>
<span style="color: #7060A8;">close</span><span style="color: #0000FF;">(</span><span style="color: #000000;">fn</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">words</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">sort</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">split</span><span style="color: #0000FF;">(</span><span style="color: #000000;">text</span><span style="color: #0000FF;">,</span><span style="color: #000000;">no_empty</span><span style="color: #0000FF;">:=</span><span style="color: #004600;">true</span><span style="color: #0000FF;">)),</span><span style="color: #008000;">""</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">constant</span> <span style="color: #000000;">wf</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">new_dict</span><span style="color: #0000FF;">()</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">last</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">words</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">count</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">2</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">words</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">words</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]!=</span><span style="color: #000000;">last</span> <span style="color: #008080;">then</span>
<span style="color: #7060A8;">setd</span><span style="color: #0000FF;">({</span><span style="color: #000000;">count</span><span style="color: #0000FF;">,</span><span style="color: #000000;">last</span><span style="color: #0000FF;">},</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #000000;">wf</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">count</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">0</span>
<span style="color: #000000;">last</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">words</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #000000;">count</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #000000;">count</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">10</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">visitor</span><span style="color: #0000FF;">(</span><span style="color: #004080;">object</span> <span style="color: #000000;">key</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">object</span> <span style="color: #000080;font-style:italic;">/*data*/</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">object</span> <span style="color: #000080;font-style:italic;">/*user_data*/</span><span style="color: #0000FF;">)</span>
<span style="color: #0000FF;">?</span><span style="color: #000000;">key</span>
<span style="color: #000000;">count</span> <span style="color: #0000FF;">-=</span> <span style="color: #000000;">1</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">count</span><span style="color: #0000FF;">></span><span style="color: #000000;">0</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #7060A8;">traverse_dict</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">routine_id</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"visitor"</span><span style="color: #0000FF;">),</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #000000;">wf</span><span style="color: #0000FF;">,</span><span style="color: #004600;">true</span><span style="color: #0000FF;">)</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
Line 2,773 ⟶ 3,536:
=={{header|Phixmonti}}==
<syntaxhighlight lang="phixmonti">
"loading..." ?
Line 2,808 ⟶ 3,571:
-1 * get ?
endfor
drop</syntaxhighlight>
{{out}}
<pre>loading...
Line 2,827 ⟶ 3,590:
=={{header|PHP}}==
<syntaxhighlight lang="php">
<?php
Line 2,842 ⟶ 3,605:
}
$i++;
}</syntaxhighlight>
{{out}}
<pre>
Line 2,858 ⟶ 3,621:
10 had 6139
</pre>
=={{header|Picat}}==
To get the book proper, the header and footer are removed. Here are some tests with different sets of characters to split the words (<code>split_char/1</code>).
<syntaxhighlight lang="picat">main =>
NTop = 10,
File = "les_miserables.txt",
Chars = read_file_chars(File),
% Remove the Project Gutenberg header/footer
find(Chars,"*** START OF THE PROJECT GUTENBERG EBOOK LES MISÉRABLES ***",_,HeaderEnd),
find(Chars,"*** END OF THE PROJECT GUTENBERG EBOOK LES MISÉRABLES ***",FooterStart,_),
Book = [to_lowercase(C) : C in slice(Chars,HeaderEnd+1,FooterStart-1)],
% Split into words (different set of split characters)
member(SplitType,[all,space_punct,space]),
println(split_type=SplitType),
split_chars(SplitType,SplitChars),
Words = split(Book,SplitChars),
println(freq(Words).to_list.sort_down(2).take(NTop)),
nl,
fail.
freq(L) = Freq =>
Freq = new_map(),
foreach(E in L)
Freq.put(E,Freq.get(E,0)+1)
end.
% different set of split chars
split_chars(all,"\n\r \t,;!.?()[]”\"-“—-__‘’*").
split_chars(space_punct,"\n\r \t,;!.?").
split_chars(space,"\n\r \t").</syntaxhighlight>
{{out}}
<pre>split_type = all
[the = 40907,of = 19830,and = 14872,a = 14487,to = 13872,in = 11157,he = 9645,was = 8618,that = 7908,it = 6626]
split_type = space_punct
[the = 40193,of = 19779,and = 14668,a = 14227,to = 13538,in = 11033,he = 9455,was = 8604,that = 7576,” = 6578]
split_type = space
[the = 40193,of = 19747,and = 14402,a = 14222,to = 13512,in = 10964,he = 9211,was = 8345,that = 7235,his = 6414]</pre>
It is a slightly different result if the the header/footer are not removed:
<pre>split_type = all
[the = 41094,of = 19952,and = 14939,a = 14545,to = 13954,in = 11218,he = 9647,was = 8620,that = 7922,it = 6641]
split_type = space_punct
[the = 40378,of = 19901,and = 14734,a = 14284,to = 13620,in = 11094,he = 9457,was = 8606,that = 7590,” = 6578]
split_type = space
[the = 40378,of = 19869,and = 14468,a = 14278,to = 13590,in = 11025,he = 9213,was = 8347,that = 7249,his = 6414]</pre>
=={{header|PicoLisp}}==
<
(setq *Skip (chop *Delim))
Line 2,874 ⟶ 3,692:
(if (idx 'B W T) (inc (car @)) (set W 1)) ) ) )
(for L (head 10 (flip (by val sort (idx 'B))))
(println L (val L)) )</syntaxhighlight>
{{out}}
<pre>
Line 2,891 ⟶ 3,709:
=={{header|Prolog}}==
{{works with|SWI Prolog}}
<syntaxhighlight lang="prolog">
read_file_to_string(File, String, [encoding(utf8)]),
re_split("\\w+", String, Words),
Line 2,923 ⟶ 3,741:
main:-
print_top_words("135-0.txt", 10).</syntaxhighlight>
{{out}}
Line 2,942 ⟶ 3,760:
=={{header|PureBasic}}==
<syntaxhighlight lang="purebasic">
Structure wordcount
Line 2,996 ⟶ 3,814:
EndIf
End</syntaxhighlight>
{{out}}
<pre>
Line 3,017 ⟶ 3,835:
===Collections===
====Python2.7====
<syntaxhighlight lang="python">
import re
import string
Line 3,027 ⟶ 3,845:
if __name__ == "__main__":
main()</syntaxhighlight>
{{Out}}
Line 3,037 ⟶ 3,855:
====Python3.6====
<syntaxhighlight lang="python">
from re import findall
Line 3,056 ⟶ 3,874:
if __name__ == "__main__":
n = int(input('How many?: '))
most_common_words_in_file(les_mis_file, n)</syntaxhighlight>
{{Out}}
Line 3,074 ⟶ 3,892:
===Sorted and groupby===
{{Works with|Python|3.7}}
<syntaxhighlight lang="python">
Word count task from Rosetta Code
http://www.rosettacode.org/wiki/Word_count#Python
Line 3,121 ⟶ 3,939:
if __name__ == '__main__':
main()
</syntaxhighlight>
{{Out}}
<pre>('the', 40372)
Line 3,133 ⟶ 3,951:
('that', 7250)
('his', 6414)</pre>
===Collections, Sorted and Lambda===
<syntaxhighlight lang="python">
#!/usr/bin/python3
import collections
import re
count = 10
with open("135-0.txt") as f:
text = f.read()
word_freq = sorted(
collections.Counter(sorted(re.split(r"\W+", text.lower()))).items(),
key=lambda c: c[1],
reverse=True,
)
for i in range(len(word_freq)):
print("[{:2d}] {:>10} : {}".format(i + 1, word_freq[i][0], word_freq[i][1]))
if i == count - 1:
break
</syntaxhighlight>
{{Out}}
<pre>[ 1] the : 41039
[ 2] of : 19951
[ 3] and : 14942
[ 4] a : 14527
[ 5] to : 13941
[ 6] in : 11209
[ 7] he : 9646
[ 8] was : 8620
[ 9] that : 7922
[10] it : 6659</pre>
=={{header|R}}==
===Version 1===
I chose to remove apostrophes only if they're followed by an s (so "mom" and "mom's" will show up as the same word but "they" and "they're" won't). I also chose not to remove hyphens.
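A quick stand-alone illustration of that apostrophe rule (Python is used here only because it is compact; the R code below gets the same effect by including "'s" in its punctuation list):
<syntaxhighlight lang="python">
import re
for w in ["mom", "mom's", "they", "they're"]:
    print(w, "->", re.sub(r"'s\b", "", w))
# mom -> mom, mom's -> mom, they -> they, they're -> they're
</syntaxhighlight>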
<syntaxhighlight lang="r">
wordcount<-function(file,n){
punctuation=c("`","~","!","@","#","$","%","^","&","*","(",")","_","+","=","{","[","}","]","|","\\",":",";","\"","<",",",">",".","?","/","'s")
Line 3,152 ⟶ 4,005:
return(df[1:n,])
}
</syntaxhighlight>
{{Out}}
<pre>
Line 3,168 ⟶ 4,021:
9 it 2308
10 i 1845
</pre>
===Version 2===
This version is purely functional using the native pipe operator in R 4.1+ and runs in less than a second.
<syntaxhighlight lang="r">
word_frequency_pipeline <- function(file=NULL, n=10) {
file |>
vroom::vroom_lines() |>
stringi::stri_split_boundaries(type="word", skip_word_none=T, skip_word_number=T) |>
unlist() |>
tolower() |>
table() |>
sort(decreasing = T) |>
(\(.) .[1:n])() |>
data.frame()
}
</syntaxhighlight>
{{Out}}
<pre>
> word_frequency_pipeline("~/../Downloads/135-0.txt")
Var1 Freq
1 the 41042
2 of 19952
3 and 14938
4 a 14526
5 to 13942
6 in 11208
7 he 9605
8 was 8620
9 that 7824
10 it 6533
</pre>
=={{header|Racket}}==
<syntaxhighlight lang="racket">
(define (all-words f (case-fold string-downcase))
Line 3,181 ⟶ 4,067:
(module+ main
(take (counts (all-words "data/les-mis.txt")) 10))</syntaxhighlight>
{{out}}
Line 3,197 ⟶ 4,083:
=={{header|Raku}}==
(formerly Perl 6)
{{works with|Rakudo|
Note: much of the following exposition is no longer critical to the task as the requirements have been updated, but is left here for historical and informational reasons.
This is slightly trickier than it appears initially. The task specifically states: "A word is a sequence of one or more contiguous letters", so contractions and hyphenated words are broken up. Initially we might reach for a regex matcher like /\w+/ , but \w includes underscore, which is not a letter but a punctuation connector; and this text is '''full''' of underscores since that is how Project Gutenberg texts denote italicized text. The underscores are not actually parts of the words though, they are markup.
We might try /A-Za-z/ as a matcher but this text is bursting with French words containing various accented characters, which /A-Za-z/ would split apart.
Actually, in this case /A-Za-z/ returns '''very nearly''' the correct answer. Unfortunately, the name "Alèthe" appears once (only once!) in the text, gets incorrectly split into Al & the, and incorrectly reports 41089 occurrences of "the".
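A quick demonstration of both failure modes, using Python only because it is compact (the sample string is invented):
<syntaxhighlight lang="python">
import re
sample = "_Alèthe_ said the word"
print(re.findall(r"\w+", sample.lower()))       # ['_alèthe_', 'said', 'the', 'word']   (keeps markup underscores)
print(re.findall(r"[a-z]+", sample.lower()))    # ['al', 'the', 'said', 'the', 'word']  (splits 'alèthe', inflating 'the')
print(re.findall(r"[^\W\d_]+", sample.lower())) # ['alèthe', 'said', 'the', 'word']     (letters only, accents intact)
</syntaxhighlight>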
Line 3,210 ⟶ 4,096:
Here is a sample that shows the result when using various different matchers.
<syntaxhighlight lang="raku">
my $file = $filename.IO.slurp.lc.subst(/ (<[\w]-[_]>'-')\n(<[\w]-[_]>) /, {$0 ~ $1}, :g );
my @matcher =
rx/ <[a..z]>+ /, # simple 7-bit ASCII
rx/ \w+ /, # word characters with underscore
rx/ <[\w]-[_]>+ /, # word characters without underscore
rx/ [<[\w]-[_]>+
for @matcher -> $reg {
say "\nTop $top using regex: ", $reg.raku;
my @words
my $length = max @words».key».chars;
printf "%-{$length}s %d\n", .key, .value for @words;
}
}</syntaxhighlight>
{{out}}
Line 3,402 ⟶ 4,290:
Since REXX doesn't support UTF-8 encodings, code was added to this REXX version to
support the accented letters in the mandated input file.
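As an aside, the general technique of folding accented letters down to plain letters can be sketched in a few lines of Python; this illustrates the idea only and is not a description of the REXX code below:
<syntaxhighlight lang="python">
import unicodedata

def fold(word):
    # Decompose characters (é -> e + combining accent) and drop the accent marks.
    return "".join(c for c in unicodedata.normalize("NFD", word)
                   if not unicodedata.combining(c))

print(fold("Alèthe"))   # Alethe
</syntaxhighlight>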
<syntaxhighlight lang="rexx">
parse arg fID top . /*obtain optional arguments from the CL*/
if fID=='' | fID=="," then fID= 'les_mes.txt' /*None specified? Then use the default.*/
Line 3,457 ⟶ 4,345:
end /*#*/
say commas(totW) ' words found ('commas(c) "unique) in " commas(#),
' records read from file: ' fID; say; return</syntaxhighlight>
{{out|output|text= when using the default inputs:}}
<pre>
Line 3,480 ⟶ 4,368:
Inspired by version 1 and adapted for ooRexx.
It ignores all characters other than a-z and A-Z (which are translated to a-z).
<syntaxhighlight lang="text">/*REXX program reads and displays a count of words a file. Word case is ignored.*/
Call time 'R'
abc='abcdefghijklmnopqrstuvwxyz'
Line 3,530 ⟶ 4,418:
tops=tops+words(tl) /*correctly handle the tied rankings. */
end
Say time('E') 'seconds elapsed'</syntaxhighlight>
{{out}}
<pre>We found 22820 different words
Line 3,548 ⟶ 4,436:
=={{header|Ring}}==
<syntaxhighlight lang="ring">
# project : Word count
Line 3,607 ⟶ 4,495:
b = temp
return [a, b]
</syntaxhighlight>
Output:
<pre>
Line 3,623 ⟶ 4,511:
=={{header|Ruby}}==
<syntaxhighlight lang="ruby">
class String
def wc
Line 3,633 ⟶ 4,521:
open('135-0.txt') { |n| n.read.wc[-10,10].each{|n| puts n[0].to_s+"->"+n[1].to_s} }
</syntaxhighlight>
{{out}}
<pre>
Line 3,649 ⟶ 4,537:
===Tally and max_by===
{{Works with|Ruby|2.7}}
<syntaxhighlight lang="ruby">
count = open("135-0.txt").read.downcase.scan(RE).tally.max_by(10, &:last)
count.each{|ar| puts ar.join("->") }
</syntaxhighlight>
{{out}}
<pre>the->41092
Line 3,664 ⟶ 4,552:
that->7924
it->6661
</pre>
===Chain of Enumerables===
<syntaxhighlight lang="ruby">wf = File.read("135-0.txt", :encoding => "UTF-8")
.downcase
.scan(/\w+/)
.each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }
.sort_by { |k, v| v }
.reverse
.take(10)
.each_with_index { |w, i|
printf "[%2d] %10s : %d\n",
i += 1,
w[0],
w[1]
}
</syntaxhighlight>
{{out}}
<pre>[ 1] the : 41040
[ 2] of : 19951
[ 3] and : 14942
[ 4] a : 14539
[ 5] to : 13941
[ 6] in : 11209
[ 7] he : 9646
[ 8] was : 8620
[ 9] that : 7922
[10] it : 6659
</pre>
=={{header|Rust}}==
<syntaxhighlight lang="rust">
use std::collections::HashMap;
use std::fs::File;
Line 3,698 ⟶ 4,613:
fn main() {
word_count(File::open("135-0.txt").expect("File open error"), 10)
}</syntaxhighlight>
{{out}}
Line 3,718 ⟶ 4,633:
{{Out}}
Best seen running in your browser [https://scastie.scala-lang.org/EP2Fm6HXQrC1DwtSNvnUzQ Scastie (remote JVM)].
<syntaxhighlight lang="scala">
object WordCount extends App {
Line 3,741 ⟶ 4,656:
println(s"\nSuccessfully completed without errors. [total ${scala.compat.Platform.currentTime - executionStart} ms]")
}</syntaxhighlight>
{{out}}
<pre>Rank Word Frequency
Line 3,765 ⟶ 4,680:
to get words from a file. The words are [http://seed7.sourceforge.net/libraries/string.htm#lower(in_string) converted to lower case], to assure that "The" and "the" are considered the same.
<syntaxhighlight lang="seed7">
include "gethttp.s7i";
include "strifile.s7i";
Line 3,806 ⟶ 4,721:
end for;
end for;
end func;</syntaxhighlight>
{{out}}
Line 3,824 ⟶ 4,739:
=={{header|Sidef}}==
<syntaxhighlight lang="ruby">
var file = File(ARGV[0] \\ '135-0.txt')
Line 3,837 ⟶ 4,752:
top.each { |pair|
say "#{pair.key}\t-> #{pair.value}"
}</syntaxhighlight>
{{out}}
<pre>
Line 3,853 ⟶ 4,768:
=={{header|Simula}}==
<syntaxhighlight lang="simula">
$ cim -m64 word-count.sim
;
Line 4,132 ⟶ 5,047:
END
</syntaxhighlight>
{{out}}
<pre>
Line 4,148 ⟶ 5,063:
6 garbage collection(s) in 0.2 seconds.
</pre>
=={{header|Smalltalk}}==
The ASCII text file is from https://www.gutenberg.org/files/135/old/lesms10.txt.
===Cuis Smalltalk, ASCII===
{{works with|Cuis|6.0}}
<syntaxhighlight lang="smalltalk">
(StandardFileStream new open: 'lesms10.txt' forWrite: false)
contents asLowercase substrings asBag sortedCounts first: 10.
</syntaxhighlight>
{{Out}}<pre>an OrderedCollection(40543 -> 'the' 19796 -> 'of' 14448 -> 'and' 14380 -> 'a' 13582 -> 'to' 11006 -> 'in' 9221 -> 'he' 8351 -> 'was' 7258 -> 'that' 6420 -> 'his') </pre>
===Squeak Smalltalk, ASCII===
{{works with|Squeak|6.0}}
<syntaxhighlight lang="smalltalk">
(StandardFileStream readOnlyFileNamed: 'lesms10.txt')
contents asLowercase substrings asBag sortedCounts first: 10.
</syntaxhighlight>
{{Out}}<pre>{40543->'the' . 19796->'of' . 14448->'and' . 14380->'a' . 13582->'to' . 11006->'in' . 9221->'he' . 8351->'was' . 7258->'that' . 6420->'his'} </pre>
=={{header|Swift}}==
<syntaxhighlight lang="swift">
func printTopWords(path: String, count: Int) throws {
Line 4,158 ⟶ 5,092:
// split text into words, convert to lowercase and store word counts in dict
let regex = try NSRegularExpression(pattern: "\\w+")
regex.enumerateMatches(in: text, options: [], range: NSRange(text.startIndex..., in: text)) { (match, _, _) in
guard let match = match else { return }
let word = String(text[Range(match.range, in: text)!]).lowercased()
}
// sort words by number of occurrences
Line 4,170 ⟶ 5,102:
// print the top count words
print("Rank\tWord\tCount")
for (i, (word, n)) in wordCounts
print("\(i + 1)\t\(word)\t\(n)")
}
Line 4,179 ⟶ 5,111:
} catch {
print(error.localizedDescription)
}</syntaxhighlight>
{{out}}
Line 4,194 ⟶ 5,126:
9 that 7922
10 it 6659
</pre>
=={{header|Tcl}}==
<syntaxhighlight lang="tcl">lassign $argv head
while { [gets stdin line] >= 0 } {
foreach word [regexp -all -inline {[A-Za-z]+} $line] {
dict incr wordcount [string tolower $word]
}
}
set sorted [lsort -stride 2 -index 1 -int -decr $wordcount]
foreach {word count} [lrange $sorted 0 [expr {$head * 2 - 1}]] {
puts "$count\t$word"
}</syntaxhighlight>
./wordcount-di.tcl 10 < 135-0.txt
{{out}}
<pre>
41093 the
19954 of
14943 and
14558 a
13953 to
11219 in
9649 he
8622 was
7924 that
6661 it
</pre>
=={{header|TMG}}==
McIlroy's Unix TMG:
<syntaxhighlight>
/* Only lowercase letters can constitute a word in text. */
/* (c) 2020, Andrii Makukha, 2-clause BSD licence. */
Line 4,259 ⟶ 5,219:
/* Character classes */
letter: <<abcdefghijklmnopqrstuvwxyz>>;
other: !<<abcdefghijklmnopqrstuvwxyz>>;</syntaxhighlight>
Unix TMG didn't have <tt>tolower</tt> builtin. Therefore, you would use it together with <tt>tr</tt>:
<
Additionally, because 1972 TMG only understood ASCII characters, you might want to strip down the diacritics (e.g., é → e):
<
=={{header|Transd}}==
<syntaxhighlight lang="Scheme">#lang transd
MainModule: {
_start: (λ locals: cnt 0
(with fs FileStream() words String()
(open-r fs "/mnt/text/Literature/Miserables.txt")
(textin fs words)
(with v ( -|
(split (tolower words))
(group-by)
(regroup-by (λ v Vector<String>() -> Int() (size v))))
(for i in v :rev do (lout (get (get (snd i) 0) 0) ":\t " (fst i))
(+= cnt 1) (if (> cnt 10) break))
)))
}</syntaxhighlight>
{{out}}
<pre>
the: 40379
of: 19869
and: 14468
a: 14278
to: 13590
in: 11025
he: 9213
was: 8347
that: 7249
his: 6414
had: 6051
</pre>
=={{header|UNIX Shell}}==
Line 4,271 ⟶ 5,264:
{{works with|zsh}}
This is derived from Doug McIlroy's original 6-line note in the ACM article cited in the task.
<
Line 4,289 ⟶ 5,282:
6661 it
</pre>
=== Original + URL import ===
This is Doug McIlroy's original solution but follows other solutions in importing the task's text file from the web and directly specifying the 10 most commonly used words.
<syntaxhighlight lang="zsh">curl "https://www.gutenberg.org/files/135/135-0.txt" | tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed 10q</syntaxhighlight>
{{Out}}
<pre>41096 the
19955 of
14939 and
14558 a
13954 to
11218 in
9649 he
8622 was
7924 that
6661 it</pre>
=={{header|VBA}}==
In order to use it, you have to adapt the PATHFILE Const.
<syntaxhighlight lang="vb">
Option Explicit
Line 4,409 ⟶ 5,425:
If d.Exists(Word) Then _
DisplayFrequencyOf = d(Word)
End Function</syntaxhighlight>
{{out}}
<pre>Words different in this book : 25884
Line 4,441 ⟶ 5,457:
I've taken the view that 'letter' means either a letter or digit for Unicode codepoints up to 255. I haven't included underscore, hyphen nor apostrophe as these usually separate compound words.
Not very quick (runs in about
If the Go example is re-run today (
<syntaxhighlight lang="wren">
import "./str" for Str
import "./sort" for Sort
import "./fmt" for Fmt
import "./pattern" for Pattern
var fileName = "135-0.txt"
Line 4,471 ⟶ 5,487:
var freq = keyVals[rank-1].value
Fmt.print("$2d $-4s $5d", rank, word, freq)
}</syntaxhighlight>
{{out}}
Line 4,491 ⟶ 5,507:
=={{header|XQuery}}==
<syntaxhighlight lang="xquery">
$uri := 'https://www.gutenberg.org/files/135/135-0.txt'
return
Line 4,510 ⟶ 5,526:
return <word key="{$key}" count="{$count}"/>
)[position()=(1 to $maxentries)]
}</words></syntaxhighlight>
{{out}}
<syntaxhighlight lang="xml"><words>
<word key="the" count="41092"/>
<word key="of" count="19954"/>
Line 4,523 ⟶ 5,539:
<word key="that" count="7924"/>
<word key="it" count="6661"/>
</words></syntaxhighlight>
=={{header|zkl}}==
<syntaxhighlight lang="zkl">
// words may have leading or trailing "_", ie "the" and "_the"
Line 4,532 ⟶ 5,548:
RegExp("[a-z]+").pump.fp1(Dictionary().incV)) // line-->(word:count,..)
.toList().copy().sort(fcn(a,b){ b[1]<a[1] })[0,count.toInt()] // hash-->list
.pump(String,Void.Xplode,"%s,%s\n".fmt).println();</syntaxhighlight>
{{out}}
<pre>
Line 4,547 ⟶ 5,563:
it,6661
</pre>
{{omit from|6502 Assembly|The text file is much larger than the CPU's address space.}}
{{omit from|Z80 Assembly}}
{{omit from|8080 Assembly}}