
Word frequency: Difference between revisions

m (syntax highlighting fixup automation)
Line 40:
 
=={{header|11l}}==
<langsyntaxhighlight lang="11l">DefaultDict[String, Int] cnt
L(word) re:‘\w+’.find_strings(File(‘135-0.txt’).read().lowercase())
cnt[word]++
print(sorted(cnt.items(), key' wordc -> wordc[1], reverse' 1B)[0.<10])</syntaxhighlight>
 
{{out}}
Line 56:
{{works with|Ada|Ada|2012}}
 
<syntaxhighlight lang="ada">with Ada.Command_Line;
with Ada.Text_IO;
with Ada.Integer_Text_IO;
Line 143:
end loop;
end Word_Frequency;
</syntaxhighlight>
{{out}}
<pre>
Line 162:
{{works with|ALGOL 68G|Any - tested with release 2.8.3.win32}}
Uses the associative array implementations in [[ALGOL_68/prelude]].
<langsyntaxhighlight lang="algol68"># find the n most common words in a file #
# use the associative array in the Associate array/iteration task #
# but with integer values #
Line 286:
print( ( whole( top counts[ i ], -6 ), ": ", top words[ i ], newline ) )
OD
FI</syntaxhighlight>
{{out}}
<pre>
Line 308:
{{works with|GNU APL}}
 
<syntaxhighlight lang="apl">
<lang APL>
⍝⍝ NOTE: input text is assumed to be encoded in ISO-8859-1
⍝⍝ (The suggested example '135-0.txt' of Les Miserables on
Line 339:
the of and a to
41042 19952 14938 14526 13942
</syntaxhighlight>
 
=={{header|AppleScript}}==
 
<langsyntaxhighlight lang="applescript">(*
For simplicity here, words are considered to be uninterrupted sequences of letters and/or digits.
The set text is too messy to warrant faffing around with anything more sophisticated.
Line 424:
set filePath to POSIX path of ((path to desktop as text) & "www.rosettacode.org:Word frequency:135-0.txt")
set n to 10
return wordFrequency(filePath, n)</syntaxhighlight>
 
{{output}}
<langsyntaxhighlight lang="applescript">"The 10 most frequently occurring words in the file are:
The: 41092
Of: 19954
Line 437:
Was: 8622
That: 7924
It: 6661"</langsyntaxhighlight>
 
=={{header|Arturo}}==
 
<langsyntaxhighlight lang="rebol">findFrequency: function [file, count][
freqs: #[]
r: {/[[:alpha:]]+/}
Line 458:
loop findFrequency "https://www.gutenberg.org/files/135/135-0.txt" 10 'pair [
print pair
]</syntaxhighlight>
 
{{out}}
Line 474:
 
=={{header|AutoHotkey}}==
<syntaxhighlight lang="autohotkey">URLDownloadToFile, http://www.gutenberg.org/files/135/135-0.txt, % A_temp "\tempfile.txt"
FileRead, H, % A_temp "\tempfile.txt"
FileDelete, % A_temp "\tempfile.txt"
Line 490:
}
MsgBox % "Freq`tWord`n" result
return</syntaxhighlight>
Outputs:<pre>Freq Word
41036 The
Line 504:
 
=={{header|AWK}}==
<syntaxhighlight lang="awk">
<lang AWK>
# syntax: GAWK -f WORD_FREQUENCY.AWK [-v show=x] LES_MISERABLES.TXT
#
Line 533:
exit(0)
}
</syntaxhighlight>
{{out}}
<pre>
Line 552:
==={{header|QB64}}===
This is rather long code. I fulfilled the requirement with QB64. It "cleans" each word, treating as a word anything that begins and ends with a letter, and it works with arrays. The speed at which QB64 does this job on a file as big as Les Miserables.txt is amazing.
<syntaxhighlight lang="qbasic">
<lang QBASIC>
OPTION _EXPLICIT
 
Line 1,120:
 
END SUB
</syntaxhighlight>
 
{{output}}
Line 1,164:
==={{header|BaCon}}===
Removing all punctuation, digits, tabs and carriage returns. So "This", "this" and "this." are the same. Full support for UTF8 characters in words. The code itself could be smaller, but for the sake of clarity everything has been written out explicitly.
<langsyntaxhighlight lang="bacon">' We do not count superfluous spaces as words
OPTION COLLAPSE TRUE
 
Line 1,187:
FOR i = 0 TO 9
PRINT term$[i], " : ", frequency(term$[i])
NEXT</syntaxhighlight>
{{output}}
<pre>
Line 1,208:
You could cut the length of this down drastically if you didn't need to be able to recall the word at the nth position and wished only to display the top 10 words.
 
<langsyntaxhighlight lang="dos">
@echo off
 
Line 1,254:
goto:eof
</syntaxhighlight>
 
 
Line 1,287:
 
 
<langsyntaxhighlight lang="bracmat"> ( 10-most-frequent-words
= MergeSort { Local variable declarations. }
types
Line 1,330:
& !most-frequent-words { Return the last 10 terms. }
)
& out$(10-most-frequent-words$"135-0.txt") { Call 10-most-frequent-words with name of input file and print result to screen. }</syntaxhighlight>
'''Output'''
<pre> (6661.it)
Line 1,346:
{{libheader|GLib}}
Words are defined by the regular expression "\w+".
<langsyntaxhighlight lang="c">#include <stdbool.h>
#include <stdio.h>
#include <glib.h>
Line 1,437:
return EXIT_FAILURE;
return EXIT_SUCCESS;
}</syntaxhighlight>
 
{{out}}
Line 1,457:
=={{header|C sharp|C#}}==
{{trans|D}}
<langsyntaxhighlight lang="csharp">using System;
using System.Collections.Generic;
using System.IO;
Line 1,489:
}
}
}</syntaxhighlight>
{{out}}
<pre>Rank Word Frequency
Line 1,505:
 
=={{header|C++}}==
<langsyntaxhighlight lang="cpp">#include <algorithm>
#include <cstdlib>
#include <fstream>
Line 1,550:
return 0;
}
</syntaxhighlight>
 
{{out}}
Line 1,568:
===Alternative===
{{trans|C#}}
<langsyntaxhighlight lang="cpp">#include <algorithm>
#include <iostream>
#include <fstream>
Line 1,624:
 
return 0;
}</syntaxhighlight>
{{out}}
<pre>Rank Word Frequency
Line 1,641:
===C++20===
{{trans|C#}}
<langsyntaxhighlight lang="cpp">#include <algorithm>
#include <iostream>
#include <format>
Line 1,683:
std::cout << std::format("{:2} {:>4} {:5}\n", rank++, word, count);
}
}</syntaxhighlight>
{{out}}
<pre>Rank Word Frequency
Line 1,699:
 
=={{header|Clojure}}==
<langsyntaxhighlight lang="clojure">(defn count-words [file n]
(->> file
slurp
Line 1,706:
frequencies
(sort-by val >)
(take n)))</syntaxhighlight>
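
A hypothetical invocation, assuming the task's text file has been saved as <code>135-0.txt</code> in the working directory:
<syntaxhighlight lang="clojure">(count-words "135-0.txt" 10)</syntaxhighlight>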
 
{{Out}}
Line 1,716:
 
=={{header|COBOL}}==
<syntaxhighlight lang="cobol">
<lang COBOL>
IDENTIFICATION DIVISION.
PROGRAM-ID. WordFrequency.
Line 1,930:
CLOSE Word-File Output-File.
END-PROGRAM.
</syntaxhighlight>
 
{{Out}}
Line 1,953:
 
=={{header|Common Lisp}}==
<langsyntaxhighlight lang="lisp">
(defun count-word (n pathname)
(with-open-file (s pathname :direction :input)
Line 1,974:
(dolist (word words) (incf (gethash word hash 0)))
(maphash #'(lambda (e n) (push `(,e . ,n) ac)) hash) ac)
</syntaxhighlight>
 
{{Out}}
Line 1,984:
 
=={{header|Crystal}}==
<langsyntaxhighlight lang="ruby">require "http/client"
require "regex"
 
Line 2,002:
.sort { |a, b| b[1] <=> a[1] }[0..9] # sort and get the first 10 elements
.each_with_index(1) { |(word, n), i| puts "#{i} \t #{word} \t #{n}" } # print the result
</syntaxhighlight>
 
{{out}}
Line 2,019:
 
=={{header|D}}==
<langsyntaxhighlight Dlang="d">import std.algorithm : sort;
import std.array : appender, split;
import std.range : take;
Line 2,054:
writefln("%4s %-10s %9s", rank++, word.k, word.v);
}
}</syntaxhighlight>
 
{{out}}
Line 2,075:
{{libheader| System.RegularExpressions}}
{{Trans|C#}}
<syntaxhighlight lang="delphi">
<lang Delphi>
program Word_frequency;
 
Line 2,148:
readln;
end.
</syntaxhighlight>
{{out}}
<pre>
Line 2,165:
</pre>
=={{header|F Sharp}}==
<langsyntaxhighlight lang="fsharp">
open System.IO
open System.Text.RegularExpressions
let g=Regex("[A-Za-zÀ-ÿ]+").Matches(File.ReadAllText "135-0.txt")
[for n in g do yield n.Value.ToLower()]|>List.countBy(id)|>List.sortBy(fun n->(-(snd n)))|>List.take 10|>List.iter(fun n->printfn "%A" n)
</syntaxhighlight>
{{out}}
<pre>
Line 2,187:
=={{header|Factor}}==
This program expects its input on stdin, redirected from a file on the command line (e.g. invoking the program in Windows: <tt>>factor word-count.factor < input.txt</tt>). The definition of a word here is simply any string surrounded by some combination of spaces, punctuation, or newlines.
<langsyntaxhighlight lang="factor">
USING: ascii io math.statistics prettyprint sequences
splitting ;
Line 2,194:
lines " " join " .,?!:;()\"-" split harvest [ >lower ] map
sorted-histogram <reversed> 10 head .
</syntaxhighlight>
{{out}}
<pre>
Line 2,212:
 
=={{header|FreeBASIC}}==
<langsyntaxhighlight lang="freebasic">
#Include "file.bi"
type tally
Line 2,342:
print "time for operation ";timer-tm;" seconds"
sleep
</syntaxhighlight>
{{out}}
<pre>
Line 2,379:
There are two sample programs below. First, a simple but powerful method that works in old versions of Frink:
<langsyntaxhighlight lang="frink">d = new dict
for w = select[wordList[read[normalizeUnicode["https://www.gutenberg.org/files/135/135-0.txt", "UTF-8"]]], %r/[[:alnum:]]/ ]
d.increment[lc[w], 1]
 
println[join["\n", first[reverse[sort[array[d], {|a,b| a@1 <=> b@1}]], 10]]]</syntaxhighlight>
 
{{out}}
Line 2,401:
Next, a "showing off" one-liner that works in recent versions of Frink that uses the <CODE>countToArray</CODE> function which easily creates sorted frequency lists and the <CODE>formatTable</CODE> function that formats into a nice table with columns lined up, and still performs full Unicode-aware normalization, capitalization, and word-breaking:
 
<langsyntaxhighlight lang="frink">formatTable[first[countToArray[select[wordList[lc[normalizeUnicode[read["https://www.gutenberg.org/files/135/135-0.txt", "UTF-8"]]]], %r/[[:alnum:]]/ ]], 10], "right"]</langsyntaxhighlight>
 
{{out}}
Line 2,419:
=={{header|FutureBasic}}==
Task said: "Feel free to explicitly state the thoughts behind the program decisions." Thus the heavy comments.
<langsyntaxhighlight lang="futurebasic">
include "NSLog.incl"
 
Line 2,516:
 
HandleEvents
</syntaxhighlight>
{{output}}
<pre>
Line 2,552:
=={{header|Go}}==
{{trans|Kotlin}}
<langsyntaxhighlight lang="go">package main
 
import (
Line 2,594:
fmt.Printf("%2d %-4s %5d\n", rank, word, freq)
}
}</syntaxhighlight>
 
{{out}}
Line 2,614:
=={{header|Groovy}}==
Solution:
<langsyntaxhighlight lang="groovy">def topWordCounts = { String content, int n ->
def mapCounts = [:]
content.toLowerCase().split(/\W+/).each {
Line 2,622:
println "Rank Word Frequency\n==== ==== ========="
(0..<n).each { printf ("%4d %-4s %9d\n", it+1, top[it].key, top[it].value) }
}</syntaxhighlight>
 
Test:
<langsyntaxhighlight lang="groovy">def rawText = "http://www.gutenberg.org/files/135/135-0.txt".toURL().text
topWordCounts(rawText, 10)</langsyntaxhighlight>
 
Output:
Line 2,645:
===Lazy IO with pure Map, arrows===
{{trans|Clojure}}
<syntaxhighlight lang="haskell">module Main where
 
import Control.Category -- (>>>)
Line 2,685:
>>> take n
>>> print)
when filep (hClose hand)</syntaxhighlight>
{{Out}}
<pre>
Line 2,694:
===Lazy IO, map of IORefs===
Using IORefs as values in the map seems to give a ~2x speedup on large files. The below code is based on https://github.com/composewell/streamly-examples/blob/master/examples/WordFrequency.hs , but still using lazy IO to avoid the extra library dependency (in production you should [https://stackoverflow.com/questions/5892653/whats-so-bad-about-lazy-i-o use a streaming library] like streamly/conduit/io-streams):
<langsyntaxhighlight lang="haskell">
module Main where
 
Line 2,733:
in mapM readRef $ M.toList freqtable
print $ take maxw $ sortOn (Down . snd) counts
</syntaxhighlight>
{{Out}}
<pre>
Line 2,742:
===Lazy IO, short code, but not streaming===
Or, perhaps a little more simply, though not streaming (will read everything into memory, don't use on big files):
<langsyntaxhighlight lang="haskell">import qualified Data.Text.IO as T
import qualified Data.Text as T
 
Line 2,754:
 
main :: IO ()
main = T.readFile "miserables.txt" >>= (mapM_ print . take 10 . frequentWords)</syntaxhighlight>
{{Out}}
<pre>(40370,"the")
Line 2,813:
=={{header|Java}}==
{{trans|Kotlin}}
<syntaxhighlight lang="java">import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
Line 2,855:
}
}
}</syntaxhighlight>
{{out}}
<pre>Rank Word Frequency
Line 2,877:
may not begin with a hyphen. Thus "the-the" would count as one word, and "-the" would be excluded.
 
<syntaxhighlight lang="jq">
<lang jq>
< 135-0.txt jq -nR --argjson n 10 '
def bow(stream):
Line 2,889:
| from_entries
'
</syntaxhighlight>
====Output====
<syntaxhighlight lang="jq">
<lang jq>
{
"the": 41087,
Line 2,904:
"it": 6661
}
</syntaxhighlight>
 
=={{header|Julia}}==
{{works with|Julia|1.0}}
<langsyntaxhighlight lang="julia">
using FreqTables
 
Line 2,914:
words = split(replace(txt, r"\P{L}"i => " "))
table = sort(freqtable(words); rev=true)
println(table[1:10])</syntaxhighlight>
 
{{out}}
Line 2,933:
The below program defines the function 'stats' which accepts a filename containing the text.
 
<langsyntaxhighlight lang="kap">∇ stats (file) {
content ← "[\\h,.\"'\n-]+" regex:split unicode:toLower io:readFile file
sorted ← (⍋⊇⊢) content
Line 2,939:
words ← selection / sorted
{⍵[10↑⍒⍵[;1];]} words ,[0.5] ≢¨ sorted ⊂⍨ +\selection
}</syntaxhighlight>
{{out}}
<pre>┏━━━━━━━━━━━━┓
Line 2,960:
 
There is no change in the results if the numerals 0-9 are also regarded as letters.
<langsyntaxhighlight lang="scala">// version 1.1.3
 
import java.io.File
Line 2,978:
for ((word, freq) in wordGroups)
System.out.printf("%2d %-4s %5d\n", rank++, word, freq)
}</syntaxhighlight>
 
{{out}}
Line 2,997:
 
=={{header|Liberty BASIC}}==
<langsyntaxhighlight lang="lb">dim words$(100000,2)'words$(a,1)=the word, words$(a,2)=the count
dim lines$(150000)
open "135-0.txt" for input as #txt
Line 3,063:
close #txt
end
</syntaxhighlight>
{{out}}
<pre>Count Word
Line 3,084:
=={{header|Lua}}==
{{works with|lua|5.3}}
<langsyntaxhighlight lang="lua">
-- This program takes two optional command line arguments. The first (arg[1])
-- specifies the input file, or defaults to standard input. The second
Line 3,113:
io.write(string.format('%7d %s\n', array[i][1] , array[i][2]))
end
</syntaxhighlight>
 
{{Out}}
Line 3,136:
 
=={{header|Mathematica}} / {{header|Wolfram Language}}==
<syntaxhighlight lang="mathematica">TakeLargest[10]@WordCounts[Import["https://www.gutenberg.org/files/135/135-0.txt"], IgnoreCase->True]//Dataset</syntaxhighlight>
{{out}}
<pre>
Line 3,152:
 
=={{header|MATLAB}} / {{header|Octave}}==
<syntaxhighlight lang="matlab">
<lang Matlab>
function [result,count] = word_frequency()
URL='https://www.gutenberg.org/files/135/135-0.txt';
Line 3,167:
fprintf(1,'%d\t%s\n',count(k),result{k})
end
</syntaxhighlight>
 
{{out}}
Line 3,184:
 
=={{header|Nim}}==
<syntaxhighlight lang="nim">import tables, strutils, sequtils, httpclient
 
proc take[T](s: openArray[T], n: int): seq[T] = s[0 ..< min(n, s.len)]
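# (take returns the first n elements of s, or all of s when it has fewer than n)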
Line 3,194:
wordFrequencies.sort
for (word, count) in toSeq(wordFrequencies.pairs).take(10):
echo alignLeft($count, 8), word</syntaxhighlight>
{{out}}
<pre>40377 the
Line 3,208:
 
=={{header|Objeck}}==
<langsyntaxhighlight lang="objeck">use System.IO.File;
use Collection;
use RegEx;
Line 3,260:
};
}
}</syntaxhighlight>
 
Output:
Line 3,280:
=={{header|OCaml}}==
 
<langsyntaxhighlight lang="ocaml">let () =
let n =
try int_of_string Sys.argv.(1)
Line 3,306:
List.iter (fun (word, count) ->
Printf.printf "%d %s\n" count word
) r</syntaxhighlight>
 
{{out}}
Line 3,325:
=={{header|Perl}}==
{{trans|Raku}}
<langsyntaxhighlight lang="perl">$top = 10;
 
open $fh, "<", '135-0.txt';
Line 3,347:
last if ++$c >= $top;
}
}</syntaxhighlight>
 
{{out}}
Line 3,387:
 
=={{header|Phix}}==
<!--<syntaxhighlight lang="phix">(notonline)-->
<span style="color: #008080;">without</span> <span style="color: #008080;">javascript_semantics</span>
<span style="color: #0000FF;">?</span><span style="color: #008000;">"loading..."</span>
Line 3,414:
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #7060A8;">traverse_dict</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">routine_id</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"visitor"</span><span style="color: #0000FF;">),</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #000000;">wf</span><span style="color: #0000FF;">,</span><span style="color: #004600;">true</span><span style="color: #0000FF;">)</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
Line 3,431:
 
=={{header|Phixmonti}}==
<syntaxhighlight lang="phixmonti">include ..\Utilitys.pmt
 
"loading..." ?
Line 3,466:
-1 * get ?
endfor
drop</syntaxhighlight>
{{out}}
<pre>loading...
Line 3,485:
 
=={{header|PHP}}==
<langsyntaxhighlight lang="php">
<?php
 
Line 3,500:
}
$i++;
}</syntaxhighlight>
{{out}}
<pre>
Line 3,519:
=={{header|Picat}}==
To get the book proper, the header and footer are removed. Here are some tests with different sets of characters to split the words (<code>split_char/1</code>).
<syntaxhighlight lang="picat">main =>
NTop = 10,
File = "les_miserables.txt",
Line 3,549:
split_chars(all,"\n\r \t,;!.?()[]”\"-“—-__‘’*").
split_chars(space_punct,"\n\r \t,;!.?").
split_chars(space,"\n\r \t").</syntaxhighlight>
 
{{out}}
Line 3,573:
 
=={{header|PicoLisp}}==
<syntaxhighlight lang="picolisp">(setq *Delim " ^I^J^M-_.,\"'*[]?!&@#$%^\(\):;")
(setq *Skip (chop *Delim))
 
Line 3,587:
(if (idx 'B W T) (inc (car @)) (set W 1)) ) ) )
(for L (head 10 (flip (by val sort (idx 'B))))
(println L (val L)) )</syntaxhighlight>
{{out}}
<pre>
Line 3,604:
=={{header|Prolog}}==
{{works with|SWI Prolog}}
<langsyntaxhighlight lang="prolog">print_top_words(File, N):-
read_file_to_string(File, String, [encoding(utf8)]),
re_split("\\w+", String, Words),
Line 3,636:
 
main:-
print_top_words("135-0.txt", 10).</langsyntaxhighlight>
 
{{out}}
Line 3,655:
 
=={{header|PureBasic}}==
<syntaxhighlight lang="purebasic">EnableExplicit
 
Structure wordcount
Line 3,709:
EndIf
 
End</syntaxhighlight>
{{out}}
<pre>
Line 3,730:
===Collections===
====Python2.7====
<langsyntaxhighlight lang="python">import collections
import re
import string
Line 3,740:
 
if __name__ == "__main__":
main()</syntaxhighlight>
 
{{Out}}
Line 3,750:
 
====Python3.6====
<langsyntaxhighlight lang="python">from collections import Counter
from re import findall
 
Line 3,769:
if __name__ == "__main__":
n = int(input('How many?: '))
most_common_words_in_file(les_mis_file, n)</syntaxhighlight>
 
{{Out}}
Line 3,787:
===Sorted and groupby===
{{Works with|Python|3.7}}
<langsyntaxhighlight lang="python">"""
Word count task from Rosetta Code
http://www.rosettacode.org/wiki/Word_count#Python
Line 3,834:
if __name__ == '__main__':
main()
</syntaxhighlight>
{{Out}}
<pre>('the', 40372)
Line 3,848:
 
===Collections, Sorted and Lambda===
<langsyntaxhighlight lang="python">
#!/usr/bin/python3
import collections
Line 3,868:
if i == count - 1:
break
</syntaxhighlight>
{{Out}}
<pre>[ 1] the : 41039
Line 3,884:
===Version 1===
I chose to remove apostrophes only if they're followed by an s (so "mom" and "mom's" will show up as the same word but "they" and "they're" won't). I also chose not to remove hyphens.
<syntaxhighlight lang="r">
<lang R>
wordcount<-function(file,n){
punctuation=c("`","~","!","@","#","$","%","^","&","*","(",")","_","+","=","{","[","}","]","|","\\",":",";","\"","<",",",">",".","?","/","'s")
Line 3,900:
return(df[1:n,])
}
</syntaxhighlight>
{{Out}}
<pre>
Line 3,920:
===Version 2===
This version is purely functional using the native pipe operator in R 4.1+ and runs in less than a second.
<syntaxhighlight lang="r">
<lang R>
word_frequency_pipeline <- function(file=NULL, n=10) {
Line 3,934:
}
</syntaxhighlight>
{{Out}}
<pre>
Line 3,952:
 
=={{header|Racket}}==
<langsyntaxhighlight lang="racket">#lang racket
 
(define (all-words f (case-fold string-downcase))
Line 3,962:
 
(module+ main
(take (counts (all-words "data/les-mis.txt")) 10))</syntaxhighlight>
 
{{out}}
Line 3,991:
 
Here is a sample that shows the results of using various matchers.
<syntaxhighlight lang="raku" perl6line>sub MAIN ($filename, $top = 10) {
my $file = $filename.IO.slurp.lc.subst(/ (<[\w]-[_]>'-')\n(<[\w]-[_]>) /, {$0 ~ $1}, :g );
my @matcher = (
Line 4,003:
.put for $file.comb( $reg ).Bag.sort(-*.value)[^$top];
}
}</syntaxhighlight>
 
{{out}}
Line 4,183:
Since REXX doesn't support UTF-8 encodings, code was added to this REXX version to
support the accented letters in the mandated input file.
<langsyntaxhighlight lang="rexx">/*REXX pgm displays top 10 words in a file (includes foreign letters), case is ignored.*/
parse arg fID top . /*obtain optional arguments from the CL*/
if fID=='' | fID=="," then fID= 'les_mes.txt' /*None specified? Then use the default.*/
Line 4,238:
end /*#*/
say commas(totW) ' words found ('commas(c) "unique) in " commas(#),
' records read from file: ' fID; say; return</syntaxhighlight>
{{out|output|text=&nbsp; when using the default inputs:}}
<pre>
Line 4,261:
Inspired by version 1 and adapted for ooRexx.
It ignores all characters other than a-z and A-Z (which are translated to a-z).
<syntaxhighlight lang="text">/*REXX program reads and displays a count of words a file. Word case is ignored.*/
Call time 'R'
abc='abcdefghijklmnopqrstuvwxyz'
Line 4,311:
tops=tops+words(tl) /*correctly handle the tied rankings. */
end
Say time('E') 'seconds elapsed'</syntaxhighlight>
{{out}}
<pre>We found 22820 different words
Line 4,329:
 
=={{header|Ring}}==
<langsyntaxhighlight lang="ring">
# project : Word count
 
Line 4,388:
b = temp
return [a, b]
</syntaxhighlight>
Output:
<pre>
Line 4,404:
 
=={{header|Ruby}}==
<langsyntaxhighlight lang="ruby">
class String
def wc
Line 4,414:
 
open('135-0.txt') { |n| n.read.wc[-10,10].each{|n| puts n[0].to_s+"->"+n[1].to_s} }
</syntaxhighlight>
{{out}}
<pre>
Line 4,430:
===Tally and max_by===
{{Works with|Ruby|2.7}}
<langsyntaxhighlight lang="ruby">RE = /[[:alpha:]]+/
count = open("135-0.txt").read.downcase.scan(RE).tally.max_by(10, &:last)
count.each{|ar| puts ar.join("->") }
</syntaxhighlight>
{{out}}
<pre>the->41092
Line 4,447:
</pre>
===Chain of Enumerables===
<langsyntaxhighlight lang="ruby">wf = File.read("135-0.txt", :encoding => "UTF-8")
.downcase
.scan(/\w+/)
Line 4,460:
w[1]
}
</syntaxhighlight>
{{out}}
<pre>[ 1] the : 41040
Line 4,475:
 
=={{header|Rust}}==
<syntaxhighlight lang="rust">use std::cmp::Reverse;
use std::collections::HashMap;
use std::fs::File;
Line 4,506:
fn main() {
word_count(File::open("135-0.txt").expect("File open error"), 10)
}</syntaxhighlight>
 
{{out}}
Line 4,526:
{{Out}}
Best seen running in your browser [https://scastie.scala-lang.org/EP2Fm6HXQrC1DwtSNvnUzQ Scastie (remote JVM)].
<syntaxhighlight lang="scala">import scala.io.Source
 
object WordCount extends App {
Line 4,549:
println(s"\nSuccessfully completed without errors. [total ${scala.compat.Platform.currentTime - executionStart} ms]")
 
}</syntaxhighlight>
{{out}}
<pre>Rank Word Frequency
Line 4,573:
to get words from a file. The words are [http://seed7.sourceforge.net/libraries/string.htm#lower(in_string) converted to lower case], to ensure that "The" and "the" are considered the same.
 
<langsyntaxhighlight lang="seed7">$ include "seed7_05.s7i";
include "gethttp.s7i";
include "strifile.s7i";
Line 4,614:
end for;
end for;
end func;</syntaxhighlight>
 
{{out}}
Line 4,632:
 
=={{header|Sidef}}==
<langsyntaxhighlight lang="ruby">var count = Hash()
var file = File(ARGV[0] \\ '135-0.txt')
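# use the file named by the first command-line argument, or 135-0.txt by default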
 
Line 4,645:
top.each { |pair|
say "#{pair.key}\t-> #{pair.value}"
}</syntaxhighlight>
{{out}}
<pre>
Line 4,661:
 
=={{header|Simula}}==
<langsyntaxhighlight lang="simula">COMMENT COMPILE WITH
$ cim -m64 word-count.sim
;
Line 4,940:
 
END
</syntaxhighlight>
{{out}}
<pre>
Line 4,958:
 
=={{header|Swift}}==
<langsyntaxhighlight lang="swift">import Foundation
 
func printTopWords(path: String, count: Int) throws {
Line 4,985:
} catch {
print(error.localizedDescription)
}</syntaxhighlight>
 
{{out}}
Line 5,003:
 
=={{header|Tcl}}==
<syntaxhighlight lang="tcl">lassign $argv head
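# "head" is the number of most frequent words to print, from the first command-line argument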
while { [gets stdin line] >= 0 } {
foreach word [regexp -all -inline {[A-Za-z]+} $line] {
Line 5,013:
foreach {word count} [lrange $sorted 0 [expr {$head * 2 - 1}]] {
puts "$count\t$word"
}</syntaxhighlight>
 
./wordcount-di.tcl 10 < 135-0.txt
Line 5,032:
=={{header|TMG}}==
McIlroy's Unix TMG:
<syntaxhighlight lang="unixtmg">/* Input format: N text */
/* Only lowercase letters can constitute a word in text. */
/* (c) 2020, Andrii Makukha, 2-clause BSD licence. */
Line 5,093:
/* Character classes */
letter: <<abcdefghijklmnopqrstuvwxyz>>;
other: !<<abcdefghijklmnopqrstuvwxyz>>;</syntaxhighlight>
 
Unix TMG didn't have a <tt>tolower</tt> builtin. Therefore, you would use it together with <tt>tr</tt>:
<langsyntaxhighlight lang="bash">cat file | tr A-Z a-z > file1; ./a.out file1</langsyntaxhighlight>
 
Additionally, because 1972 TMG only understood ASCII characters, you might want to strip the diacritics (e.g., é → e):
<langsyntaxhighlight lang="bash">cat file | uni2ascii -B | tr A-Z a-z > file1; ./a.out file1</langsyntaxhighlight>
 
=={{header|UNIX Shell}}==
Line 5,105:
{{works with|zsh}}
This is derived from Doug McIlroy's original 6-line note in the ACM article cited in the task.
<langsyntaxhighlight lang="bash">#!/bin/sh
<"$1" tr -cs A-Za-z '\n' | tr A-Z a-z | LC_ALL=C sort | uniq -c | sort -rn | head -n "$2"</langsyntaxhighlight>
 
 
Line 5,132:
This is Doug McIlroy's original solution but follows other solutions in importing the task's text file from the web and directly specifying the 10 most commonly used words.
 
<langsyntaxhighlight lang="zsh">curl "https://www.gutenberg.org/files/135/135-0.txt" | tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed 10q</langsyntaxhighlight>
 
{{Out}}
Line 5,150:
In order to use it, you have to adapt the PATHFILE Const.
 
<syntaxhighlight lang="vb">
<lang vb>
Option Explicit
 
Line 5,266:
If d.Exists(Word) Then _
DisplayFrequencyOf = d(Word)
End Function</syntaxhighlight>
{{out}}
<pre>Words different in this book : 25884
Line 5,301:
 
If the Go example is re-run today (21 October 2020), its output matches this Wren example precisely, though it appears that the text file has changed since the former was written more than 2 years ago.
<langsyntaxhighlight lang="ecmascript">import "io" for File
import "/str" for Str
import "/sort" for Sort
Line 5,328:
var freq = keyVals[rank-1].value
Fmt.print("$2d $-4s $5d", rank, word, freq)
}</syntaxhighlight>
 
{{out}}
Line 5,348:
=={{header|XQuery}}==
 
<langsyntaxhighlight lang="xquery">let $maxentries := 10,
$uri := 'https://www.gutenberg.org/files/135/135-0.txt'
return
Line 5,367:
return <word key="{$key}" count="{$count}"/>
)[position()=(1 to $maxentries)]
}</words></syntaxhighlight>
{{out}}
<langsyntaxhighlight lang="xml"><words in="https://www.gutenberg.org/files/135/135-0.txt" top="10">
<word key="the" count="41092"/>
<word key="of" count="19954"/>
Line 5,380:
<word key="that" count="7924"/>
<word key="it" count="6661"/>
</words></syntaxhighlight>
 
=={{header|zkl}}==
<langsyntaxhighlight lang="zkl">fname,count := vm.arglist; // grab cammand line args
 
// words may have leading or trailing "_", ie "the" and "_the"
Line 5,389:
RegExp("[a-z]+").pump.fp1(Dictionary().incV)) // line-->(word:count,..)
.toList().copy().sort(fcn(a,b){ b[1]<a[1] })[0,count.toInt()] // hash-->list
.pump(String,Void.Xplode,"%s,%s\n".fmt).println();</syntaxhighlight>
{{out}}
<pre>