N-grams: Difference between revisions

(Created the Common Lisp entry)
m (Add draft markup, related task, Raku example)
Line 1:
{{draft task}}
 
An N-gram is a sequence of N contiguous elements of a given text. Although N-grams sometimes refer to words or syllables, this task considers only sequences of characters.
Given a text and an integer N (the size of the desired N-grams), the task is to find all distinct contiguous sequences of N characters, together with the number of times each one appears in the text.
Line 16 ⟶ 18:
 
Note that spaces and other non-alphanumeric characters are taken into account.
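
The counting described above can be sketched in Python as a sliding window over the text. This is only an illustrative sketch, not one of the task's entries: the function name <code>ngram_counts</code> is hypothetical, and folding the text to upper case is an assumption borrowed from the Raku entry (which calls <code>.uc</code>), not something the task description requires.

```python
from collections import Counter


def ngram_counts(text: str, n: int = 2) -> Counter:
    """Count every contiguous run of n characters in text.

    Hypothetical helper for illustration only.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Assumption: fold case so that "Li" and "li" count as the same N-gram.
    text = text.upper()
    # Slide a window of width n over the text; spaces and punctuation
    # are ordinary characters and are counted like any other.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    for gram, count in ngram_counts("Live and let live").items():
        print(repr(gram), count)
```

A text of length L yields L - N + 1 windows, so the counts always sum to that value; a text shorter than N yields an empty result.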
 
 
;See also
;* [[Sorensen–Dice_coefficient|Related task: Sorensen–Dice coefficient]]
 
 
Line 42 ⟶ 48:
("ND" . 1) ("D " . 1) ("LE" . 1) ("ET" . 1) ("T " . 1))
</syntaxhighlight>
 
=={{header|Raku}}==
 
<syntaxhighlight lang="raku" line>sub n-gram ($this, $N=2) { Bag.new( flat $this.uc.map: { .comb.rotor($N => -($N-1))».join } ) }
dd 'Live and let live'.&n-gram; # bi-gram
dd 'Live and let live'.&n-gram(3); # tri-gram</syntaxhighlight>
{{out}}
<pre>("IV"=>2,"T "=>1,"VE"=>2,"E "=>1,"LE"=>1,"AN"=>1,"LI"=>2,"ND"=>1,"ET"=>1," L"=>2," A"=>1,"D "=>1).Bag
("ET "=>1,"AND"=>1,"LIV"=>2," LI"=>1,"ND "=>1," LE"=>1,"IVE"=>2,"E A"=>1,"VE "=>1,"T L"=>1,"D L"=>1,"LET"=>1," AN"=>1).Bag</pre>