N-grams: Difference between revisions

(Created the Common Lisp entry)
m (Add draft markup, related task, Raku example)
Line 1: Line 1:
{{draft task}}

An N-gram is a sequence of N contiguous elements of a given text. Although N-grams sometimes refer to words or syllables, this task considers only sequences of characters.
Given a text and an integer N (the size of the desired N-grams), the task is to find all distinct contiguous sequences of N characters, together with the number of times each one appears in the text.
Line 16: Line 18:


Note that spaces and other non-alphanumeric characters are taken into account.
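
The counting described above can be sketched with a sliding window over the text. This is a minimal illustration, not a reference solution: the function name <code>ngram_counts</code> is hypothetical, and folding the text to upper case is an assumption made to mirror the upper-case output shown in the language entries.

```python
from collections import Counter


def ngram_counts(text: str, n: int = 2) -> Counter:
    """Count every contiguous run of n characters in text.

    Assumption: the text is folded to upper case first, mirroring the
    upper-case N-grams in the sample outputs on this page.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    text = text.upper()
    # Slide a window of width n across the text; spaces and punctuation
    # are kept, so they take part in N-grams like any other character.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    for gram, count in ngram_counts("Live and let live", 2).items():
        print(repr(gram), count)
```

A text shorter than N yields an empty result, because the window never fits.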


;See also
;* [[Sorensen–Dice_coefficient|Related task: Sorensen–Dice coefficient]]




Line 42: Line 48:
("ND" . 1) ("D " . 1) ("LE" . 1) ("ET" . 1) ("T " . 1))
</syntaxhighlight>
</syntaxhighlight>

=={{header|Raku}}==

<syntaxhighlight lang="raku" line>sub n-gram ($this, $N=2) { Bag.new( flat $this.uc.map: { .comb.rotor($N => -($N-1))».join } ) }
dd 'Live and let live'.&n-gram; # bi-gram
dd 'Live and let live'.&n-gram(3); # tri-gram</syntaxhighlight>
{{out}}
<pre>("IV"=>2,"T "=>1,"VE"=>2,"E "=>1,"LE"=>1,"AN"=>1,"LI"=>2,"ND"=>1,"ET"=>1," L"=>2," A"=>1,"D "=>1).Bag
("ET "=>1,"AND"=>1,"LIV"=>2," LI"=>1,"ND "=>1," LE"=>1,"IVE"=>2,"E A"=>1,"VE "=>1,"T L"=>1,"D L"=>1,"LET"=>1," AN"=>1).Bag</pre>