N-grams: Difference between revisions

An N-gram is a sequence of N contiguous elements of a given text. Although N-grams sometimes refer to words or syllables, this task considers only sequences of characters.

Given a text and an integer N (the size of the desired N-grams), the task is to find all distinct contiguous sequences of N characters, together with the number of times each one appears in the text.

Note that spaces and other non-alphanumeric characters are taken into account and count as part of an N-gram.
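The task described above can be sketched with a sliding window over the text. This is a minimal illustration, not a reference solution: the function name `ngrams` is hypothetical, and folding the text to upper case is an assumption made here to mirror the Raku entry on this page.

```python
from collections import Counter

def ngrams(text: str, n: int = 2) -> Counter:
    """Count every contiguous run of n characters in text.

    Assumption: the text is folded to upper case first, mirroring
    the Raku entry on this page. Spaces and punctuation are kept.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    s = text.upper()
    # Slide a window of width n across the text; a text shorter
    # than n produces an empty range and therefore no N-grams.
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

bigrams = ngrams("Live and let live")
trigrams = ngrams("Live and let live", 3)
print(bigrams)
print(trigrams)
```

For the sample text, the bigrams "LI", "IV", "VE" and " L" each occur twice and every other bigram occurs once. A `Counter` is used because it stores each distinct N-gram together with its count, which is exactly the pairing the task asks for.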

;See also

;* [[Sorensen–Dice_coefficient|Related task: Sorensen–Dice coefficient]]

("ND" . 1) ("D " . 1) ("LE" . 1) ("ET" . 1) ("T " . 1))

</syntaxhighlight>

=={{header|Raku}}==

<syntaxhighlight lang="raku" line>sub n-gram ($this, $N=2) { Bag.new( flat $this.uc.map: { .comb.rotor($N => -($N-1))».join } ) }

dd 'Live and let live'.&n-gram; # bi-gram

dd 'Live and let live'.&n-gram(3); # tri-gram</syntaxhighlight>

<pre>("IV"=>2,"T "=>1,"VE"=>2,"E "=>1,"LE"=>1,"AN"=>1,"LI"=>2,"ND"=>1,"ET"=>1," L"=>2," A"=>1,"D "=>1).Bag

("ET "=>1,"AND"=>1,"LIV"=>2," LI"=>1,"ND "=>1," LE"=>1,"IVE"=>2,"E A"=>1,"VE "=>1,"T L"=>1,"D L"=>1,"LET"=>1," AN"=>1).Bag</pre>