N-grams: Difference between revisions
(Created the Common Lisp entry)
Thundergnat (talk | contribs) m (Add draft markup, related task, Raku example)
Line 1: | Line 1: | ||
{{draft task}}
An N-gram is a sequence of N contiguous elements of a given text. Although N-grams sometimes refer to words or syllables, this task considers only sequences of characters.

Given a text and an integer N (the size of the desired N-grams), the task is to find all distinct contiguous sequences of N characters, together with the number of times each one appears in the text.
Line 16: | Line 18: | ||
Note that spaces and other non-alphanumeric characters are taken into account.

;See also
;* [[Sorensen–Dice_coefficient|Related task: Sorensen–Dice coefficient]]
Line 42: | Line 48: | ||
("ND" . 1) ("D " . 1) ("LE" . 1) ("ET" . 1) ("T " . 1))
</syntaxhighlight>

=={{header|Raku}}==
<syntaxhighlight lang="raku" line>sub n-gram ($this, $N=2) { Bag.new( flat $this.uc.map: { .comb.rotor($N => -($N-1))».join } ) }
dd 'Live and let live'.&n-gram; # bi-gram
dd 'Live and let live'.&n-gram(3); # tri-gram</syntaxhighlight>
{{out}}
<pre>("IV"=>2,"T "=>1,"VE"=>2,"E "=>1,"LE"=>1,"AN"=>1,"LI"=>2,"ND"=>1,"ET"=>1," L"=>2," A"=>1,"D "=>1).Bag
("ET "=>1,"AND"=>1,"LIV"=>2," LI"=>1,"ND "=>1," LE"=>1,"IVE"=>2,"E A"=>1,"VE "=>1,"T L"=>1,"D L"=>1,"LET"=>1," AN"=>1).Bag</pre>
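
For comparison, the sliding-window counting described in the task can be sketched in Python. This is only an illustrative sketch, not an entry from the page: the function name <code>n_grams</code> is hypothetical, and folding the text to upper case is an assumption made to mirror the <code>.uc</code> call in the Raku example. Spaces and punctuation are counted like any other character.

```python
from collections import Counter


def n_grams(text: str, n: int = 2) -> Counter:
    """Count every contiguous run of n characters in text.

    The text is upper-cased first (an assumption mirroring the Raku entry).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    text = text.upper()
    # Slide a window of width n over the text, one character at a time.
    # A text shorter than n yields an empty range, hence an empty Counter.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


bigrams = n_grams("Live and let live")      # default size: bi-grams
trigrams = n_grams("Live and let live", 3)  # tri-grams

print(dict(bigrams))
print(dict(trigrams))
```

A <code>Counter</code> plays the same role here as the <code>Bag</code> in the Raku version: it maps each distinct N-gram to its number of occurrences.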