N-grams
(Created task)
(Created the Common Lisp entry)
Revision as of 14:44, 21 April 2023
An N-gram is a sequence of N contiguous elements of a given text. Although the term sometimes refers to words or syllables, this task considers only sequences of characters. Given a text and an integer N, the task is to find all the distinct contiguous sequences of N characters, together with the number of times each appears in the text. For example, the 2-grams of the text "Live and let live" are:
"LI" - 2
"IV" - 2
"VE" - 2
" L" - 2
"E " - 1
" A" - 1
"AN" - 1
"ND" - 1
"D " - 1
"LE" - 1
"ET" - 1
"T " - 1
Note that spaces and other non-alphanumeric characters are taken into account and that, as the example shows, letter case is ignored: "Li" and "li" are both counted as "LI".
Common Lisp
A hash table keyed on the n-gram string (hence the EQUAL test) is used so that each count can be stored and looked up quickly.
(defun n-grams (text n)
  "Return an alist of the N-grams of length N in TEXT with their frequencies,
most frequent first."
  (let ((counts (make-hash-table :test 'equal)) ; EQUAL compares strings by content
        (result '()))
    ;; Slide a window of N characters over TEXT, folding case before counting.
    (loop for i from 0 to (- (length text) n)
          do (incf (gethash (string-upcase (subseq text i (+ i n))) counts 0)))
    ;; Collect (n-gram . count) pairs, then sort by descending count.
    (maphash (lambda (key val) (push (cons key val) result)) counts)
    (sort result #'> :key #'cdr)))
- Output:
> (n-grams "Live and let live" 2)
(("LI" . 2) ("IV" . 2) ("VE" . 2) (" L" . 2) ("E " . 1) (" A" . 1) ("AN" . 1)
("ND" . 1) ("D " . 1) ("LE" . 1) ("ET" . 1) ("T " . 1))
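The order of n-grams with equal counts in the output above is not fixed: MAPHASH traverses the table in an implementation-dependent order, and SORT is not guaranteed to be stable. A minimal sketch of a variant with reproducible output follows. The name `n-grams-ordered` and the alphabetical tie-break are assumptions made for illustration and are not part of the original entry. The check at the end relies on the fact that a text of length L has L - N + 1 windows of N characters, so the counts must add up to that number.

```lisp
(defun n-grams-ordered (text n)
  "Like N-GRAMS, but order n-grams with equal counts alphabetically.
Hypothetical helper; the tie-break rule is an assumption, not part of the task."
  (let ((counts (make-hash-table :test 'equal))
        (result '()))
    ;; Count every window of N characters, folding case first.
    (loop for i from 0 to (- (length text) n)
          do (incf (gethash (string-upcase (subseq text i (+ i n))) counts 0)))
    (maphash (lambda (key val) (push (cons key val) result)) counts)
    ;; Higher count first; for equal counts, compare the n-grams with STRING<.
    (sort result
          (lambda (a b)
            (or (> (cdr a) (cdr b))
                (and (= (cdr a) (cdr b))
                     (string< (car a) (car b))))))))

;; "Live and let live" has 17 characters, hence 17 - 2 + 1 = 16 windows.
(let ((grams (n-grams-ordered "Live and let live" 2)))
  (assert (= (reduce #'+ grams :key #'cdr) 16))
  ;; Print the four most frequent 2-grams.
  (format t "~{~S~^ ~}~%" (subseq grams 0 4)))
```

With this tie-break the four 2-grams that occur twice come first in the order " L", "IV", "LI", "VE", since the space character sorts before the letters. A text shorter than N yields an empty list, because the loop body never runs.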