Talk:N-grams

From Rosetta Code
Revision as of 02:35, 30 March 2024 by Hobson (Is it possible to add a secondary implementation?)
      1. Python example

I like the use of `deque` and `islice`, but I'm not sure they are needed here. A more Pythonic (readable, explicit, simpler) approach would be to slice the string inside a generator expression and pass that straight to `Counter`.

```python
from collections import Counter

def n_grams(text, n=2, topk=10):
   """Count occurrences of character n-grams, returning a list of 2-tuples (ngram, count) for the topk most common n-grams.
   >>> text = "Live and let live"
   >>> n_grams(text.upper(), 1)  # doctest: +NORMALIZE_WHITESPACE
   [('L', 3), ('E', 3), (' ', 3), ('I', 2), ('V', 2), ('A', 1),
    ('N', 1), ('D', 1), ('T', 1)]
   >>> n_grams(text.upper(), 2)  # doctest: +NORMALIZE_WHITESPACE
   [('LI', 2), ('IV', 2), ('VE', 2), (' L', 2), ('E ', 1),
    (' A', 1), ('AN', 1), ('ND', 1), ('D ', 1), ('LE', 1)]
   >>> n_grams(text.upper(), 3)  # doctest: +NORMALIZE_WHITESPACE
   [('LIV', 2), ('IVE', 2), ('VE ', 1), ('E A', 1),
    (' AN', 1), ('AND', 1), ('ND ', 1), ('D L', 1), (' LE', 1), ('LET', 1)]
   >>> n_grams(text.upper(), 4)  # doctest: +NORMALIZE_WHITESPACE
   [('LIVE', 2), ('IVE ', 1), ('VE A', 1), ('E AN', 1), (' AND', 1),
    ('AND ', 1), ('ND L', 1), ('D LE', 1), (' LET', 1), ('LET ', 1)]
   """
   return Counter(
       text[i:(i + n)] for i in range(len(text) - n + 1)
   ).most_common(topk)

```
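
If "generator pattern" is read as a generator function rather than a generator expression, the same idea could be sketched as below. The names `iter_n_grams` and `n_grams_lazy` are made up for this illustration and are not part of the task page. Like the version above, it assumes `text` supports `len()` and slicing (a `str`, for example); for an arbitrary iterable that cannot be sliced, a sliding window such as the `deque` approach would still be needed.

```python
from collections import Counter


def iter_n_grams(text, n=2):
    """Yield each character n-gram of text, left to right (hypothetical helper)."""
    # range() is empty when n > len(text), so nothing is yielded in that case.
    for i in range(len(text) - n + 1):
        yield text[i:i + n]


def n_grams_lazy(text, n=2, topk=10):
    """Return the topk (ngram, count) pairs, counting n-grams as they are yielded."""
    return Counter(iter_n_grams(text, n)).most_common(topk)


print(n_grams_lazy("LIVE AND LET LIVE", 2, topk=4))
# [('LI', 2), ('IV', 2), ('VE', 2), (' L', 2)]
```

Splitting out the generator keeps the counting step separate, so the n-grams can also be consumed lazily by other code without building a `Counter`.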