N-grams: Difference between revisions
(Add Factor)
Line 72:
{ "AN" 1 }
}
</pre>
=={{header|jq}}==
'''Works with jq and gojq, that is, the C and Go implementations of jq.'''
<syntaxhighlight lang=jq>
# Generic "bag of words" utility:
# count the occurrences of each item in the stream, keyed by its string form.
def bow(stream):
  reduce stream as $word ({}; .[($word|tostring)] += 1);

# The n-grams of the input string (folded to upper case) as a bow.
# A string of length L has L - $n + 1 n-grams, so $i runs from 0 to L - $n inclusive.
def ngrams($n):
  ascii_upcase as $text
  | bow( range(0; ($text|length) - $n + 1) as $i | $text[$i:$i+$n]);

# The task
# Sort by increasing frequency, then by lexicographical order
def ngrams($text; $n):
  ($text|ngrams($n)) as $ngrams
  | "\nAll \($n)-grams of '\($text)' and their frequencies:",
    ($ngrams|to_entries|sort_by(.value,.key)[] | "\(.key): \(.value)" ) ;

ngrams("Live and let live"; 2,3,4)
</syntaxhighlight>
{{output}}
<pre>
All 2-grams of 'Live and let live' and their frequencies:
 A: 1
AN: 1
D : 1
E : 1
ET: 1
LE: 1
ND: 1
T : 1
 L: 2
IV: 2
LI: 2
VE: 2

All 3-grams of 'Live and let live' and their frequencies:
 AN: 1
 LE: 1
 LI: 1
AND: 1
D L: 1
E A: 1
ET : 1
LET: 1
ND : 1
T L: 1
VE : 1
IVE: 2
LIV: 2

All 4-grams of 'Live and let live' and their frequencies:
 AND: 1
 LET: 1
 LIV: 1
AND : 1
D LE: 1
E AN: 1
ET L: 1
IVE : 1
LET : 1
ND L: 1
T LI: 1
VE A: 1
LIVE: 2
</pre>