N-grams: Difference between revisions

{ "AN" 1 }
}
</pre>

=={{header|jq}}==
'''Works with jq and gojq, that is, the C and Go implementations of jq.'''

<syntaxhighlight lang=jq>
# Generic "bag of words" utility:
# count the occurrences of each item in the stream, keyed by its string form
def bow(stream):
  reduce stream as $word ({}; .[($word|tostring)] += 1);

# The n-grams of the upper-cased input string, as a bow.
# A string of length L has L - $n + 1 n-grams, so the start index
# must run from 0 through L - $n inclusive.
def ngrams($n):
  ascii_upcase as $text
  | bow( range(0; 1 + ($text|length) - $n) as $i | $text[$i:$i+$n]);

# The task
# Sort by increasing frequency, then by lexicographical order
def ngrams($text; $n):
  ($text|ngrams($n)) as $ngrams
  | "\nAll \($n)-grams of '\($text)' and their frequencies:",
    ($ngrams|to_entries|sort_by(.value,.key)[] | "\(.key): \(.value)" ) ;

ngrams("Live and let live"; 2,3,4)
</syntaxhighlight>
{{output}}
<pre>
All 2-grams of 'Live and let live' and their frequencies:
 A: 1
AN: 1
D : 1
E : 1
ET: 1
LE: 1
ND: 1
T : 1
 L: 2
IV: 2
LI: 2
VE: 2

All 3-grams of 'Live and let live' and their frequencies:
 AN: 1
 LE: 1
 LI: 1
AND: 1
D L: 1
E A: 1
ET : 1
LET: 1
ND : 1
T L: 1
VE : 1
IVE: 2
LIV: 2

All 4-grams of 'Live and let live' and their frequencies:
 AND: 1
 LET: 1
 LIV: 1
AND : 1
D LE: 1
E AN: 1
ET L: 1
IVE : 1
LET : 1
ND L: 1
T LI: 1
VE A: 1
LIVE: 2
</pre>