N-grams: Difference between revisions

Content added Content deleted
(Add Python)
Line 202: Line 202:
LIVE: 2
LIVE: 2
</pre>
</pre>
</pre>

=={{header|Phix}}==
A dictionary is used to find the index of already-seen n-grams.<br>
I have replicated most orderings found on this page, the task description order corresponds to orig/freq,<br>
jq is alpha/freq but high last, but there is no equivalent for the Factor or Raku orderings here ;-).
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">n_grams</span><span style="color: #0000FF;">(</span><span style="color: #004080;">integer</span> <span style="color: #000000;">len</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">string</span> <span style="color: #000000;">txt</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">sequence</span> <span style="color: #000000;">orders</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">ng</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{},</span> <span style="color: #000000;">ngc</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{}</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">d</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">new_dict</span><span style="color: #0000FF;">()</span>
<span style="color: #000000;">txt</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">upper</span><span style="color: #0000FF;">(</span><span style="color: #000000;">txt</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">txt</span><span style="color: #0000FF;">)-</span><span style="color: #000000;">len</span><span style="color: #0000FF;">+</span><span style="color: #000000;">1</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">tn</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">txt</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">..</span><span style="color: #000000;">i</span><span style="color: #0000FF;">+</span><span style="color: #000000;">len</span><span style="color: #0000FF;">-</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">ndx</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">getdd</span><span style="color: #0000FF;">(</span><span style="color: #000000;">tn</span><span style="color: #0000FF;">,</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #000000;">d</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">ndx</span><span style="color: #0000FF;">=</span><span style="color: #000000;">0</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">ng</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ng</span><span style="color: #0000FF;">,</span><span style="color: #000000;">tn</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">ngc</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ngc</span><span style="color: #0000FF;">,</span><span style="color: #000000;">1</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">ndx</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ng</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">setd</span><span style="color: #0000FF;">(</span><span style="color: #000000;">tn</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ndx</span><span style="color: #0000FF;">,</span><span style="color: #000000;">d</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">else</span>
<span style="color: #000000;">ngc</span><span style="color: #0000FF;">[</span><span style="color: #000000;">ndx</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #7060A8;">destroy_dict</span><span style="color: #0000FF;">(</span><span style="color: #000000;">d</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">l</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ng</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">ares</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">columnize</span><span style="color: #0000FF;">({</span><span style="color: #000000;">ng</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ngc</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">tagset</span><span style="color: #0000FF;">(</span><span style="color: #000000;">l</span><span style="color: #0000FF;">)}),</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{}</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">c</span> <span style="color: #008080;">in</span> <span style="color: #000000;">orders</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">c</span><span style="color: #0000FF;">=</span><span style="color: #008000;">"original"</span> <span style="color: #008080;">then</span> <span style="color: #000080;font-style:italic;">-- original/first found order</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ares</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">elsif</span> <span style="color: #000000;">c</span><span style="color: #0000FF;">=</span><span style="color: #008000;">"orig/freq"</span> <span style="color: #008080;">then</span> <span style="color: #000080;font-style:italic;">-- "" but higher freq first</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">sort_columns</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ares</span><span style="color: #0000FF;">,{-</span><span style="color: #000000;">2</span><span style="color: #0000FF;">,</span><span style="color: #000000;">3</span><span style="color: #0000FF;">}))</span>
<span style="color: #008080;">elsif</span> <span style="color: #000000;">c</span><span style="color: #0000FF;">=</span><span style="color: #008000;">"alphabetic"</span> <span style="color: #008080;">then</span> <span style="color: #000080;font-style:italic;">-- alphabetical order</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">sort</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">deep_copy</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ares</span><span style="color: #0000FF;">)))</span>
<span style="color: #008080;">elsif</span> <span style="color: #000000;">c</span><span style="color: #0000FF;">=</span><span style="color: #008000;">"alpha/freq"</span> <span style="color: #008080;">then</span> <span style="color: #000080;font-style:italic;">-- "" but higher freq first</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">sort_columns</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ares</span><span style="color: #0000FF;">,{-</span><span style="color: #000000;">2</span><span style="color: #0000FF;">,</span><span style="color: #000000;">1</span><span style="color: #0000FF;">}))</span>
<span style="color: #008080;">else</span>
<span style="color: #0000FF;">?</span><span style="color: #000000;">9</span><span style="color: #0000FF;">/</span><span style="color: #000000;">0</span> <span style="color: #000080;font-style:italic;">-- (unknown ordering requested)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">res</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">constant</span> <span style="color: #000000;">src</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">"Live and let live"</span><span style="color: #0000FF;">,</span>
<span style="color: #000000;">orders</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"original"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"orig/freq"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"alphabetic"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"alpha/freq"</span><span style="color: #0000FF;">}</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"For \"%s\":\n"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">src</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">l</span><span style="color: #0000FF;">=</span><span style="color: #000000;">2</span> <span style="color: #008080;">to</span> <span style="color: #000000;">4</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">n_grams</span><span style="color: #0000FF;">(</span><span style="color: #000000;">l</span><span style="color: #0000FF;">,</span><span style="color: #000000;">src</span><span style="color: #0000FF;">,</span><span style="color: #000000;">orders</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">count</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">ordinal</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]),</span><span style="color: #004600;">true</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"There are %s unique %d-grams:\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">count</span><span style="color: #0000FF;">,</span><span style="color: #000000;">l</span><span style="color: #0000FF;">})</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">,</span><span style="color: #000000;">r</span> <span style="color: #008080;">in</span> <span style="color: #000000;">res</span> <span style="color: #008080;">do</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%12s: %s\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">orders</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span><span style="color: #7060A8;">join</span><span style="color: #0000FF;">(</span><span style="color: #000000;">r</span><span style="color: #0000FF;">,</span><span style="color: #008000;">", "</span><span style="color: #0000FF;">,</span><span style="color: #000000;">fmt</span><span style="color: #0000FF;">:=</span><span style="color: #008000;">"%s %d"</span><span style="color: #0000FF;">)})</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
For "Live and let live":
There are twelve unique 2-grams:
original: LI 2, IV 2, VE 2, E 1, A 1, AN 1, ND 1, D 1, L 2, LE 1, ET 1, T 1
orig/freq: LI 2, IV 2, VE 2, L 2, E 1, A 1, AN 1, ND 1, D 1, LE 1, ET 1, T 1
alphabetic: A 1, L 2, AN 1, D 1, E 1, ET 1, IV 2, LE 1, LI 2, ND 1, T 1, VE 2
alpha/freq: L 2, IV 2, LI 2, VE 2, A 1, AN 1, D 1, E 1, ET 1, LE 1, ND 1, T 1
There are thirteen unique 3-grams:
original: LIV 2, IVE 2, VE 1, E A 1, AN 1, AND 1, ND 1, D L 1, LE 1, LET 1, ET 1, T L 1, LI 1
orig/freq: LIV 2, IVE 2, VE 1, E A 1, AN 1, AND 1, ND 1, D L 1, LE 1, LET 1, ET 1, T L 1, LI 1
alphabetic: AN 1, LE 1, LI 1, AND 1, D L 1, E A 1, ET 1, IVE 2, LET 1, LIV 2, ND 1, T L 1, VE 1
alpha/freq: IVE 2, LIV 2, AN 1, LE 1, LI 1, AND 1, D L 1, E A 1, ET 1, LET 1, ND 1, T L 1, VE 1
There are thirteen unique 4-grams:
original: LIVE 2, IVE 1, VE A 1, E AN 1, AND 1, AND 1, ND L 1, D LE 1, LET 1, LET 1, ET L 1, T LI 1, LIV 1
orig/freq: LIVE 2, IVE 1, VE A 1, E AN 1, AND 1, AND 1, ND L 1, D LE 1, LET 1, LET 1, ET L 1, T LI 1, LIV 1
alphabetic: AND 1, LET 1, LIV 1, AND 1, D LE 1, E AN 1, ET L 1, IVE 1, LET 1, LIVE 2, ND L 1, T LI 1, VE A 1
alpha/freq: LIVE 2, AND 1, LET 1, LIV 1, AND 1, D LE 1, E AN 1, ET L 1, IVE 1, LET 1, ND L 1, T LI 1, VE A 1
</pre>
</pre>