N-grams: Difference between revisions
Content added Content deleted
(Add Python) |
|||
Line 202: | Line 202: | ||
LIVE: 2 |
LIVE: 2 |
||
</pre> |
</pre> |
||
</pre> |
|||
=={{header|Phix}}== |
|||
A dictionary is used to find the index of already-seen n-grams.<br> |
|||
I have replicated most orderings found on this page, the task description order corresponds to orig/freq,<br> |
|||
jq is alpha/freq but high last, but there is no equivalent for the Factor or Raku orderings here ;-). |
|||
<!--<syntaxhighlight lang="phix">(phixonline)--> |
|||
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span> |
|||
<span style="color: #008080;">function</span> <span style="color: #000000;">n_grams</span><span style="color: #0000FF;">(</span><span style="color: #004080;">integer</span> <span style="color: #000000;">len</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">string</span> <span style="color: #000000;">txt</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">sequence</span> <span style="color: #000000;">orders</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #004080;">sequence</span> <span style="color: #000000;">ng</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{},</span> <span style="color: #000000;">ngc</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{}</span> |
|||
<span style="color: #004080;">integer</span> <span style="color: #000000;">d</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">new_dict</span><span style="color: #0000FF;">()</span> |
|||
<span style="color: #000000;">txt</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">upper</span><span style="color: #0000FF;">(</span><span style="color: #000000;">txt</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">txt</span><span style="color: #0000FF;">)-</span><span style="color: #000000;">len</span><span style="color: #0000FF;">+</span><span style="color: #000000;">1</span> <span style="color: #008080;">do</span> |
|||
<span style="color: #004080;">string</span> <span style="color: #000000;">tn</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">txt</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">..</span><span style="color: #000000;">i</span><span style="color: #0000FF;">+</span><span style="color: #000000;">len</span><span style="color: #0000FF;">-</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span> |
|||
<span style="color: #004080;">integer</span> <span style="color: #000000;">ndx</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">getdd</span><span style="color: #0000FF;">(</span><span style="color: #000000;">tn</span><span style="color: #0000FF;">,</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #000000;">d</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #008080;">if</span> <span style="color: #000000;">ndx</span><span style="color: #0000FF;">=</span><span style="color: #000000;">0</span> <span style="color: #008080;">then</span> |
|||
<span style="color: #000000;">ng</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ng</span><span style="color: #0000FF;">,</span><span style="color: #000000;">tn</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #000000;">ngc</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ngc</span><span style="color: #0000FF;">,</span><span style="color: #000000;">1</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #000000;">ndx</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ng</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #7060A8;">setd</span><span style="color: #0000FF;">(</span><span style="color: #000000;">tn</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ndx</span><span style="color: #0000FF;">,</span><span style="color: #000000;">d</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #008080;">else</span> |
|||
<span style="color: #000000;">ngc</span><span style="color: #0000FF;">[</span><span style="color: #000000;">ndx</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> |
|||
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span> |
|||
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span> |
|||
<span style="color: #7060A8;">destroy_dict</span><span style="color: #0000FF;">(</span><span style="color: #000000;">d</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #004080;">integer</span> <span style="color: #000000;">l</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ng</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #004080;">sequence</span> <span style="color: #000000;">ares</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">columnize</span><span style="color: #0000FF;">({</span><span style="color: #000000;">ng</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ngc</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">tagset</span><span style="color: #0000FF;">(</span><span style="color: #000000;">l</span><span style="color: #0000FF;">)}),</span> |
|||
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{}</span> |
|||
<span style="color: #008080;">for</span> <span style="color: #000000;">c</span> <span style="color: #008080;">in</span> <span style="color: #000000;">orders</span> <span style="color: #008080;">do</span> |
|||
<span style="color: #008080;">if</span> <span style="color: #000000;">c</span><span style="color: #0000FF;">=</span><span style="color: #008000;">"original"</span> <span style="color: #008080;">then</span> <span style="color: #000080;font-style:italic;">-- original/first found order</span> |
|||
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ares</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #008080;">elsif</span> <span style="color: #000000;">c</span><span style="color: #0000FF;">=</span><span style="color: #008000;">"orig/freq"</span> <span style="color: #008080;">then</span> <span style="color: #000080;font-style:italic;">-- "" but higher freq first</span> |
|||
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">sort_columns</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ares</span><span style="color: #0000FF;">,{-</span><span style="color: #000000;">2</span><span style="color: #0000FF;">,</span><span style="color: #000000;">3</span><span style="color: #0000FF;">}))</span> |
|||
<span style="color: #008080;">elsif</span> <span style="color: #000000;">c</span><span style="color: #0000FF;">=</span><span style="color: #008000;">"alphabetic"</span> <span style="color: #008080;">then</span> <span style="color: #000080;font-style:italic;">-- alphabetical order</span> |
|||
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">sort</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">deep_copy</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ares</span><span style="color: #0000FF;">)))</span> |
|||
<span style="color: #008080;">elsif</span> <span style="color: #000000;">c</span><span style="color: #0000FF;">=</span><span style="color: #008000;">"alpha/freq"</span> <span style="color: #008080;">then</span> <span style="color: #000080;font-style:italic;">-- "" but higher freq first</span> |
|||
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">sort_columns</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ares</span><span style="color: #0000FF;">,{-</span><span style="color: #000000;">2</span><span style="color: #0000FF;">,</span><span style="color: #000000;">1</span><span style="color: #0000FF;">}))</span> |
|||
<span style="color: #008080;">else</span> |
|||
<span style="color: #0000FF;">?</span><span style="color: #000000;">9</span><span style="color: #0000FF;">/</span><span style="color: #000000;">0</span> <span style="color: #000080;font-style:italic;">-- (unknown ordering requested)</span> |
|||
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span> |
|||
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span> |
|||
<span style="color: #008080;">return</span> <span style="color: #000000;">res</span> |
|||
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span> |
|||
<span style="color: #008080;">constant</span> <span style="color: #000000;">src</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">"Live and let live"</span><span style="color: #0000FF;">,</span> |
|||
<span style="color: #000000;">orders</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"original"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"orig/freq"</span><span style="color: #0000FF;">,</span> |
|||
<span style="color: #008000;">"alphabetic"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"alpha/freq"</span><span style="color: #0000FF;">}</span> |
|||
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"For \"%s\":\n"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">src</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #008080;">for</span> <span style="color: #000000;">l</span><span style="color: #0000FF;">=</span><span style="color: #000000;">2</span> <span style="color: #008080;">to</span> <span style="color: #000000;">4</span> <span style="color: #008080;">do</span> |
|||
<span style="color: #004080;">sequence</span> <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">n_grams</span><span style="color: #0000FF;">(</span><span style="color: #000000;">l</span><span style="color: #0000FF;">,</span><span style="color: #000000;">src</span><span style="color: #0000FF;">,</span><span style="color: #000000;">orders</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #004080;">string</span> <span style="color: #000000;">count</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">ordinal</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]),</span><span style="color: #004600;">true</span><span style="color: #0000FF;">)</span> |
|||
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"There are %s unique %d-grams:\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">count</span><span style="color: #0000FF;">,</span><span style="color: #000000;">l</span><span style="color: #0000FF;">})</span> |
|||
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">,</span><span style="color: #000000;">r</span> <span style="color: #008080;">in</span> <span style="color: #000000;">res</span> <span style="color: #008080;">do</span> |
|||
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%12s: %s\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">orders</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span><span style="color: #7060A8;">join</span><span style="color: #0000FF;">(</span><span style="color: #000000;">r</span><span style="color: #0000FF;">,</span><span style="color: #008000;">", "</span><span style="color: #0000FF;">,</span><span style="color: #000000;">fmt</span><span style="color: #0000FF;">:=</span><span style="color: #008000;">"%s %d"</span><span style="color: #0000FF;">)})</span> |
|||
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span> |
|||
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span> |
|||
<!--</syntaxhighlight>--> |
|||
{{out}} |
|||
<pre> |
|||
For "Live and let live": |
|||
There are twelve unique 2-grams: |
|||
original: LI 2, IV 2, VE 2, E 1, A 1, AN 1, ND 1, D 1, L 2, LE 1, ET 1, T 1 |
|||
orig/freq: LI 2, IV 2, VE 2, L 2, E 1, A 1, AN 1, ND 1, D 1, LE 1, ET 1, T 1 |
|||
alphabetic: A 1, L 2, AN 1, D 1, E 1, ET 1, IV 2, LE 1, LI 2, ND 1, T 1, VE 2 |
|||
alpha/freq: L 2, IV 2, LI 2, VE 2, A 1, AN 1, D 1, E 1, ET 1, LE 1, ND 1, T 1 |
|||
There are thirteen unique 3-grams: |
|||
original: LIV 2, IVE 2, VE 1, E A 1, AN 1, AND 1, ND 1, D L 1, LE 1, LET 1, ET 1, T L 1, LI 1 |
|||
orig/freq: LIV 2, IVE 2, VE 1, E A 1, AN 1, AND 1, ND 1, D L 1, LE 1, LET 1, ET 1, T L 1, LI 1 |
|||
alphabetic: AN 1, LE 1, LI 1, AND 1, D L 1, E A 1, ET 1, IVE 2, LET 1, LIV 2, ND 1, T L 1, VE 1 |
|||
alpha/freq: IVE 2, LIV 2, AN 1, LE 1, LI 1, AND 1, D L 1, E A 1, ET 1, LET 1, ND 1, T L 1, VE 1 |
|||
There are thirteen unique 4-grams: |
|||
original: LIVE 2, IVE 1, VE A 1, E AN 1, AND 1, AND 1, ND L 1, D LE 1, LET 1, LET 1, ET L 1, T LI 1, LIV 1 |
|||
orig/freq: LIVE 2, IVE 1, VE A 1, E AN 1, AND 1, AND 1, ND L 1, D LE 1, LET 1, LET 1, ET L 1, T LI 1, LIV 1 |
|||
alphabetic: AND 1, LET 1, LIV 1, AND 1, D LE 1, E AN 1, ET L 1, IVE 1, LET 1, LIVE 2, ND L 1, T LI 1, VE A 1 |
|||
alpha/freq: LIVE 2, AND 1, LET 1, LIV 1, AND 1, D LE 1, E AN 1, ET L 1, IVE 1, LET 1, ND L 1, T LI 1, VE A 1 |
|||
</pre> |
</pre> |
||