Textonyms: Difference between revisions
(Tcl implementation added)
(→{{header|Julia}}: A new entry for Julia)
Line 694:
26456746242 maps to: ["Anglophobia","Anglophobic"]
24636272673 maps to: ["CinemaScope","Cinemascope"]</lang>
=={{header|Julia}}==
This solution uses an <tt>aspell</tt> dictionary on the local machine as its word source. The character-to-digit mapping is done with regular expressions and Julia's <tt>replace</tt> function. Because this word list contains accented characters, the regular expressions were expanded to include them. Words are case-sensitive but the mapping is not, so, for example, both "Homer" and "homer" are included in the tabulation, each coded as "46637".
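As a rough illustration of the mapping described above, the following sketch encodes a single word. The names <tt>keypad</tt> and <tt>encode</tt> are hypothetical and are not part of the solution, and only unaccented letters are covered to keep the example short.

```julia
# Hypothetical helper, not part of the solution: encode one word.
# Only unaccented letters are listed here (an assumption made for brevity).
const keypad = [r"[ABC]" => '2', r"[DEF]" => '3', r"[GHI]" => '4',
                r"[JKL]" => '5', r"[MNO]" => '6', r"[PQRS]" => '7',
                r"[TUV]" => '8', r"[WXYZ]" => '9']

function encode(word::AbstractString)
    code = uppercase(word)          # the mapping ignores case
    for p in keypad
        code = replace(code, p)     # p is a Regex => Char pair
    end
    # Whatever is still not a digit (apostrophes, unlisted letters) becomes 1.
    return replace(code, r"\D" => '1')
end

println(encode("Homer"))   # 46637
println(encode("homer"))   # 46637
println(encode("he's"))    # 4317
```

Both spellings of "Homer" land on the same code, which is why the tabulation treats them as two distinct words sharing one digit combination.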

'''Function'''
<lang Julia>
# Keypad digit for each group of (uppercased) letters, accented forms included.
const tcode = Dict{Regex,Char}(r"A|B|C|Ä|Å|Á|Â|Ç" => '2',
                               r"D|E|F|È|Ê|É" => '3',
                               r"G|H|I|Í" => '4',
                               r"J|K|L" => '5',
                               r"M|N|O|Ó|Ö|Ô|Ñ" => '6',
                               r"P|Q|R|S" => '7',
                               r"T|U|V|Û|Ü" => '8',
                               r"W|X|Y|Z" => '9')

# Read one word per line and group the words by their digit encoding.
function tpad(str::IOStream)
    tnym = Dict{String,Vector{String}}()
    for w in eachline(str)          # eachline strips the trailing newline
        t = uppercase(w)
        for (k, v) in tcode
            t = replace(t, k => v)
        end
        # Anything that is still not a digit (e.g. an apostrophe) maps to 1.
        t = replace(t, r"\D" => '1')
        push!(get!(tnym, t, String[]), w)
    end
    return tnym
end
</lang>

'''Main'''
<lang Julia>
using Printf

dname = "/usr/share/dict/american-english"
DF = open(dname, "r")
tnym = tpad(DF)
close(DF)

println("The character to digit mapping is done according to")
println("these regular expressions (following uppercase conversion):")
for k in sort(collect(keys(tcode)), by=x->tcode[x])
    println(" ", tcode[k], " -> ", k)
end
println("Unmatched non-digit characters are mapped to 1")

println()
print("There are ", sum(length, values(tnym)))
println(" words in ", dname)
println(" which can be represented by the digit key mapping.")
print("They require ", length(keys(tnym)))
println(" digit combinations to represent them.")
# A code is a textonym when more than one word maps to it.
print(count(x->length(x)>1, values(tnym)))
println(" digit combinations represent Textonyms.")

println()
println("The degeneracies of telephone key encodings are:")
println(" Words Encoded Number of codes")
# dgen[n] is the number of codes that encode exactly n words.
dgen = zeros(Int, maximum(length, values(tnym)))
for v in values(tnym)
    dgen[length(v)] += 1
end
for (i, d) in enumerate(dgen)
    println(@sprintf("%10d %15d", i, d))
end

println()
# List the codes whose word count is within 2 of the maximum.
dmin = length(dgen) - 2
println("Codes mapping to ", dmin, " or more words:")
for (k, v) in tnym
    dmin <= length(v) || continue
    println(@sprintf("%7s (%2d) %s", k, length(v), join(v, ", ")))
end
</lang>

{{out}}
<pre>
The character to digit mapping is done according to
these regular expressions (following uppercase conversion):
 2 -> r"A|B|C|Ä|Å|Á|Â|Ç"
 3 -> r"D|E|F|È|Ê|É"
 4 -> r"G|H|I|Í"
 5 -> r"J|K|L"
 6 -> r"M|N|O|Ó|Ö|Ô|Ñ"
 7 -> r"P|Q|R|S"
 8 -> r"T|U|V|Û|Ü"
 9 -> r"W|X|Y|Z"
Unmatched non-digit characters are mapped to 1

There are 99171 words in /usr/share/dict/american-english
 which can be represented by the digit key mapping.
They require 89353 digit combinations to represent them.
6860 digit combinations represent Textonyms.

The degeneracies of telephone key encodings are:
 Words Encoded Number of codes
         1           82493
         2            5088
         3            1104
         4             383
         5             159
         6              72
         7              24
         8              16
         9               8
        10               4
        11               1
        12               1

Codes mapping to 10 or more words:
    269 (11) Amy, BMW, Cox, Coy, any, bow, box, boy, cow, cox, coy
  22737 (12) acres, bards, barer, bares, barfs, baser, bases, caper, capes, cards, cares, cases
   2273 (10) Case, acre, bard, bare, barf, base, cape, card, care, case
  46637 (10) Homer, goner, goods, goofs, homer, homes, hones, hoods, hoofs, inner
   7217 (10) PA's, PC's, Pa's, Pb's, Ra's, Rb's, SC's, Sb's, Sc's, pa's
   4317 (10) GE's, Gd's, Ge's, HF's, He's, Hf's, ID's, he's, id's, if's
</pre>
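
The degeneracy table in the output counts, for each n, how many digit codes encode exactly n words. A minimal sketch of that tally on a made-up three-code table (the name <tt>degeneracy</tt> and the data in <tt>toy</tt> are assumptions for illustration only, not taken from the dictionary run) might look like this:

```julia
# Hypothetical helper: hist[n] = number of codes that encode exactly n words.
function degeneracy(table::Dict{String,Vector{String}})
    hist = zeros(Int, maximum(length, values(table)))
    for words in values(table)
        hist[length(words)] += 1
    end
    return hist
end

# Made-up toy table: digit code => words sharing that code.
toy = Dict("2273" => ["acre", "bard", "care"],
           "269"  => ["any", "boy"],
           "4663" => ["home"])

for (n, c) in enumerate(degeneracy(toy))
    println(n, " word(s): ", c, " code(s)")
end
```

In this toy table only the codes with two or more words ("2273" and "269") would count as Textonyms.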

=={{header|Perl}}==