Textonyms: Difference between revisions

4,027 bytes added ,  9 years ago
→‎{{header|Julia}}: A new entry for Julia
(Tcl implementation added)
26456746242 maps to: ["Anglophobia","Anglophobic"]
24636272673 maps to: ["CinemaScope","Cinemascope"]</lang>
 
=={{header|Julia}}==
This solution uses an <tt>aspell</tt> dictionary on the local machine as its word source. Characters are mapped to digits with regular expressions and Julia's <tt>replace</tt> function. Because the word list contains accented characters, the matching expressions were expanded to include them. Any remaining non-digit character, such as an apostrophe, is mapped to 1. Words are case-sensitive but the mapping is not, so, for example, "Homer" and "homer" are tabulated as separate words, each coded as "46637".
 
'''Function'''
<lang Julia>
# Map each telephone key's letters (including accented forms) to its digit.
const tcode = Dict{Regex,Char}(r"A|B|C|Ä|Å|Á|Â|Ç" => '2',
                               r"D|E|F|È|Ê|É" => '3',
                               r"G|H|I|Í" => '4',
                               r"J|K|L" => '5',
                               r"M|N|O|Ó|Ö|Ô|Ñ" => '6',
                               r"P|Q|R|S" => '7',
                               r"T|U|V|Û|Ü" => '8',
                               r"W|X|Y|Z" => '9')

# Read one word per line and group the words by their digit encoding.
function tpad(str::IO)
    tnym = Dict{String,Vector{String}}()
    for w in eachline(str)  # eachline already strips the line terminator
        t = uppercase(w)
        for (k, v) in tcode
            t = replace(t, k => v)
        end
        t = replace(t, r"\D" => '1')  # anything still unmapped becomes 1
        push!(get!(tnym, t, String[]), w)
    end
    return tnym
end
</lang>
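
'''Small example'''

As a rough illustration of the encoding step that <tt>tpad</tt> performs, the sketch below encodes a few words held in memory instead of reading a dictionary file. The names <tt>keymap</tt>, <tt>encode</tt>, <tt>words</tt> and <tt>groups</tt> are hypothetical and not part of the entry, and the mapping is simplified to unaccented letters only.
<lang Julia>
# Hypothetical, simplified key map: unaccented letters only (an assumption
# made for brevity; the entry's tcode also covers accented characters).
const keymap = Dict{Regex,String}(r"[ABC]" => "2", r"[DEF]" => "3",
                                  r"[GHI]" => "4", r"[JKL]" => "5",
                                  r"[MNO]" => "6", r"[PQRS]" => "7",
                                  r"[TUV]" => "8", r"[WXYZ]" => "9")

# Encode one word: uppercase it, replace letters by digits, then map
# every remaining non-digit character to 1.
function encode(word::AbstractString)
    t = uppercase(word)
    for (pat, digit) in keymap
        t = replace(t, pat => digit)
    end
    return replace(t, r"\D" => "1")
end

# Group a small word list by code, in the same spirit as tpad.
words = ["Homer", "homer", "goods", "he's", "if's"]
groups = Dict{String,Vector{String}}()
for w in words
    push!(get!(groups, encode(w), String[]), w)
end

println(encode("Homer"))    # prints 46637
println(groups["4317"])     # the two words containing an apostrophe
</lang>
Here "Homer", "homer" and "goods" all land under 46637, which is what makes that code a textonym.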
 
'''Main'''
<lang Julia>
using Printf

dname = "/usr/share/dict/american-english"
tnym = open(tpad, dname)  # the file is closed automatically

println("The character to digit mapping is done according to")
println("these regular expressions (following uppercase conversion):")
for k in sort(collect(keys(tcode)), by=x->tcode[x])
    println(" ", tcode[k], " -> ", k)
end
println("Unmatched non-digit characters are mapped to 1")

println()
print("There are ", sum(length, values(tnym)))
println(" words in ", dname)
println(" which can be represented by the digit key mapping.")
print("They require ", length(tnym))
println(" digit combinations to represent them.")
print(count(v -> length(v) > 1, values(tnym)))
println(" digit combinations represent Textonyms.")

println()
println("The degeneracies of telephone key encodings are:")
println(" Words Encoded Number of codes")
# dgen[n] counts the codes that are shared by exactly n words.
dgen = zeros(Int, maximum(length, values(tnym)))
for v in values(tnym)
    dgen[length(v)] += 1
end
for (i, d) in enumerate(dgen)
    @printf("%10d %15d\n", i, d)
end

println()
# List only the codes within two of the largest group size.
cutoff = length(dgen) - 2
println("Codes mapping to ", cutoff, " or more words:")
for (k, v) in tnym
    length(v) >= cutoff || continue
    @printf("%7s (%2d) %s\n", k, length(v), join(v, ", "))
end
</lang>
 
{{out}}
<pre>
The character to digit mapping is done according to
these regular expressions (following uppercase conversion):
2 -> r"A|B|C|Ä|Å|Á|Â|Ç"
3 -> r"D|E|F|È|Ê|É"
4 -> r"G|H|I|Í"
5 -> r"J|K|L"
6 -> r"M|N|O|Ó|Ö|Ô|Ñ"
7 -> r"P|Q|R|S"
8 -> r"T|U|V|Û|Ü"
9 -> r"W|X|Y|Z"
Unmatched non-digit characters are mapped to 1
 
There are 99171 words in /usr/share/dict/american-english
which can be represented by the digit key mapping.
They require 89353 digit combinations to represent them.
6860 digit combinations represent Textonyms.
 
The degeneracies of telephone key encodings are:
Words Encoded Number of codes
1 82493
2 5088
3 1104
4 383
5 159
6 72
7 24
8 16
9 8
10 4
11 1
12 1
 
Codes mapping to 10 or more words:
269 (11) Amy, BMW, Cox, Coy, any, bow, box, boy, cow, cox, coy
22737 (12) acres, bards, barer, bares, barfs, baser, bases, caper, capes, cards, cares, cases
2273 (10) Case, acre, bard, bare, barf, base, cape, card, care, case
46637 (10) Homer, goner, goods, goofs, homer, homes, hones, hoods, hoofs, inner
7217 (10) PA's, PC's, Pa's, Pb's, Ra's, Rb's, SC's, Sb's, Sc's, pa's
4317 (10) GE's, Gd's, Ge's, HF's, He's, Hf's, ID's, he's, id's, if's
</pre>
 
=={{header|Perl}}==