Textonyms: Difference between revisions

@@ Line 694: / Line 694: @@
 26456746242 maps to: ["Anglophobia","Anglophobic"]
 24636272673 maps to: ["CinemaScope","Cinemascope"]</lang>
+=={{header|Julia}}==
+This solution uses an <tt>aspell</tt> dictionary on the local machine as its word source. Each word is uppercased, and its letters are mapped to keypad digits with regular expressions and Julia's <tt>replace</tt> function. Any character left unmatched, such as an apostrophe, is mapped to 1. Because the dictionary contains accented characters, the expressions were extended to cover them. Words are case sensitive but the mapping is not, so "Homer" and "homer" are both included in the tabulation, and each is coded as "46637".
+'''Function'''
+<lang Julia>
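+# Telephone keypad letter groups (applied after uppercase conversion),
+# extended with the accented letters that occur in the aspell word list.
+const tcode = Dict{Regex,Char}(r"A|B|C|Ä|Å|Á|Â|Ç" => '2',
+                               r"D|E|F|È|Ê|É" => '3',
+                               r"G|H|I|Í" => '4',
+                               r"J|K|L" => '5',
+                               r"M|N|O|Ó|Ö|Ô|Ñ" => '6',
+                               r"P|Q|R|S" => '7',
+                               r"T|U|V|Û|Ü" => '8',
+                               r"W|X|Y|Z" => '9')
+
+# Read one word per line and group the words by their digit encoding.
+function tpad(str::IO)
+    tnym = Dict{String,Vector{String}}()
+    for w in eachline(str)              # eachline already strips the newline
+        t = uppercase(w)
+        for (k, v) in tcode
+            t = replace(t, k => v)
+        end
+        t = replace(t, r"\D" => '1')    # any remaining non-digit becomes 1
+        push!(get!(tnym, t, String[]), w)
+    end
+    return tnym
+end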
+</lang>
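The regex loop in <tt>tpad</tt> scans every word once per keypad group. As a sketch of an alternative (not the approach used in this solution), the same letter groups can be flattened into a per-character lookup table, so each character is looked up exactly once. The names <tt>KEYGROUPS</tt>, <tt>KEYMAP</tt> and <tt>textonym_code</tt> are hypothetical, and the snippet is standalone rather than a drop-in part of the program.

```julia
# Hypothetical per-character variant of the keypad mapping in tpad.
# Letter groups are copied from tcode; each group maps to one digit.
const KEYGROUPS = ["ABCÄÅÁÂÇ" => '2', "DEFÈÊÉ" => '3', "GHIÍ" => '4',
                   "JKL" => '5', "MNOÓÖÔÑ" => '6', "PQRS" => '7',
                   "TUVÛÜ" => '8', "WXYZ" => '9']

# Flatten the groups into a single Char => Char lookup table, built once.
const KEYMAP = Dict{Char,Char}(c => d for (letters, d) in KEYGROUPS for c in letters)

# Uppercase the word, then look up each character exactly once.
# ASCII digits pass through unchanged and any other unmatched character
# becomes '1', like the final r"\D" replacement in tpad.
function textonym_code(w::AbstractString)
    return String([isdigit(c) ? c : get(KEYMAP, c, '1') for c in uppercase(w)])
end

println(textonym_code("Homer"))  # prints 46637
println(textonym_code("homer"))  # prints 46637
println(textonym_code("PA's"))   # prints 7217
```

Using it inside <tt>tpad</tt> would mean replacing the inner <tt>replace</tt> loop and the <tt>r"\D"</tt> step with <tt>t = textonym_code(w)</tt>. It should yield the same codes for the dictionary words, but that has not been checked against the output below.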
+'''Main'''
+<lang Julia>
+using Printf
+
+dname = "/usr/share/dict/american-english"
+tnym = open(tpad, dname)     # open(f, path) closes the file when tpad returns
+println("The character to digit mapping is done according to")
+println("these regular expressions (following uppercase conversion):")
+for k in sort(collect(keys(tcode)), by = x -> tcode[x])
+    println("    ", tcode[k], " -> ", k)
+end
+println("Unmatched non-digit characters are mapped to 1")
+println()
+print("There are ", sum(length, values(tnym)))
+println(" words in ", dname)
+println("  which can be represented by the digit key mapping.")
+print("They require ", length(tnym))
+println(" digit combinations to represent them.")
+print(count(v -> length(v) > 1, values(tnym)))
+println(" digit combinations represent Textonyms.")
+println()
+println("The degeneracies of telephone key encodings are:")
+println("  Words Encoded   Number of codes")
+# dgen[n] counts the digit codes shared by exactly n words
+dgen = zeros(Int, maximum(length, values(tnym)))
+for v in values(tnym)
+    dgen[length(v)] += 1
+end
+for (i, d) in enumerate(dgen)
+    @printf("%10d  %15d\n", i, d)
+end
+println()
+# list every code shared by at least (maximum degeneracy - 2) words
+cutoff = length(dgen) - 2
+println("Codes mapping to ", cutoff, " or more words:")
+for (k, v) in tnym
+    length(v) >= cutoff || continue
+    @printf("%7s (%2d) %s\n", k, length(v), join(v, ", "))
+end
+</lang>
+{{out}}
+<pre>
+The character to digit mapping is done according to
+these regular expressions (following uppercase conversion):
+    2 -> r"A|B|C|Ä|Å|Á|Â|Ç"
+    3 -> r"D|E|F|È|Ê|É"
+    4 -> r"G|H|I|Í"
+    5 -> r"J|K|L"
+    6 -> r"M|N|O|Ó|Ö|Ô|Ñ"
+    7 -> r"P|Q|R|S"
+    8 -> r"T|U|V|Û|Ü"
+    9 -> r"W|X|Y|Z"
+Unmatched non-digit characters are mapped to 1
+
+There are 99171 words in /usr/share/dict/american-english
+  which can be represented by the digit key mapping.
+They require 89353 digit combinations to represent them.
+6860 digit combinations represent Textonyms.
+
+The degeneracies of telephone key encodings are:
+  Words Encoded   Number of codes
+         1            82493
+         2             5088
+         3             1104
+         4              383
+         5              159
+         6               72
+         7               24
+         8               16
+         9                8
+        10                4
+        11                1
+        12                1
+
+Codes mapping to 10 or more words:
+    269 (11) Amy, BMW, Cox, Coy, any, bow, box, boy, cow, cox, coy
+  22737 (12) acres, bards, barer, bares, barfs, baser, bases, caper, capes, cards, cares, cases
+   2273 (10) Case, acre, bard, bare, barf, base, cape, card, care, case
+  46637 (10) Homer, goner, goods, goofs, homer, homes, hones, hoods, hoofs, inner
+   7217 (10) PA's, PC's, Pa's, Pb's, Ra's, Rb's, SC's, Sb's, Sc's, pa's
+   4317 (10) GE's, Gd's, Ge's, HF's, He's, Hf's, ID's, he's, id's, if's
+</pre>
 =={{header|Perl}}==