Anagram generator: Difference between revisions
Line 136: | Line 136: | ||
piety pizza tram |
piety pizza tram |
||
pizza pyrite tam</lang> |
pizza pyrite tam</lang> |
||
=={{header|Julia}}== |
|||
<lang ruby>const unixwords = split(read("unixdict.txt", String) |> lowercase, r"\s+") |
|||
function findphrases(anastring::AbstractString, choices, sizelong = 4, n_shortpermitted = 1) |
|||
anadict = Dict{Char, Int}() |
|||
for c in lowercase(anastring) |
|||
if 'a' <= c <= 'z' |
|||
anadict[c] = get(anadict, c, 0) + 1 |
|||
end |
|||
end |
|||
phrases = String[] |
|||
function addword(remaining, phrase, numshort) |
|||
for w in unixwords |
|||
len = length(w) |
|||
numshort < 1 && len < sizelong && continue |
|||
any(c -> get(remaining, c, 0) < count(==(c), w), w) && @goto nextword |
|||
cdict = copy(remaining) |
|||
for c in w |
|||
cdict[c] -= 1 |
|||
end |
|||
if all(==(0), values(cdict)) |
|||
return strip(phrase * " " * w) |
|||
elseif (newphrase = addword(cdict, phrase * " " * w, numshort - (len < sizelong))) != nothing |
|||
push!(phrases, newphrase) |
|||
print(length(phrases), "\b\b\b\b\b\b\b\b\b") |
|||
end |
|||
@label nextword |
|||
end |
|||
return nothing |
|||
end |
|||
addword(anadict, "", n_shortpermitted) |
|||
return phrases |
|||
end |
|||
for s in ["Rosetta code", "Joe Biden", "wherrera"] |
|||
println("\nFrom '$s':") |
|||
foreach(println, findphrases(s, unixwords, 4, 0) |> unique |> sort!) |
|||
end |
|||
</lang>{{out}} |
|||
<pre> |
|||
From 'Rosetta code': |
|||
cetera stood |
|||
coat oersted |
|||
coda rosette |
|||
code rosetta |
|||
coed rosetta |
|||
create stood |
|||
derate scoot |
|||
doctor tease |
|||
oersted coat |
|||
rosetta code |
|||
rosette coda |
|||
scoot derate |
|||
stood cetera |
|||
tease doctor |
|||
From 'Joe Biden': |
|||
done jibe |
|||
jibe done |
|||
node jibe |
|||
From 'wherrera': |
|||
herr ware |
|||
rare wehr |
|||
rear wehr |
|||
ware herr |
|||
wear herr |
|||
wehr rare |
|||
</pre> |
|||
=={{header|Phix}}== |
=={{header|Phix}}== |
Revision as of 02:07, 19 July 2022
There are already other tasks relating to finding existing anagrams. This one is about creating them.
Write a (set of) routine(s) that, when given a word list to work from, and word or phrase as a seed, generates anagrams of that word or phrase. Feel free to ignore letter case, white-space, punctuation and symbols. Probably best to avoid numerics too, but feel free to include them if that floats your boat.
It is not necessary to (only) generate anagrams that make sense. That is a hard problem, much more difficult than can realistically be done in a small program; though again, if you feel the need, you are invited to amaze your peers.
In general, try to form phrases made up of longer words. Feel free to manually reorder output words or add punctuation and/or case changes to get a better meaning.
- Task
Write an anagram generator program.
Use a publicly and freely available word file as its word list.
- unixdict.txt from http://wiki.puzzlers.org is a popular, though somewhat limited choice.
- A much larger word list: words_alpha.txt file from https://github.com/dwyl/english-words. May be better as far as coverage but may return unreasonably large results.
Use your program to generate anagrams of some words / phrases / names of your choice. No need to show all the output. It is likely to be very large. Just pick out one or two of the best results and show the seed word/phrase and anagram.
For example, show the seed and one or two of the best anagrams:
- Purefox -> Fur expo
- Petelomax -> Metal expo
.oO(hmmm. Seem to be detecting something of a trend here...)
J
Implementation:
<lang J>anagen=: {{
seed=. (tolower y)([-.-.)a.{~97+i.26 letters=. ~.seed list=. <;._2 tolower fread x ok1=. */@e.&letters every list ref=. #/.~seed counts=. <: #/.~@(letters,])every ok1#list ok2=. counts */ .<:ref c=. ok2#counts maybe=. i.1,~#c while. #maybe do. done=. (+/"2 maybe{c)*/ .=ref if. 1 e. done do. r=. ;:inv ((done#maybe) { ok2#I.ok1){L:0 1 <;._2 fread x if. #r=. r #~ -. r -:"1&tolower y do. r return. end. end. maybe=. ; c {{ <(#~ n */ .<:"1~ [: +/"2 {&m) y,"1 0 ({:y)}.i.#m }} ref"1(-.done)#maybe end. EMPTY
}}</lang>
Conceptually, we're working with a graph here -- given a seed word, we have sequence of letters and a count of each of those letters which we must supply. Given a list of words whose letters do not exceed that count, we extend that list with an additional word, discarding word sequences which go over the letter count. We would repeat as necessary until we satisfy the needed letter counts. Also, we inspect the "final result" and remove our seed word or phrase from it (if it was present -- and if that would have been our only result we would extend our under budget word sequences until they satisfy our letter counts). It seems like two words is usually sufficient here.
To limit our resource consumption, we turn this conceptual graph into a directed graph by requiring that the words in our anagram sequence appear in alphabetic order (we could go back and permute each word sequence if we wanted to pad our results).
We could further improve performance (and typically generate fewer example anagrams) if we required our initial word choice to include "rarely used letters" and then alternated between increasing alphabetic order and decreasing alphabetic order when adding words to the potential anagrams. This approach would reduce the size of intermediate results which would [usually] significantly increase search speed. But it might require backtracking if our initial choices were unfortunate.
Examples:
<lang J> 'unixdict.txt' anagen 'Rosettacode' cetera stood coat oersted coda rosette code rosetta coed rosetta create stood creosote tad derate scoot detector sao doctor tease doctorate se ostracod tee
'unixdict.txt' anagen 'Thundergnat'
dragnet hunt gannett hurd ghent tundra gnat thunder hurd tangent tang thunder
'unixdict.txt' anagen 'Clint Eastwood'
atwood stencil clio downstate coil downstate downcast eliot downstate loci edison walcott
'unixdict.txt' anagen 'Pizza party time'
aim pizza pretty airy pizza tempt amaze ritz tippy ami pizza pretty amity pert pizza amp pizzeria tty armpit pizza yet army petit pizza emit party pizza imp pizza treaty impart pizza yet irma petty pizza item party pizza m patty pizzeria map pizzeria tty marietta z zippy mart piety pizza mary petit pizza mat pizza pyrite mater pity pizza mayer pitt pizza maze trait zippy meaty pizza trip mi piazza pretty mira petty pizza mire patty pizza mite party pizza mitt piazza prey mitt piazza pyre mizar tate zippy mizar teat zippy mizar tippy zeta mt piazza pyrite myra petit pizza pam pizzeria tty party pizza time patty pizza rime petty piazza rim piazza pity term piazza prime tty piazza trim type piety pizza tram pizza pyrite tam</lang>
Julia
<lang ruby>const unixwords = split(read("unixdict.txt", String) |> lowercase, r"\s+")
function findphrases(anastring::AbstractString, choices, sizelong = 4, n_shortpermitted = 1)
anadict = Dict{Char, Int}() for c in lowercase(anastring) if 'a' <= c <= 'z' anadict[c] = get(anadict, c, 0) + 1 end end phrases = String[] function addword(remaining, phrase, numshort) for w in unixwords len = length(w) numshort < 1 && len < sizelong && continue any(c -> get(remaining, c, 0) < count(==(c), w), w) && @goto nextword cdict = copy(remaining) for c in w cdict[c] -= 1 end if all(==(0), values(cdict)) return strip(phrase * " " * w) elseif (newphrase = addword(cdict, phrase * " " * w, numshort - (len < sizelong))) != nothing push!(phrases, newphrase) print(length(phrases), "\b\b\b\b\b\b\b\b\b") end @label nextword end return nothing end addword(anadict, "", n_shortpermitted) return phrases
end
for s in ["Rosetta code", "Joe Biden", "wherrera"]
println("\nFrom '$s':") foreach(println, findphrases(s, unixwords, 4, 0) |> unique |> sort!)
end
</lang>
- Output:
From 'Rosetta code': cetera stood coat oersted coda rosette code rosetta coed rosetta create stood derate scoot doctor tease oersted coat rosetta code rosette coda scoot derate stood cetera tease doctor From 'Joe Biden': done jibe jibe done node jibe From 'wherrera': herr ware rare wehr rear wehr ware herr wear herr wehr rare
Phix
Couldn't really think of a better way than just building a dirty great filter list to get rid of the less interesting answers....
with javascript_semantics constant bo_ring = {"al","alex","am","an","and","anent","ann","ant","ar","ares","art","at","ax","axle", "dan","dar","darn","dart","de","den","dent","dna","drag","du","dun","dunn", "ed","edt","eh","el","em","en","end","eng","erg","eros","est","et","eta","ex", "ga","gad","gar","garth","ge","ghent","gnat","gnu","grad","gu","ha","had","han", "hand","hart","hat","he","hut","la","lam","lao","lax","lee","leo","lo","lot", "ma","max","mao","mo","moe","mel","met","mt","nat","nd","ne","ned","nh","nne","nu", "opel","opt","ott","ox","pa","pax","pee","pl","pm","po","poe","rag","ran","rand", "rant","rat","rd","re","red","ret","rna","ruth","sa","sat","se","sort","st", "ta","tad","tag","tam","tamp","tao","taos","tan","tang","tangent","tanh", "tar","tat","tater","tau","tax","ted","tel","ten","tenant","tent","tern", "than","that","the","then","tn","tnt","to","top","tor","tort","tot","trag", "tsar","tun","tuna","tung","tx","un","ut","wa"} function az(string word) return min(word)>='a' and max(word)<='z' and not find(word,bo_ring) end function sequence words = filter(unix_dict(),az), wdsaz = sort(columnize({apply(words,sort),tagset(length(words))})) sequence seen = {} procedure test(string w, sequence found={}) if found={} then seen = {} printf(1,"%s:\n",{w}) w = sort(lower(w)) end if for i=abs(binary_search({w[1..1],0},wdsaz)) to length(wdsaz) do {string ax, integer wdx} = wdsaz[i] if ax[1]!=w[1] then exit end if sequence e = tagset(length(w)) e[1] = 0 integer j = 2 for k=2 to length(ax) do while j<length(w) and ax[k]>w[j] do j += 1 end while if j>length(w) or ax[k]!=w[j] then exit end if e[j] = 0 j += 1 if k=length(ax) then string aw = words[wdx] e = filter(e,"!=",0) if length(e)=0 then if length(found) then sequence f = append(deep_copy(found),aw), sf = sort(deep_copy(f)) if not find(sf,seen) then seen = append(seen,sf) printf(1," %s\n",{join(f,", ")}) end if end if else test(extract(w,e),append(deep_copy(found),aw)) end if end if end for end for end procedure papply({"Rosetta", "PureFox","PeteLomax","Wherrera","Thundergnat"},test)
- Output:
Rosetta: treat, so sea, trot east, rot seat, rot state, or taste, or oar, test oat, rest star, toe as, otter PureFox: peru, fox pure, fox rex, of, up PeteLomax: exalt, poem latex, poem apex, motel axe, elm, pot axe, let, mop axe, me, plot atom, expel moat, expel Wherrera: rare, wehr rear, wehr ware, herr wear, herr Thundergnat: ad, tenth, rung dragnet, hunt dang, net, hurt hard, gent, nut gannett, hurd agent, dr, hunt hang, tend, rut nag, tend, hurt nag, thud, rent rang, thud, net ah, tend, grunt ah, dr, gent, nut haunt, dr, gent tart, dung, hen
Raku
Using the unixdict.txt word file by default.
<lang perl6>unit sub MAIN ($in is copy = , :$dict = 'unixdict.txt');
say 'Enter a word or phrase to be anagramed. (Loading dictionary)' unless $in.chars;
- Load the words into a word / Bag hash
my %words = $dict.IO.slurp.lc.words.race.map: { .comb(/\w/).join => .comb(/\w/).Bag };
- Declare some globals
my ($phrase, $count, $bag);
loop {
($phrase, $count, $bag) = get-phrase; find-anagram Hash.new: %words.grep: { .value ⊆ $bag };
}
sub get-phrase {
my $prompt = $in.chars ?? $in !! prompt "\nword or phrase? (press Enter to quit) "; $in = ; exit unless $prompt; $prompt, +$prompt.comb(/\w/), $prompt.lc.comb(/\w/).Bag;
}
sub find-anagram (%subset, $phrase is copy = , $last = Inf) {
my $remain = $bag ∖ $phrase.comb(/\w/).Bag; # Find the remaining letters my %filtered = %subset.grep: { .value ⊆ $remain }; # Find words using the remaining letters my $sofar = +$phrase.comb(/\w/); # Get the count of the letters used so far for %filtered.sort: { -.key.chars, ~.key } { # Sort by length then alphabetically then iterate my $maybe = +.key.comb(/\w/); # Get the letter count of the maybe addition next if $maybe > $last; # Next if it is longer than last - only consider descending length words next if $maybe == 1 and $last == 1; # Only allow one one character word next if $count - $sofar - $maybe > $maybe; # Try to balance word lengths if $sofar + $maybe == $count { # It's an anagram say $phrase ~ ' ' ~ .key and next; # Display it and move on } else { # Not yet a full anagram, recurse find-anagram %filtered, $phrase ~ ' ' ~ .key, $maybe; } }
}</lang>
- Truncated to only show the best few as subjectively determined by me:
Punctuation, capitalization and (in some cases) word order manually massaged.
Enter a word or phrase to be anagramed. (Loading dictionary) word or phrase? (press Enter to quit) Rosettacode doctor tease word or phrase? (press Enter to quit) thundergnat dragnet hunt Gent? Nah, turd. word or phrase? (press Enter to quit) Clint Eastwood downcast eliot I contest waldo nose to wildcat
Wren
To avoid any subjectivity, this just produces all two word anagrams of a word or phrase.
Alternatives formed by simply changing the order of the two words have been suppressed. <lang ecmascript>import "io" for File import "./str" for Str, Char import "./perm" for Comb import "./seq" for Lst import "./sort" for Sort
var wordList = "unixdict.txt" // local copy var words = File.read("unixdict.txt").split("\n").map { |w| w.trim() } var wordMap = {} for (word in words) {
var letters = word.toList Sort.insertion(letters) var sortedWord = letters.join() if (wordMap.containsKey(sortedWord)) { wordMap[sortedWord].add(word) } else { wordMap[sortedWord] = [word] }
}
var anagramGenerator = Fn.new { |text|
var letters = Str.lower(text).toList // remove any non-letters for (i in letters.count-1..0) { if (!Char.isLetter(letters[i])) letters.removeAt(i) } var lc = letters.count if (lc < 2) return var h = (lc/2).floor var tried = {} for (n in h..1) { var sameLength = (lc == 2 * n) for (letters1 in Comb.list(letters, n)) { Sort.insertion(letters1) letters1 = letters1.join() if (tried[letters1]) continue tried[letters1] = true var anagrams = wordMap[letters1] if (anagrams) { var letters2 = Lst.except(letters, letters1.toList) Sort.insertion(letters2) letters2 = letters2.join() if (sameLength) { if (tried[letters2]) continue tried[letters2] = true } var anagrams2 = wordMap[letters2] if (anagrams2) { for (word1 in anagrams) { for (word2 in anagrams2) { System.print(" " + word1 + " " + word2) } } } } } }
}
var tests = ["Rosettacode", "PureFox", "Petelomax", "Wherrera", "Thundergnat", "ClintEastwood"] for (i in 0...tests.count) {
System.print("\n%(tests[i]):") anagramGenerator.call(tests[i])
}</lang>
- Output:
Rosettacode: scoot derate stood cetera stood create tease doctor code rosetta coed rosetta coat oersted coda rosette sao detector tee ostracod tad creosote se doctorate PureFox: fox peru fox pure Petelomax: poem exalt poem latex apex motel alex tempo axle tempo atom expel moat expel pax omelet lao exempt to example Wherrera: wehr rare wehr rear ware herr wear herr Thundergnat: ghent tundra hunt dragnet gnat thunder tang thunder hurd gannett hurd tangent ClintEastwood: edison walcott atwood stencil clint eastwood eliot downcast clio downstate coil downstate loci downstate