Anagram generator: Difference between revisions

From Rosetta Code
(→‎{{header|Wren}}: Replaced with much more efficient version, more than 100x quicker than before.)

Revision as of 09:52, 10 July 2022

Anagram generator is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

There are already other tasks relating to finding existing anagrams. This one is about creating them.

Write a (set of) routine(s) that, when given a word list to work from, and a word or phrase as a seed, generates anagrams of that word or phrase. Feel free to ignore letter case, white-space, punctuation and symbols. Probably best to avoid numerics too, but feel free to include them if that floats your boat.

It is not necessary to (only) generate anagrams that make sense. That is a hard problem, much more difficult than can realistically be done in a small program; though again, if you feel the need, you are invited to amaze your peers.

In general, try to form phrases made up of longer words. Feel free to manually reorder output words or add punctuation and/or case changes to get a better meaning.


Task

Write an anagram generator program.

Use a publicly and freely available word file as its word list.

unixdict.txt from http://wiki.puzzlers.org is a popular, though somewhat limited choice.
A much larger word list is the words_alpha.txt file from https://github.com/dwyl/english-words. It may offer better coverage but may return unreasonably large results.
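
Whichever word file is used, it usually pays to cut it down to the words that can be spelled from the seed before searching. A minimal Python sketch of that pre-filter follows; the helper names `letters_of` and `candidate_words` and the tiny in-memory `sample` list are illustrative assumptions, not part of the task.

```python
from collections import Counter

def letters_of(text):
    """Lower-case the text and keep only the letters a-z, as a multiset."""
    return Counter(ch for ch in text.lower() if "a" <= ch <= "z")

def candidate_words(words, seed):
    """Return the words whose letters all fit inside the seed's letters."""
    available = letters_of(seed)
    kept = []
    for word in words:
        word = word.strip()
        needed = letters_of(word)
        # Skip empty entries and entries containing anything but a-z
        # (digits, hyphens, apostrophes, ...).
        if not needed or sum(needed.values()) != len(word):
            continue
        # Counter subtraction drops non-positive counts, so an empty
        # result means the word needs nothing the seed lacks.
        if not (needed - available):
            kept.append(word.lower())
    return kept

# Tiny in-memory stand-in for a real word file such as unixdict.txt.
sample = ["fox", "pure", "peru", "expo", "fur", "rosetta", "x-ray", "2nd"]
print(candidate_words(sample, "PureFox"))
```

With a real file, the `sample` list would be replaced by the lines read from it.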

Use your program to generate anagrams of some words / phrases / names of your choice. No need to show all the output. It is likely to be very large. Just pick out one or two of the best results and show the seed word/phrase and anagram.

For example, show the seed and one or two of the best anagrams:

Purefox -> Fur expo
Petelomax -> Metal expo

.oO(hmmm. Seem to be detecting something of a trend here...)
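
The two examples above can be checked mechanically: a phrase is an anagram of the seed when both contain the same letters the same number of times, ignoring case, spaces and punctuation. A small Python sketch, in which `letters_of` and `is_anagram` are hypothetical helper names:

```python
from collections import Counter

def letters_of(text):
    """Multiset of the letters in text, ignoring case and non-letters."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

def is_anagram(seed, phrase):
    """True when phrase uses exactly the letters of seed."""
    return letters_of(seed) == letters_of(phrase)

for seed, phrase in [("Purefox", "Fur expo"), ("Petelomax", "Metal expo")]:
    print(seed, "->", phrase, is_anagram(seed, phrase))
```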


J

Implementation:

<lang J>anagen=: {{

 seed=. (tolower y)([-.-.)a.{~97+i.26
 letters=. ~.seed
 list=. <;._2 tolower fread x
 ok1=. */@e.&letters every list
 ref=. #/.~seed
 counts=. <: #/.~@(letters,])every ok1#list
 ok2=. counts */ .<:ref
 c=. ok2#counts
 maybe=. i.1,~#c
 while. #maybe do.
   done=. (+/"2 maybe{c)*/ .=ref
   if. 1 e. done do.
     r=. ;:inv ((done#maybe) { ok2#I.ok1){L:0 1 <;._2 fread x
     if. #r=. r #~ -. r -:"1&tolower y do. r return. end.
   end.
   maybe=. ; c {{
     <(#~ n */ .<:"1~ [: +/"2 {&m) y,"1 0 ({:y)}.i.#m
   }} ref"1(-.done)#maybe
 end.
 EMPTY

}}</lang>

Examples:

<lang J>   'unixdict.txt' anagen 'Rosettacode'
cetera stood
coat oersted
coda rosette
code rosetta
coed rosetta
create stood
creosote tad
derate scoot
detector sao
doctor tease
doctorate se
ostracod tee
   'unixdict.txt' anagen 'Thundergnat'
dragnet hunt
gannett hurd
ghent tundra
gnat thunder
hurd tangent
tang thunder
   'unixdict.txt' anagen 'Clint Eastwood'
atwood stencil
clio downstate
coil downstate
downcast eliot
downstate loci
edison walcott</lang>

Phix

Couldn't really think of a better way than just building a dirty great filter list to get rid of the less interesting answers....

with javascript_semantics
constant bo_ring = {"al","alex","am","an","and","anent","ann","ant","ar","ares","art","at","ax","axle",
                    "dan","dar","darn","dart","de","den","dent","dna","drag","du","dun","dunn",
                    "ed","edt","eh","el","em","en","end","eng","erg","eros","est","et","eta","ex",
                    "ga","gad","gar","garth","ge","ghent","gnat","gnu","grad","gu","ha","had","han",
                    "hand","hart","hat","he","hut","la","lam","lao","lax","lee","leo","lo","lot",
                    "ma","max","mao","mo","moe","mel","met","mt","nat","nd","ne","ned","nh","nne","nu",
                    "opel","opt","ott","ox","pa","pax","pee","pl","pm","po","poe","rag","ran","rand",
                    "rant","rat","rd","re","red","ret","rna","ruth","sa","sat","se","sort","st",
                    "ta","tad","tag","tam","tamp","tao","taos","tan","tang","tangent","tanh",
                    "tar","tat","tater","tau","tax","ted","tel","ten","tenant","tent","tern",
                    "than","that","the","then","tn","tnt","to","top","tor","tort","tot","trag",
                    "tsar","tun","tuna","tung","tx","un","ut","wa"}
function az(string word) return min(word)>='a' and max(word)<='z' and not find(word,bo_ring) end function
sequence words = filter(unix_dict(),az),
         wdsaz = sort(columnize({apply(words,sort),tagset(length(words))}))

sequence seen = {}
procedure test(string w, sequence found={})
    if found={} then
        seen = {}
        printf(1,"%s:\n",{w})
        w = sort(lower(w))
    end if
    for i=abs(binary_search({w[1..1],0},wdsaz)) to length(wdsaz) do
        {string ax, integer wdx} = wdsaz[i]
        if ax[1]!=w[1] then exit end if
        sequence e = tagset(length(w))
        e[1] = 0
        integer j = 2
        for k=2 to length(ax) do
            while j<length(w) and ax[k]>w[j] do j += 1 end while
            if j>length(w) or ax[k]!=w[j] then exit end if
            e[j] = 0
            j += 1
            if k=length(ax) then
                string aw = words[wdx]
                e = filter(e,"!=",0)
                if length(e)=0 then
                    if length(found) then
                        sequence f = append(deep_copy(found),aw),
                                sf = sort(deep_copy(f))
                        if not find(sf,seen) then
                            seen = append(seen,sf)
                            printf(1,"   %s\n",{join(f,", ")})
                        end if
                    end if
                else
                    test(extract(w,e),append(deep_copy(found),aw))
                end if
            end if
        end for
    end for
end procedure
papply({"Rosetta", "PureFox","PeteLomax","Wherrera","Thundergnat"},test)
Output:
Rosetta:
   treat, so
   sea, trot
   east, rot
   seat, rot
   state, or
   taste, or
   oar, test
   oat, rest
   star, toe
   as, otter
PureFox:
   peru, fox
   pure, fox
   rex, of, up
PeteLomax:
   exalt, poem
   latex, poem
   apex, motel
   axe, elm, pot
   axe, let, mop
   axe, me, plot
   atom, expel
   moat, expel
Wherrera:
   rare, wehr
   rear, wehr
   ware, herr
   wear, herr
Thundergnat:
   ad, tenth, rung
   dragnet, hunt
   dang, net, hurt
   hard, gent, nut
   gannett, hurd
   agent, dr, hunt
   hang, tend, rut
   nag, tend, hurt
   nag, thud, rent
   rang, thud, net
   ah, tend, grunt
   ah, dr, gent, nut
   haunt, dr, gent
   tart, dung, hen
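
The filter-list idea is independent of Phix. As a rough illustration in Python (not a translation of the entry above), the sketch below keeps only words made purely of a-z that are not on a stop list. The names `BORING` and `interesting` are made up for this example, and the stop list is a small subset of the one above.

```python
# A small, illustrative subset of the filter list; the real list is much longer.
BORING = {"al", "am", "an", "and", "at", "ax", "the", "tnt", "to"}

def interesting(word, boring=BORING):
    """Keep words that consist only of a-z and are not on the stop list."""
    only_az = bool(word) and all("a" <= ch <= "z" for ch in word)
    return only_az and word not in boring

sample = ["poem", "to", "exalt", "tnt", "1st", "the", "motel"]
print([w for w in sample if interesting(w)])
```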

Raku

Using the unixdict.txt word file by default.

<lang perl6>unit sub MAIN ($in is copy = '', :$dict = 'unixdict.txt');

say 'Enter a word or phrase to be anagramed. (Loading dictionary)' unless $in.chars;

# Load the words into a word / Bag hash

my %words = $dict.IO.slurp.lc.words.race.map: { .comb(/\w/).join => .comb(/\w/).Bag };

# Declare some globals

my ($phrase, $count, $bag);

loop {

   ($phrase, $count, $bag) = get-phrase;
   find-anagram Hash.new: %words.grep: { .value ⊆ $bag };

}

sub get-phrase {

   my $prompt = $in.chars ?? $in !! prompt "\nword or phrase? (press Enter to quit) ";
   $in = '';
   exit unless $prompt;
   $prompt,
   +$prompt.comb(/\w/),
   $prompt.lc.comb(/\w/).Bag;

}

sub find-anagram (%subset, $phrase is copy = '', $last = Inf) {

   my $remain = $bag ∖ $phrase.comb(/\w/).Bag;        # Find the remaining letters
   my %filtered = %subset.grep: { .value ⊆ $remain }; # Find words using the remaining letters
   my $sofar = +$phrase.comb(/\w/);                   # Get the count of the letters used so far
   for %filtered.sort: { -.key.chars, ~.key } {       # Sort by length then alphabetically then iterate
       my $maybe = +.key.comb(/\w/);                  # Get the letter count of the maybe addition
       next if $maybe > $last;                        # Next if it is longer than last - only consider descending length words
       next if $maybe == 1 and $last == 1;            # Only allow one one character word
       next if $count - $sofar - $maybe > $maybe;     # Try to balance word lengths
       if $sofar + $maybe == $count {                 # It's an anagram
           say $phrase ~ ' ' ~ .key and next;         # Display it and move on
       } else {                                       # Not yet a full anagram, recurse
           find-anagram %filtered, $phrase ~ ' ' ~ .key, $maybe;
       }
   }

}</lang>

Truncated to only show the best few as subjectively determined by me:

Punctuation, capitalization and (in some cases) word order manually massaged.

Enter a word or phrase to be anagramed. (Loading dictionary)

word or phrase? (press Enter to quit) Rosettacode
doctor tease

word or phrase? (press Enter to quit) thundergnat
dragnet hunt
Gent? Nah, turd.

word or phrase? (press Enter to quit) Clint Eastwood
downcast eliot
I contest waldo
nose to wildcat
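
For readers who do not know Raku, the recursive search can be approximated in Python. This is a simplified sketch rather than a port. It keeps the ideas of filtering the word list to what fits in the seed, sorting longest first, and only ever adding words that are no longer than the previous one. It leaves out the word-length balancing and the single-letter-word rule. All names (`letters_of`, `fits`, `find_anagrams`) are assumptions made for this example.

```python
from collections import Counter

def letters_of(text):
    """Multiset of the letters in text, ignoring case and non-letters."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

def fits(needed, available):
    """True when every letter in needed is available often enough."""
    return not (needed - available)

def find_anagrams(words, seed):
    """Yield lists of words that together use exactly the seed's letters."""
    target = letters_of(seed)
    if not target:
        return
    # Words that fit in the seed, longest first, then alphabetical.
    pool = sorted(
        {w.lower() for w in words if w.isalpha() and fits(letters_of(w), target)},
        key=lambda w: (-len(w), w),
    )
    bags = {w: letters_of(w) for w in pool}

    def search(remaining, start, chosen):
        if not remaining:              # every letter used: a full anagram
            yield chosen
            return
        # Starting at `start` keeps words in non-increasing length order,
        # so each combination of words is produced only once.
        for i in range(start, len(pool)):
            word = pool[i]
            if fits(bags[word], remaining):
                yield from search(remaining - bags[word], i, chosen + [word])

    yield from search(target, 0, [])

words = ["fox", "pure", "peru", "rex", "of", "up", "fur", "expo"]
for phrase in find_anagrams(words, "PureFox"):
    print(" ".join(phrase))
```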

Wren

Library: Wren-str
Library: Wren-perm
Library: Wren-seq
Library: Wren-sort

To avoid any subjectivity, this just produces all two word anagrams of a word or phrase.

Alternatives formed by simply changing the order of the two words have been suppressed.

<lang ecmascript>import "io" for File
import "./str" for Str, Char
import "./perm" for Comb
import "./seq" for Lst
import "./sort" for Sort

var wordList = "unixdict.txt" // local copy
var words = File.read("unixdict.txt").split("\n").map { |w| w.trim() }
var wordMap = {}
for (word in words) {

   var letters = word.toList
   Sort.insertion(letters)
   var sortedWord = letters.join()
   if (wordMap.containsKey(sortedWord)) {
       wordMap[sortedWord].add(word)
   } else {
       wordMap[sortedWord] = [word]
   }

}

var anagramGenerator = Fn.new { |text|

   var letters = Str.lower(text).toList
   // remove any non-letters
   for (i in letters.count-1..0) {
       if (!Char.isLetter(letters[i])) letters.removeAt(i)
   }
   var lc = letters.count
   if (lc < 2) return
   var h = (lc/2).floor
   var tried = {}
   for (n in h..1) {
       var sameLength = (lc == 2 * n)
       for (letters1 in Comb.list(letters, n)) {
           Sort.insertion(letters1)
           letters1 = letters1.join()
           if (tried[letters1]) continue
           tried[letters1] = true
           var anagrams = wordMap[letters1]
           if (anagrams) {
               var letters2 = Lst.except(letters, letters1.toList)
               Sort.insertion(letters2)
               letters2 = letters2.join()
               if (sameLength) {
                   if (tried[letters2]) continue
                   tried[letters2] = true
               }
               var anagrams2 = wordMap[letters2]
               if (anagrams2) {
                   for (word1 in anagrams) {
                       for (word2 in anagrams2) {
                           System.print("  " + word1 + " " + word2)
                       }
                   }
               }
           }
       }
   }

}

var tests = ["Rosettacode", "PureFox", "Petelomax", "Wherrera", "Thundergnat", "ClintEastwood"]
for (i in 0...tests.count) {

   System.print("\n%(tests[i]):")
   anagramGenerator.call(tests[i])

}</lang>

Output:
Rosettacode:
  scoot derate
  stood cetera
  stood create
  tease doctor
  code rosetta
  coed rosetta
  coat oersted
  coda rosette
  sao detector
  tee ostracod
  tad creosote
  se doctorate

PureFox:
  fox peru
  fox pure

Petelomax:
  poem exalt
  poem latex
  apex motel
  alex tempo
  axle tempo
  atom expel
  moat expel
  pax omelet
  lao exempt
  to example

Wherrera:
  wehr rare
  wehr rear
  ware herr
  wear herr

Thundergnat:
  ghent tundra
  hunt dragnet
  gnat thunder
  tang thunder
  hurd gannett
  hurd tangent

ClintEastwood:
  edison walcott
  atwood stencil
  clint eastwood
  eliot downcast
  clio downstate
  coil downstate
  loci downstate
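
The same two-word approach can be sketched in Python for comparison: index the word list by sorted letters, then try each way of splitting the seed's letters into two groups and look both groups up in the index. This is an illustration under assumptions, not a translation. The names `signature` and `two_word_anagrams` are made up here, the word list is assumed to be lower case, and mirrored splits are suppressed by comparing the two signatures rather than by recording the second one as tried.

```python
from collections import Counter, defaultdict
from itertools import combinations

def signature(letters):
    """Letters in sorted order; all anagrams of a word share this key."""
    return "".join(sorted(letters))

def two_word_anagrams(words, seed):
    """Return (word1, word2) pairs that together use exactly the seed's letters."""
    by_signature = defaultdict(list)
    for word in words:
        by_signature[signature(word)].append(word)

    letters = sorted(ch for ch in seed.lower() if ch.isalpha())
    pool = Counter(letters)
    tried = set()
    pairs = []
    for n in range(len(letters) // 2, 0, -1):
        # combinations() of a sorted list yields sorted tuples,
        # so each combo is already a valid signature.
        for combo in combinations(letters, n):
            sig1 = "".join(combo)
            if sig1 in tried:
                continue
            tried.add(sig1)
            sig2 = signature((pool - Counter(combo)).elements())
            # With two halves of equal length every split appears twice;
            # keep only the one whose first signature sorts first.
            if len(sig2) == n and sig2 < sig1:
                continue
            for word1 in by_signature.get(sig1, []):
                for word2 in by_signature.get(sig2, []):
                    if sig1 != sig2 or word1 <= word2:
                        pairs.append((word1, word2))
    return pairs

words = ["fox", "peru", "pure", "rex", "up"]
for word1, word2 in two_word_anagrams(words, "PureFox"):
    print(word1, word2)
```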