Anagrams: Difference between revisions

Read the lines in the dictionary, group by the sorted letters in each word, extract the sequences of words sharing the same letters (i.e. anagrams) and sort to put the largest sets of anagrams first:
<lang fsharp>
let words = System.IO.File.ReadAllLines @"unixdict.txt"
// Key each word by its sorted letters (as an array) and keep each group as an array of words.
[|for _, words in Seq.groupBy (Array.ofSeq >> Array.sort) words -> Array.ofSeq words|]
// Sort by descending group size so the largest sets of anagrams come first.
|> Array.sortBy (fun s -> -Seq.length s)
</lang>
Note that the grouping key for each word must be an array rather than a sequence: the groupBy function uses the default equality comparison, and sequences do not compare structurally in F# (but arrays do).
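
The difference can be checked directly. The following is a minimal sketch, not part of the original solution; the names <code>a</code>, <code>b</code>, <code>s1</code> and <code>s2</code> are illustrative only:
<lang fsharp>
// Arrays compare structurally: equal elements in the same order means equal arrays.
let a = Array.sort (Array.ofSeq "listen")
let b = Array.sort (Array.ofSeq "silent")
printfn "%b" (a = b)     // true

// Lazily evaluated sequences fall back to reference equality,
// so two distinct sequence objects differ even when their elements match.
let s1 = Seq.sort "listen"
let s2 = Seq.sort "silent"
printfn "%b" (s1 = s2)   // false
</lang>
With sequence keys, every word would therefore end up in a group of its own.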
 
Yields:
<lang fsharp>
val it : string [] [] =
[|[|"alger"; "glare"; "lager"; "large"; "regal"|];
[|"caret"; "carte"; "cater"; "crate"; "trace"|];
[|"abel"; "able"; "bale"; "bela"; "elba"|];
[|"elan"; "lane"; "lean"; "lena"; "neal"|];
[|"angel"; "angle"; "galen"; "glean"; "lange"|];
[|"evil"; "levi"; "live"; "veil"; "vile"|];
[|"aden"; "dane"; "dean"; "edna"|]; [|"emit"; "item"; "mite"; "time"|];
[|"pare"; "pear"; "rape"; "reap"|];
[|"esprit"; "priest"; "sprite"; "stripe"|];
[|"hare"; "hear"; "hera"; "rhea"|]; [|"resin"; "rinse"; "risen"; "siren"|];
[|"keats"; "skate"; "stake"; "steak"|];
[|"lascar"; "rascal"; "sacral"; "scalar"|];
[|"amen"; "mane"; "mean"; "name"|]; [|"nepal"; "panel"; "penal"; "plane"|];
[|"lemon"; "melon"; "menlo"; "monel"|];
[|"least"; "slate"; "stale"; "steal"|]; [|"leap"; "pale"; "peal"; "plea"|];
[|"ames"; "mesa"; "same"; "seam"|]; [|"leapt"; "petal"; "plate"; "pleat"|];
[|"lien"; "line"; "neil"; "nile"|]; [|"abet"; "bate"; "beat"; "beta"|];
[|"mate"; "meat"; "tame"; "team"|]; [|"beard"; "bread"; "debar"; "debra"|];
[|"lament"; "mantel"; "mantle"; "mental"|];
[|"aires"; "aries"; "arise"; "raise"|]; [|"enol"; "leon"; "lone"; "noel"|];
[|"cereus"; "recuse"; "rescue"; "secure"|];
[|"manor"; "moran"; "norma"; "roman"|];
[|"latus"; "sault"; "talus"; "tulsa"|]; [|"diet"; "edit"; "tide"; "tied"|];
[|"lima"; "mail"; "mali"; "mila"|]; [|"are"; "ear"; "era"; "rae"|];
[|"apt"; "pat"; "pta"; "tap"|]; [|"dare"; "dear"; "erda"; "read"|];
[|"ate"; "eat"; "eta"; "tea"|]; [|"ant"; "nat"; "tan"|];
[|"yates"; "yeast"; "yeats"|]; [|"nerve"; "never"; "verne"|];
[|"ether"; "there"; "three"|]; [|"dave"; "vade"; "veda"|];
[|"earn"; "near"; "rena"|]; [|"brag"; "garb"; "grab"|];
[|"magneto"; "megaton"; "montage"|]; [|"earnest"; "eastern"; "nearest"|];
[|"den"; "end"; "ned"|]; [|"lair"; "liar"; "rail"|];
[|"alton"; "talon"; "tonal"|]; [|"now"; "own"; "won"|];
[|"earth"; "hater"; "heart"|]; [|"dire"; "reid"; "ride"|];
[|"bard"; "brad"; "drab"|]; [|"bare"; "bear"; "brae"|];
[|"earthen"; "hearten"; "teheran"|]; [|"kale"; "lake"; "leak"|];
[|"arnold"; "roland"; "ronald"|]; [|"cpu"; "cup"; "puc"|];
[|"earthy"; "hearty"; "thayer"|]; [|"abut"; "tabu"; "tuba"|];
[|"lame"; "male"; "meal"|]; [|"hank"; "kahn"; "khan"|];
[|"riot"; "tori"; "trio"|]; [|"dater"; "trade"; "tread"|];
[|"rite"; "tier"; "tire"|]; [|"army"; "mary"; "myra"|];
[|"alert"; "alter"; "later"|]; [|"acts"; "cast"; "scat"|];
[|"carven"; "cavern"; "craven"|]; [|"grate"; "great"; "greta"|];
[|"rosa"; "soar"; "sora"|]; [|"argot"; "gator"; "groat"|];
[|"demo"; "dome"; "mode"|]; [|"klein"; "kline"; "liken"|];
[|"lisa"; "sail"; "sial"|]; [|"cruel"; "lucre"; "ulcer"|];
[|"along"; "anglo"; "logan"|]; [|"listen"; "silent"; "tinsel"|];
[|"list"; "silt"; "slit"|]; [|"iran"; "nair"; "rain"|];
[|"baird"; "braid"; "rabid"|]; [|"dearth"; "hatred"; "thread"|];
[|"ape"; "epa"; "pea"|]; [|"eros"; "rose"; "sore"|];
[|"ares"; "sear"; "sera"|]; [|"acm"; "cam"; "mac"|];
[|"gas"; "gsa"; "sag"|]; [|"acme"; "came"; "mace"|];
[|"part"; "rapt"; "trap"|]; [|"argon"; "groan"; "organ"|];
[|"ester"; "steer"; "terse"|]; [|"acre"; "care"; "race"|];
[|"brain"; "brian"; "rabin"|]; [|"parse"; "spare"; "spear"|];
[|"bin"; "ibn"; "nib"|]; [|"result"; "rustle"; "ulster"|];
[|"armco"; "macro"; "marco"|]; [|"janos"; "jason"; "jonas"|];
[|"ante"; "nate"; "neat"|]; [|"goer"; "gore"; "ogre"|]; ...|]
</lang>
There is plenty of room for optimization but finding all sets of anagrams in this dictionary takes under 0.5s using this code.
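
To try the same pipeline without the dictionary file, here is a minimal sketch on a small in-memory word list. The <code>sample</code> array is a hypothetical stand-in for the lines of unixdict.txt, and the filter that drops words with no anagram is an addition that the solution above does not perform:
<lang fsharp>
// Hypothetical sample standing in for the lines of unixdict.txt.
let sample = [| "evil"; "item"; "levi"; "live"; "time"; "veil"; "vile"; "zebra" |]

let groups =
    sample
    // Group by the sorted letters of each word; the array key compares structurally.
    |> Seq.groupBy (Array.ofSeq >> Array.sort)
    // Discard the key and keep each group of words as an array.
    |> Seq.map (snd >> Array.ofSeq)
    // Addition for this sketch: drop words that have no anagram in the list.
    |> Seq.filter (fun g -> Array.length g > 1)
    |> Array.ofSeq
    // Largest sets of anagrams first.
    |> Array.sortBy (fun g -> -(Array.length g))

printfn "%A" groups
</lang>
For this sample the result holds two groups: the five anagrams of "evil" first, then the pair "item" and "time".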
 
== {{header|Factor}} ==