User talk:Tonyjv: Difference between revisions

m (Undo revision 121090 by Tonyjv (talk))
 
(6 intermediate revisions by 2 users not shown)
Line 48:
 
:::P.S. I try and not make the assumption that better==faster. There are other considerations to take into account. Having said that, your faster example above has a certain elegance... --[[User:Paddy3118|Paddy3118]] 06:03, 21 September 2011 (UTC)
That will not run in Python 3 as bytes can not be joined, here is my fastest version not using groupby, did not compare with the non-groupby version in the page under discussion (this runs both in Python2 and Python3):
<lang python>
from collections import defaultdict
import time
 
try:
words = urllib.request.urlopen('http://www.puzzlers.org/pub/wordlists/unixdict.txt').read().split()
import urllib.request
words = urllib.request.urlopen('http://www.puzzlers.org/pub/wordlists/unixdict.txt').read().split()
except ImportError:
import urllib
words = urllib.urlopen('http://www.puzzlers.org/pub/wordlists/unixdict.txt').read().split()
 
print('Words ready')
 
Line 85 ⟶ 91:
d[key].append(w)
lk = len(d[key])
elifif lk ==< lm:
continue
if lk > lm:
lm = lk
result = [d[key]]
#print('New length: %i (%s)' % (lm, result))
elif lk == lm:
else:
result.append(d[key])
Line 97 ⟶ 107:
 
</lang>
: A few suggestions:
# If you are going to benchmark a lot, pull the dict.txt file to your local harddrive, don't repeatedly urlopen it from remote host.
# My example code runs under python 3.1 just fine, except for the <code>print</code> syntax change.
# Like I said, majority of time is spent sorting letters; once all anagrams are in place, sorting the list or not makes no big difference (your sorted version above is only marginally slower).
:--[[User:Ledrug|Ledrug]] 07:21, 21 September 2011 (UTC)
 
From urllib.request.urlopen('http://www.puzzlers.org/pub/wordlists/unixdict.txt').read().split() it produces bytes, which can not be joined before str, it gives error: TypeError: must be str, not bytes. For my anagrams program, I actually do building anagram synonym dictionary only once, if it is not saved on disc already. I do need to put little sorting effort to make the list length descending order and filtering special character (we'd -> dew for example), allowing the dictionary to be any collection of words one per line, as that works better for multiword anagrams. I only split the words possible to current source words, otherwise I keep the list as string of words. In my humble Sempron PC, my own names anagrams from Finnish dictionary (82229 words, 79248 sorted letter combinations, your program took 0.8 s to find interestingly also 5 word anagram, 5 different ones), takes around 600 ms (python 2.6 and psyco for best results):
 
52 words loaded in 68 ms.
1564 anagrams of
tonyveijalainen found!
Processing took 567 ms.
 
: You are describing a completely different problem. Sorting may help with multi-word anagrams, and should help with data reuse, but for the specific task it does no benefit.
: Incidentally, what do you mean by "5 word anagram"? Here's a 12 from my Finnish dictionary: (12, ['ankarasti', 'ankarista', 'arkistaan', 'karitsana', 'karsintaa', 'karsitaan', 'kitaransa', 'narikasta', 'rakastani', 'rankaista', 'raskainta', 'sarkainta']). If that doesn't count somehow, at least there is: (6, ['painottumassa', 'poistumastaan', 'punoittamassa', 'putoamistansa', 'upottamassani', 'upottamissaan']) --[[User:Ledrug|Ledrug]] 20:21, 21 September 2011 (UTC)
 
I have Finnish dictionary with words in basic forms only, in Linux side I did some processing in Ubuntu installed word file (had to remove \\$ at end and filter repeated words) I got also two 12s: ['rankaista', 'ankarista', 'karitsana', 'karsintaa', 'raskainta', 'kitaransa', 'karsitaan', 'rakastani', 'sarkainta', 'arkistaan', 'narikasta', 'ankarasti']
['nostaa', 'otsaan', 'tasona', 'saaton', 'sotaan', 'sotana', 'sontaa', 'sanota', 'otsana', 'satona']
with my collect version, but it took 9.36 s without psyco (interestingly 9.64 s with psyco) python 2.6.5
Anonymous user