User talk:Tonyjv: Difference between revisions

Revision as of 08:57, 23 September 2011

→ Sort instead of max in Anagrams

Revision as of 06:50, 21 September 2011 by Tonyjv (minor: Undo revision 121090 by Tonyjv) ⟶ latest revision as of 08:57, 23 September 2011 by Tonyjv (→ Sort instead of max in Anagrams)
(6 intermediate revisions by 2 users not shown)
:::P.S. I try not to make the assumption that better==faster. There are other considerations to take into account. Having said that, your faster example above has a certain elegance... --[[User:Paddy3118|Paddy3118]] 06:03, 21 September 2011 (UTC)

That will not run in Python 3, as bytes cannot be joined. Here is my fastest version not using groupby; it runs in both Python 2 and Python 3. I did not compare it with the non-groupby version on the page under discussion:
<lang python>
from collections import defaultdict
import time
try:
    import urllib.request
    words = urllib.request.urlopen('http://www.puzzlers.org/pub/wordlists/unixdict.txt').read().split()
except ImportError:
    import urllib
    words = urllib.urlopen('http://www.puzzlers.org/pub/wordlists/unixdict.txt').read().split()
print('Words ready')
# ... (unchanged lines not shown in this diff) ...
    d[key].append(w)
    lk = len(d[key])
    if lk < lm:
        # smaller than the current largest group: nothing to record
        continue
    if lk > lm:
        # new largest group size: start a fresh result list
        lm = lk
        result = [d[key]]
        #print('New length: %i (%s)' % (lm, result))
    else:
        # ties the current largest size: add this group too
        result.append(d[key])
# ... (unchanged lines not shown in this diff) ...
</lang>

: A few suggestions:
:# If you are going to benchmark a lot, pull the dict.txt file to your local hard drive; don't repeatedly urlopen it from the remote host.
:# My example code runs under Python 3.1 just fine, except for the <code>print</code> syntax change.
:# Like I said, the majority of the time is spent sorting letters; once all anagrams are in place, sorting the list or not makes no big difference (your sorted version above is only marginally slower).
:--[[User:Ledrug|Ledrug]] 07:21, 21 September 2011 (UTC)

<code>urllib.request.urlopen('http://www.puzzlers.org/pub/wordlists/unixdict.txt').read().split()</code> produces bytes, which cannot be joined into a str without converting them first; it gives the error <code>TypeError: must be str, not bytes</code>. For my own anagrams program, I build the anagram "synonym" dictionary only once, if it is not already saved on disk.
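In Python 3 the <code>.read().split()</code> call above yields a list of bytes objects, and <code>sorted()</code> on a bytes object returns a list of ints, so building a key with <code>''.join(sorted(w))</code> fails with a TypeError. One way round it is to decode the data first (for example with <code>.read().decode('utf-8')</code>, assuming that encoding); another is to keep the key as bytes. The following is a minimal sketch of the second option, not the code under discussion: the helper names <code>anagram_key</code> and <code>largest_groups</code> are hypothetical, and a small inline sample stands in for unixdict.txt. It also reproduces the single-pass <code>lk</code>/<code>lm</code> bookkeeping from the loop above.

```python
from collections import defaultdict


def anagram_key(word):
    """Return the sorted-letter key for a str or bytes word.

    sorted() on a bytes object yields a list of ints, which ''.join()
    rejects with a TypeError, so bytes keys are rebuilt with bytes().
    """
    if isinstance(word, bytes):
        return bytes(sorted(word))
    return ''.join(sorted(word))


def largest_groups(words):
    """Return every anagram group of maximal size, in one pass."""
    d = defaultdict(list)
    lm = 0        # size of the largest group seen so far
    result = []   # the groups currently having that size
    for w in words:
        key = anagram_key(w)
        d[key].append(w)
        lk = len(d[key])
        if lk < lm:
            continue               # still smaller than the best group
        if lk > lm:
            lm = lk                # new record: restart the result list
            result = [d[key]]
        else:
            result.append(d[key])  # ties the record size
    return result


if __name__ == '__main__':
    # bytes input, as returned by urlopen(...).read().split() in Python 3
    sample = b'abel able bale bela elba alger glare lager large regal'.split()
    for group in largest_groups(sample):
        print(b' '.join(group).decode('ascii'))
```

Run as a script, it prints the two five-word groups <code>abel able bale bela elba</code> and <code>alger glare lager large regal</code>. Because a group only enters <code>result</code> when it reaches the current record size, no final pass over the whole dictionary with <code>max</code> or a sort is needed.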
I do need to put a little effort into sorting the lists into descending order of length and into filtering special characters (we'd -> dew, for example); this allows the dictionary to be any collection of words, one per line, which works better for multiword anagrams. I only split the entries that are possible for the current source words; otherwise I keep each list as a string of words. On my humble Sempron PC, finding the anagrams of my own name in a Finnish dictionary (82229 words, 79248 sorted-letter combinations; your program took 0.8 s on it and, interestingly, also found a 5 word anagram, 5 different ones) takes around 600 ms (Python 2.6 with psyco for best results):
 52 words loaded in 68 ms.
 1564 anagrams of tonyveijalainen found!
 Processing took 567 ms.

: You are describing a completely different problem. Sorting may help with multi-word anagrams, and should help with data reuse, but for the specific task it brings no benefit.
: Incidentally, what do you mean by "5 word anagram"? Here's a 12 from my Finnish dictionary: <code>(12, ['ankarasti', 'ankarista', 'arkistaan', 'karitsana', 'karsintaa', 'karsitaan', 'kitaransa', 'narikasta', 'rakastani', 'rankaista', 'raskainta', 'sarkainta'])</code>. If that doesn't count somehow, at least there is: <code>(6, ['painottumassa', 'poistumastaan', 'punoittamassa', 'putoamistansa', 'upottamassani', 'upottamissaan'])</code> --[[User:Ledrug|Ledrug]] 20:21, 21 September 2011 (UTC)

My Finnish dictionary has words in their basic forms only. On the Linux side I did some processing of the word file installed with Ubuntu (I had to remove \\$ at the end and filter out repeated words), and with my collect version I also got two 12s:
 ['rankaista', 'ankarista', 'karitsana', 'karsintaa', 'raskainta', 'kitaransa', 'karsitaan', 'rakastani', 'sarkainta', 'arkistaan', 'narikasta', 'ankarasti']
 ['nostaa', 'otsaan', 'tasona', 'saaton', 'sotaan', 'sotana', 'sontaa', 'sanota', 'otsana', 'satona']
but it took 9.36 s without psyco (interestingly, 9.64 s with psyco) on Python 2.6.5.
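The preprocessing Tonyjv describes (dropping special characters so that we'd is keyed like wed, skipping repeated words, ordering the groups by descending length, and building the dictionary only once when it is not already on disk) could be sketched roughly as follows. This is a sketch under assumptions, not Tonyjv's actual program: the function names <code>normalize</code>, <code>build_groups</code> and <code>load_or_build</code>, the pickle cache file and the UTF-8 word file are all hypothetical.

```python
import os
import pickle
from collections import defaultdict


def normalize(word):
    """Lower-case a word and drop non-letters, e.g. "we'd" -> "wed"."""
    return ''.join(ch for ch in word.lower() if ch.isalpha())


def build_groups(lines):
    """Group one-word-per-line input by sorted-letter key.

    Repeated words (after normalization) are skipped, and the groups
    are returned longest first; equal-length groups keep input order.
    """
    d = defaultdict(list)
    for line in lines:
        word = normalize(line.strip())
        if not word:
            continue
        key = ''.join(sorted(word))
        if word not in d[key]:
            d[key].append(word)
    return sorted(d.values(), key=len, reverse=True)


def load_or_build(wordfile, cachefile):
    """Build the groups once and reuse a pickled copy on later runs."""
    if os.path.exists(cachefile):
        with open(cachefile, 'rb') as f:
            return pickle.load(f)
    with open(wordfile, encoding='utf-8') as f:
        groups = build_groups(f)
    with open(cachefile, 'wb') as f:
        pickle.dump(groups, f)
    return groups
```

For example, <code>build_groups(["we'd", "dew", "Wed", "ate", "tea", "eat"])</code> keys we'd as <code>dew</code>, drops the repeated Wed, and puts the three-word group first. Since <code>str.isalpha()</code> also accepts letters such as ä and ö, Finnish words survive the normalization unchanged.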