User talk:Tonyjv: Difference between revisions

From Rosetta Code
Content added Content deleted
Line 34: Line 34:
:I get it now. Rather than getting the max then plucking those with that length from a list, you calculate the length sort on it and store it, then select from the maximum end of the sorted list until the length is no longer equal to the length of the first. Neat. Thanks for taking the time to explain. --[[User:Paddy3118|Paddy3118]] 21:07, 20 September 2011 (UTC)
:I get it now. Rather than getting the max then plucking those with that length from a list, you calculate the length sort on it and store it, then select from the maximum end of the sorted list until the length is no longer equal to the length of the first. Neat. Thanks for taking the time to explain. --[[User:Paddy3118|Paddy3118]] 21:07, 20 September 2011 (UTC)


:Sorting is slow because it takes O(n log n) time. You are correct that your previous solution also used sorting, by <code>groupby(sorted(...))</code>, and thus also incurred the cost of sorting, and thus was no better. However, the solutions that use sorting (Tonyjv's solution and the other one using groupby) are still inferior to the solutions that use a hash table based dictionary or equivalent (defaultdict, Counter) to group are much faster -- linear time on average -- assuming the hash function is not terrible. --[[User:Spoon!|Spoon!]] 01:43, 21 September 2011 (UTC)
:Sorting is slow because it takes O(n log n) time. You are correct that your previous solution also used sorting, by <code>groupby(sorted(...))</code>, and thus also incurred the cost of sorting, and thus was no better. However, the solutions that use sorting (Tonyjv's solution and the other one using groupby) are still inferior to the solutions that use a hash table based dictionary or equivalent (defaultdict, Counter) to group which are much faster -- linear time on average -- assuming the hash function is not terrible. --[[User:Spoon!|Spoon!]] 01:43, 21 September 2011 (UTC)

Revision as of 01:44, 21 September 2011

Can we talk about your Python edits please?

Hi, I have spent time, and looked at most of your edits and disagreed with them. Is it possible that we could talk about our respective views on your edits, and maybe invite others for their views too so we can reach a consensus?

The site is a collaborative wiki, so it would be great if several people gave their input on this, so please speak up!
Thanks --Paddy3118 21:23, 30 June 2010 (UTC)

Have been busy with DaniWeb (member of month in April) and trying to do some work so Have not checked these pages for ages. Now I am looking into getting involved also with Go language, so I came back to these pages and saw your comment long time ago.

Sort instead of max in Anagrams

Hi Tony, If you only use the max value in your Anagrams#Python entry, why change from using max to using sort? Thanks, --Paddy3118 11:26, 19 September 2011 (UTC)

Because it pays of when looping to give all that length solutions, see the break in the loop and compare it to other solution.--Tonyjv 16:10, 20 September 2011 (UTC)

How is sorting everything for just the largest value 'better' than just explicitely finding the largest? Isn't the latter more clear? --Paddy3118 18:14, 20 September 2011 (UTC)

Because it is not a maximum but all maximums, same way as groupby need sorted input to do the same. We want all cases when the maximum is reached and it is done more economically (around 10% faster in this case in my computer with in place sort) by sorting the anagrams by number of anagrams in first place. Then we can do printing out result very economically without full pass of groups (including taking each length again) --Tonyjv 20:43, 20 September 2011 (UTC)

Other style version I used for comparision goes like this (Python3): <lang python> import urllib.request, itertools import time words = urllib.request.urlopen('http://www.puzzlers.org/pub/wordlists/unixdict.txt').read().split() print('Words ready') t0 = time.clock() anagrams = [list(g) for k,g in itertools.groupby(sorted(words, key=sorted), key=sorted)] count = max(len(ana) for ana in anagrams) for ana in anagrams:

   if len(ana) == count:
       print(ana)

t0 -= time.clock() print('Finished in %f s' % -t0) </lang>

I get it now. Rather than getting the max then plucking those with that length from a list, you calculate the length sort on it and store it, then select from the maximum end of the sorted list until the length is no longer equal to the length of the first. Neat. Thanks for taking the time to explain. --Paddy3118 21:07, 20 September 2011 (UTC)
Sorting is slow because it takes O(n log n) time. You are correct that your previous solution also used sorting, by groupby(sorted(...)), and thus also incurred the cost of sorting, and thus was no better. However, the solutions that use sorting (Tonyjv's solution and the other one using groupby) are still inferior to the solutions that use a hash table based dictionary or equivalent (defaultdict, Counter) to group which are much faster -- linear time on average -- assuming the hash function is not terrible. --Spoon! 01:43, 21 September 2011 (UTC)