[Alex]
> However, one cosmetic suggestion: for analogy with list.sorted, why
> not let the call be spelled as
>     groupby(sequence, key=keyfunc)
> ?
>
> I realize most itertools take a callable _first_, while, to be able to
> name the key-extractor this way, it would have to go second.  I still
> think it would be nicer, partly because while sequence could not
> possibly default, key _could_ -- and its one obvious default is to an
> identity (lambda x: x).  This would let elimination and/or counting of
> adjacent duplicates be expressed smoothly (for counting, it would
> help to have an ilen that gives the length of a finite iterable argument,
> but worst case one can substitute
>     def ilen(it):
>         for i, _ in enumerate(it): pass
>         return i+1
> or its inline equivalent).

Though the argument order makes my stomach churn, the identity function
default is quite nice:

>>> s = 'abracadabra'

>>> # sort s | uniq
>>> [k for k, g in groupby(list.sorted(s))]
['a', 'b', 'c', 'd', 'r']

>>> # sort s | uniq -d
>>> [k for k, g in groupby(list.sorted('abracadabra')) if ilen(g)>1]
['a', 'b', 'r']

>>> # sort s | uniq -c
>>> [(ilen(g), k) for k, g in groupby(list.sorted(s))]
[(5, 'a'), (2, 'b'), (1, 'c'), (1, 'd'), (2, 'r')]

>>> # sort s | uniq -c | sort -rn | head -3
>>> list.sorted([(ilen(g), k) for k, g in groupby(list.sorted(s))], reverse=True)[:3]
[(5, 'a'), (2, 'r'), (2, 'b')]


> > > While extractor
> > > functions can be arbitrarily complex, many only fetch a specific
> > > attribute or element number.  Alex's high-speed curry suggests that it
> > > is possible to create a function maker for fast lookups:
> > >
> > >     students.sort(key=extract('grade'))  # key=lambda r:r.grade
> > >     students.sort(key=extract(2))        # key=lambda r:r[2]
> >
> > Perhaps we could do this by changing list.sort() and groupby() to take
> > a string or int as first argument to mean exactly this.  For the
>
> It seems to me that this would be special-casing things while an extract
> function might help in other contexts as well.  E.g., itertools has
> several other iterators that take a callable and might use this.
>
> > But I recommend holding off on this -- the "pure" groupby() has enough
> > merit without speed hacks, and I find the clarity it provides more
> > important than possible speed gains.  I expect that the original, ugly
>
> I agree that the case for extract is separate from that for groupby
> (although the latter does increase the attractiveness of the former).

Yes, it's clearly a separate issue (and icing on the cake).

I was thinking extract() would be a nice addition to the operator module,
where everything is basically a lambda-evading speed hack for accessing
intrinsic operations:

    operator.add = lambda x,y: x+y


Raymond
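
For anyone wanting to try the session above on a current Python, here is
a minimal sketch. It assumes list.sorted(s) is spelled as the builtin
sorted(s), and that groupby is itertools.groupby with its key argument
left at the identity default. ilen and extract are hypothetical helpers
written only for this illustration; extract is built here on
operator.attrgetter and operator.itemgetter, and the students data is
made up.

```python
from itertools import groupby
from operator import attrgetter, itemgetter


def ilen(iterable):
    """Count the items of a finite iterable (hypothetical helper)."""
    return sum(1 for _ in iterable)


def extract(field):
    """Hypothetical key maker: attribute lookup for a str, item lookup otherwise."""
    return attrgetter(field) if isinstance(field, str) else itemgetter(field)


s = 'abracadabra'

# sort s | uniq
unique = [k for k, g in groupby(sorted(s))]

# sort s | uniq -d
duplicated = [k for k, g in groupby(sorted(s)) if ilen(g) > 1]

# sort s | uniq -c
counts = [(ilen(g), k) for k, g in groupby(sorted(s))]

# sort s | uniq -c | sort -rn | head -3
top3 = sorted(counts, reverse=True)[:3]

# extract(2) behaves like key=lambda r: r[2] (illustrative data only)
students = [('amy', 'math', 90), ('bob', 'art', 72), ('cat', 'math', 85)]
by_grade = sorted(students, key=extract(2))
```

Using sum(1 for _ in iterable) rather than the enumerate loop means ilen
also copes with an empty iterable, although groupby never yields an
empty group.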