On Sunday 26 October 2003 04:26, Guido van Rossum wrote: ... > > I'm just very sad that I didn't think of this performance-trap back > > when the specs of sum were first being defined. Oh well:-(. > > Oh, but we all *did* think of it. For strings. :-) Yeah, and your decision to forbid them (while my first prototype tried forwarding to ''.join) simplified sum's implementation a lot. Unfortunately we cannot easily distinguish numbers from sequences in the general case, when user-coded classes are in play; so we can't easily forbid sequences and allow numbers. Exactly the same underlying reason as a bug I just opened on SF: if x is an instance of a class X having __mul__ but not __rmul__, 3*x works (just like x*3) but 3.0*x raises TypeError (with a message that shows the problem -- x is being taken as a sequence). When X is intended as a number class, this asymmetry between multiplication and (e.g.) addition violates the principle of least surprise. > > Can I at least add > > a warning about this performance trap to the docs at > > http://www.python.org/doc/current/lib/built-in-funcs.html ? > > Definitely. > > You know, I don't even think that I would consider using sum() if I > wanted to concatenate a bunch of lists. Let's use sum() for numbers. > Big deal. Currently the docs say that sum is mostly meant for numbers. By making that observation into a stronger warning, we can at least be somewhat helpful to those few Python users who read the manuals;-). If sum just couldn't be used for a list of lists it would indeed not be a big problem. The problem is that it can, it's just (unexpectedly for the naive user) dog-slow, just like a loop of += on a list of strings. And people WILL and DO consider and try and practice _any_ use for a language or library feature. The problem of the += loop on strings is essentially solved by psyco, which has tricks to catch that and make it almost as fast as ''.join; but psyco can't get into a built-in function such as sum, and thus can't help us with the performance trap there. As you've indicated that for 2.4 the risk of semantics changes to sum in weird cases _can_ be at least considered (you're still opposed but open to perhaps being convinced) I hope to get something for that (with a copy.copy of the "accumulator" and in-place addition if that succeeds, falling back to plain addition otherwise) and all the unit tests needed to show it makes sense. An aside...: One common subtheme of this and other recent threads here and on c.l.py is that, as we think of "accumulator functions" to consume iterators, we should not ignore the mutating methods (typically returning None) that are NOT appropriate for list comprehensions just as they weren't for map and friends. A substantial minority of intermediate Python users, knowing or feeling that loops coded in Python aren't as fast as those that happen inside C-coded funcs such as sum, those in itertools, etc, is NOT enthusiastic about coding e.g. "for x in stuff: tot += x". Most often their performance focus is of course inappropriate, but it's hard to uproot it. So, in a typical example, we might have: L = [ [x] for x in xrange(1000) ] def aloop(L=L): tot = [] for x in L: tot += x return tot def asum(L=L): return sum(L, []) def amap(L=L): tot = [] map(tot.extend, L) return tot With the now-regressed fix, this gave: [alex at lancelot bo]$ timeit.py -c -s'import d' 'd.aloop()' 1000 loops, best of 3: 640 usec per loop [alex at lancelot bo]$ timeit.py -c -s'import d' 'd.asum()' 1000 loops, best of 3: 480 usec per loop [alex at lancelot bo]$ timeit.py -c -s'import d' 'd.amap()' 1000 loops, best of 3: 790 usec per loop so sum could be touted as "the only obvious solution" -- shortest, neatest, fastest... IF it were indeed fast!-) Unfortunately, with the sum change regressed, d.asum times to 8.4e+03 usec per loop, so it clearly cannot be considered any more:-). So, there might be space for an accumulator function patterned on map but [a] which stops on the shortest sequence like zip and [b] does NOT build a list of results, meant to be called a bit like map is in the 'amap' example above. itertools is a great little collection of producers and manipulators of iterators, but the "accumulator functions" might provide the "one obvious way" to _consume_ iterators for common cases; and accumulating by calling an accumulator-object's mutator method, such as tot.extend above, on all items of an iterator, clearly is pretty common. Alex
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4