On Sunday 26 October 2003 05:23 pm, Armin Rigo wrote:
   ...
> I must admit I was a bit surprised when I first tested sum(), without
> first reading its doc, because I thought I knew what it should do.  I
> expected it to be a fast equivalent to:
>
>     def sum(seq, start=0):
>         for item in seq:
>             start = start + item
>         return start

It IS equivalent to that -- plus an explicit type test to raise if start
is an instance of str or unicode.  I had originally tried forwarding to
''.join for strings, but Guido preferred to forbid them, and it still
doesn't look like a problem to me.  Alas, "fast" is debatable:-).

>     reduce(operator.add, seq, start)

sum doesn't reproduce reduce's quirk of using the first item of seq if
start is not given.  So, the def version is closer.

> I immediately tried it with strings and lists.  I immediately thought
> about lists because of their use of "+" for concatenation.
>
> So it seems that neither strings nor lists are properly supported, nor

Strings are explicitly disallowed, so that should take care of the
surprise factor for that specific case.  As for lists, the semantics are
right, but the speed is not (it could be made far faster with modest
effort).  The same goes for other mutable sequences.

As for tuples and other immutable sequences, they ARE treated exactly as
your 'def' above (roughly as your reduce) would treat them -- not very
fast, but if all you know about something is that it's an immutable
sequence, there's nothing more you can do.  The use case of making a huge
tuple from many smaller ones seems weird enough that I don't see
special-casing tuples specifically as particularly worthwhile (other
immutable sequences that aren't exactly tuples would still suffer, after
all).

> tuples, by the way, and my opinion on this is that it strongly
> contradicts the principle of least surprise.

For mutable sequences, I agree.
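To make the semantics discussed above concrete, here is a quick check in modern Python (a sketch added for illustration, not part of the original exchange; the builtin's behavior here matches current CPython):

```python
import functools
import operator

def sum_loop(seq, start=0):
    # The plain-loop equivalent quoted above.
    for item in seq:
        start = start + item
    return start

# Same result as the builtin for ordinary numbers:
assert sum_loop([1, 2, 3], 10) == sum([1, 2, 3], 10) == 16

# Lists get the right semantics (if not the right speed):
assert sum([[1], [2, 3]], []) == [1, 2, 3]

# Strings are explicitly disallowed, with a reminder about ''.join:
try:
    sum(['foo', 'bar'], '')
except TypeError:
    pass  # "sum() can't sum strings [use ''.join(seq) instead]"

# reduce's quirk: with no start value it uses the first item, so it
# happily concatenates strings where sum refuses:
assert functools.reduce(operator.add, ['foo', 'bar']) == 'foobar'
```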
For immutable ones, I don't see the performance trap as being a practical
problem for tuples (and weirder things) -- it WOULD be for strings, but as
we specifically disallow them with a message reminding the user of
''.join, in practice the problem seems minor.  Maybe I'm coming to this
with too pragmatic a stance...?

> I would not object to an implementation of sum() that special-cases
> lists, tuples and strings for efficiency.  (by which I mean I can
> contribute a patch)

I think all mutable sequences (that aren't especially weird in their +
vs += behavior) might be handled correctly, without special-casing,
roughly as follows (Phillip Eby's idea):

    def sum(seq, start=0):
        it = iter(seq)
        try:
            result = start + it.next()
        except StopIteration:
            return start
        for item in it:
            result += item
        return result

My original idea was perhaps a bit goofier, something like:

    def sum(seq, start=0):
        try:
            start = copy.copy(start)
        except TypeError:
            for item in seq:
                start = start + item
        else:
            for item in seq:
                start += item
        return start

Admittedly, the latter version may accept a few more cases; e.g., both
versions would accept:

    sum([ range(3), 'foo' ], [])

because [] is copyable, []+range(3) is fine, and list.__iadd__ is more
permissive than list.__add__; however, the first version would fail on:

    sum([ 'foo', range(3) ], [])

because []+'foo' fails, while the second version would be fine because []
is _still_ copyable and __iadd__ is still permissive:-).  So, perhaps,
implementing Phillip's idea would still not reduce the surprise factor
enough.  Hmmm...

> > language or library feature.  The problem of the += loop on strings is
> > essentially solved by psyco, which has tricks to catch that and make
> > it almost as fast as ''.join; but psyco can't get into a built-in
> > function such as sum, and thus can't help us with the performance trap
> > there.
>
> Supporting sum() in Psyco is no big issue, and it could help the same
> way as it does for str.__add__.
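The asymmetry between the two sketches can be checked directly. Below is a modern-Python transcription (added for illustration: `it.next()` becomes `next(it)`, and plain lists stand in for Python 2's `range(3)`, which returned a list):

```python
import copy

def sum_eby(seq, start=0):
    # Phillip Eby's sketch, with Python 3's next() builtin.
    it = iter(seq)
    try:
        result = start + next(it)   # first step uses __add__
    except StopIteration:
        return start
    for item in it:
        result += item              # later steps use __iadd__
    return result

def sum_copy(seq, start=0):
    # The copy-based variant: every step uses __iadd__ when possible.
    try:
        start = copy.copy(start)
    except TypeError:
        for item in seq:
            start = start + item
    else:
        for item in seq:
            start += item
    return start

# list.__iadd__ (an extend) is more permissive than list.__add__, so the
# order of mixed items matters for the first sketch:
assert sum_eby([[0, 1, 2], 'ab'], []) == [0, 1, 2, 'a', 'b']
try:
    sum_eby(['ab', [0, 1, 2]], [])   # [] + 'ab' fails on the first step
    raised = False
except TypeError:
    raised = True
assert raised

# ...while the copy-based variant accepts both orders:
assert sum_copy(['ab', [0, 1, 2]], []) == ['a', 'b', 0, 1, 2]
```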
> (It is not explicitly supported yet, but it could be added.)  Still I
> believe that getting the complexity right in CPython is important, when
> it can be done.

Absolutely -- we can't rely on psyco for everything, particularly not for
getting the big-O right, as opposed to speeding things up by constant
multipliers (in part, for example, because psyco doesn't work on Macs,
which are going to be a HUGE part of Python's installed base...).
However, I would be happy to "leave it to psyco" for a sum of a large
sequence of tuples or other immutable sequences...:-).  I just don't
think that people in practice _will_ fall into that performance trap...

Alex
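The big-O point can be made concrete with a small cost model (an illustrative sketch added here, not from the original exchange):

```python
# Why repeated '+' on immutable sequences is quadratic: every
# concatenation builds a fresh object, copying the whole accumulated
# result so far.

def concat_copy_cost(n_pieces, piece_len=1):
    # Count element copies performed by: start = start + piece, repeated.
    copies = 0
    length = 0
    for _ in range(n_pieces):
        length += piece_len
        copies += length    # each '+' copies the entire new result
    return copies

# Triangular growth -- O(n**2) element copies for n pieces:
assert concat_copy_cost(10) == 55      # 1 + 2 + ... + 10
assert concat_copy_cost(100) == 5050
# A single ''.join (or list.extend) pass copies each element once, O(n),
# which is why the complexity has to be right in CPython itself rather
# than left to constant-factor speedups.
```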