On Monday 21 April 2003 10:52 am, Alex Martelli wrote: ... > Aye aye, cap'n -- now that youve crossed the i's and dotted the t's > I'll arrange the complete patch with tests and docs and submit it > forthwith. Done -- patch 724936 on SF, assigned to gvanrossum with priority 7 as you said to do for patches meant for 2.3beta1. As I've remarked in the patch's comments, there's something of a performance hit now with sum(manystrings) wrt ''.join(manystrings): [alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' 'sum(L)' 10000 loops, best of 3: 174 usec per loop [alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' '"".join(L)' 10000 loops, best of 3: 75 usec per loop [alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' 'reduce(operator.add,L)' 1000 loops, best of 3: 1.35e+03 usec per loop [alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' 'tot=""' 'for it in L: tot+=it' 1000 loops, best of 3: 1.33e+03 usec per loop Nowhere as bad as the unbounded slowdown with operator.add or the equivalent loop, but still, a solid slowdown of a factor of two. Problem is that the argument to sum MIGHT be an iterator (not a "normal" sequence) so sum must save the first item and concat the _PyString_Join of the OTHER items after the first (my unit tests were originally lax and only exercised sum with the argument being a list -- fortunately I beefed up the unit tests as part of preparing the patch for submission, so they caught this, as well as the issue with sum of a sequence mixing unicode and plain string items, which forces sum to use different concatenation code depending on the exact type of the first item...). Reasoning on this, and on "If the implementation is hard to explain, it's a bad idea", I'm starting to doubt my original intuition that "of course" sum should be polymorphic over sequences of any type supporting "+" -- maybe Tim Peters' concept that sum should be restricted to sequences of numbers is sounder -- it's irksome that, of sum's 50 lines of C, 14 should deal with the special case of "sequence of strings" and STILL involve a factor-of-2 performance hit wrt ''.join! However, HOW do we catch attempts to use sum on a sequence of strings while still allowing the use case of a sequence of timedeltas? [maybe a timedelta SHOULD be summable to 0 and we could take the 'summable to 0' as a test of numberhood?-)] I don't know, so, I've submitted the patch as it stands, and I hope somebody can suggest a better solution - I just PRAY that sum won't accept a sequence of strings AND sum them up with + , thus perpetuating the "newbie performance trap" of that idiom!-) Alex
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4