What is wrong with the following code? def join(*iterable): s = "" for e in iterable: s = s + e return s What if that code could run as fast as ''.join(iterable)? How much existing code would start to perform better? In particular, would XML code laden with += loops run faster? Would Python be easier to teach and write? Would the resulting code look more natural and obvious? Without expanding the size of the string structure, I think it is possible to implement something equivalent to the following proposal (which for clarity, does add two words to the structure). Roughly, the idea is: Add two PyObject pointers to the structure, component0 and component1, and initialize them to NULL. Alter string_concat(a, b) to return a new string object with: ob_sval = '\0' ob_size = len(a) + len(b) component0 = a // be sure to INCREF(a) component1 = b // be sure to INCREF(b) To the start of every string method except string_len() and string_concat(), add a macro that expands to: if (self->component0 != NULL) _autojoin(self); Write an _autojoin(self) method who's goal in life is to recursively fill-out the string from the components: * resize self to be big enough for an ob_sized string * treat component0 and component1 as a binary tree (with only the leaves holding real strings) and recursively (in-order) find the leaves and copy their string data into self->ob_sval. DECREF each component after traversing it. * set self->component0 to NULL to mark self as being fully joined. The recursion depth should not be worrisome. In the typical case, one side of the tree will be a leaf and will not recurse further. It gets more interesting if the user writes things like b=a+a or has a nested series like c=a+b; f=d+e; g=c+f. The recursion and reference counting trivializes the handling of those cases. An actual implementation striving to save the two additional structure fields could use the ob_sstate field to indicate a deferred string (my name for a string that has not yet had ob_sval filled out). The ob_sval array itself would be given room to hold two object pointers for the components and access to those pointers would be cast to (PyObject *). If this works out, it would be an innovation. The join([a,b,c,d]) versus a+b+c+d problem also exists in several other languages. AFAICT, Python would have been the first to solve it. As a nice side benefit, __builtin__.sum() would no longer need to reject the case where the inputs were strings. There is a proof-of-concept patch for this at: www.python.org/sf/976162 Using UserString, the patch demonstrates a simplified implementation that converts the O(n**2) string addition process into an O(n) process like str.join(). Since the patch passes the test suite, it seems likely that the above could be implemented in a way that is transparent to the user. Raymond Hettinger P.S. This is a repeat post. I don't think the first one made it through mailman this week.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4