Nick Coghlan wrote:
> Tim Peters wrote:
>
>> I needed a break from intractable database problems, and am almost
>> done with PyUnicode_Join(). I'm not doing auto-unicode(), though, so
>> there will still be plenty of fun left for Nick!
>
>
> I actually got that mostly working (off slightly out-of-date CVS though).
>
> Joining a sequence of 10 integers with auto-str seems to take about 60%
> of the time of a str(x) list comprehension on that same sequence (and
> the PySequence_Fast call means that a generator is slightly slower than
> a list comp!). For a sequence which mixed strings and non-strings, the
> gains could only increase.
>
> However, there is one somewhat curly problem I'm not sure what to do about.
>
> To avoid slowing down the common case of string join (a list of only
> strings) it is necessary to do the promotion to string in the type-check
> & size-calculation pass.
>
> That's fine in the case of a list that consists of only strings and
> non-basestrings, or the case of a unicode separator - every
> non-basestring is converted using either PyObject_Str or PyObject_Unicode.
>
> Where it gets weird is something like this:
>   ''.join([an_int, a_unicode_str])
>   u''.join([an_int, a_unicode_str])

This gives you a TypeError, so it's a non-issue (.join() does not do an
implicit call to str(obj) on the list elements).

The real issue is the case where you have [a_str, a_unicode_obj], and
for that the current implementation already does the right thing,
namely to look for Unicode objects in the length checking pass.

> In the first case, the int will first be converted to a string via
> PyObject_Str, and then that string representation is what will get
> converted to Unicode after the detection of the unicode string causes
> the join to be handed over to Unicode join.
>
> In the latter case, the int is converted directly to Unicode.
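Here is a small Python 3 model of the two join implementations. It illustrates the reply above, which says that `.join()` never calls `str()` on its items. This is a sketch, not CPython's code, and it rests on these assumptions: `bytes` stands in for Python 2's 8-bit `str`, `str` stands in for `unicode`, and ASCII is the default encoding. The names `string_join` and `unicode_join` are hypothetical. The 8-bit join looks for Unicode items during the same pass that computes the result size. If it finds one, it hands the untouched sequence to the Unicode join.

```python
def unicode_join(sep, seq):
    """Model of unicode.join(): 8-bit strings are decoded, non-strings are rejected."""
    if isinstance(sep, bytes):
        sep = sep.decode("ascii")  # assumption: ASCII default encoding, as in Python 2
    parts = []
    for i, item in enumerate(seq):
        if isinstance(item, bytes):
            item = item.decode("ascii")
        elif not isinstance(item, str):
            # No implicit str(item): a non-string item is a TypeError.
            raise TypeError(f"sequence item {i}: expected string or Unicode, "
                            f"{type(item).__name__} found")
        parts.append(item)
    return sep.join(parts)


def string_join(sep, items):
    """Model of 8-bit str.join() with a single type-check and size-calculation pass."""
    seq = list(items)  # roughly like PySequence_Fast: materialise an iterator once
    if isinstance(sep, str):
        return unicode_join(sep, seq)  # a Unicode separator goes straight to unicode join
    size = len(sep) * max(len(seq) - 1, 0)
    for i, item in enumerate(seq):
        if isinstance(item, str):
            # A Unicode object was found while sizing: delegate the untouched sequence.
            return unicode_join(sep, seq)
        if not isinstance(item, bytes):
            raise TypeError(f"sequence item {i}: expected string, "
                            f"{type(item).__name__} found")
        size += len(item)
    buf = bytearray(size)  # allocate once, using the size computed above
    pos = 0
    for i, item in enumerate(seq):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(item)] = item
        pos += len(item)
    return bytes(buf)
```

Some examples of the model's behaviour:

- `string_join(b"", [b"a", "b"])` returns `"ab"`, which shows the hand-over to the Unicode join.
- `string_join(b"", [1, "x"])` raises `TypeError`, which mirrors the `''.join([an_int, a_unicode_str])` case.
- `string_join("", [1, "x"])` also raises `TypeError`.

In real Python 3, `"".join([1, "x"])` raises `TypeError` as well.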
>
> So my question would be, is it reasonable to expect that
> PyObject_Unicode(PyObject_Str(some_object)) give the same answer as
> PyObject_Unicode(some_object)?
>
> If not, then the string join would have to do something whereby it kept
> a 'pristine' version of the sequence around to hand over to the Unicode
> join.
>
> My first attempt at implementing this feature had that property, but
> also had the effect of introducing about a 1% slowdown of the standard
> sequence-of-strings case (it introduced an extra if statement to see if
> a 'stringisation' pass was required after the initial type checking and
> sizing pass). For longer sequences than 10 strings, I imagine the
> relative slowdown would be much less.
>
> Hmm. . . I think I see a way to implement this, while still avoiding
> adding any code to the standard path through the function. It'd be
> slower for the case where an iterator is passed in, and we automatically
> invoke PyObject_Str but don't end up delegating to Unicode join, though,
> as it involves making a copy of the sequence that only gets used if the
> Unicode join is invoked. (If the original object is a real sequence,
> rather than an iterator, there is no extra overhead - we have to make
> the copy anyway, to avoid mutating the user's sequence).
>
> If people are definitely interested in this feature, I could probably
> put a patch together next week.
>
> Regards,
> Nick.
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/mal%40egenix.com

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 27 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
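Nick's question and his "pristine copy" idea can be modelled in Python 3. This is a hedged sketch, not the patch he describes, and it rests on these assumptions:

- `bytes` stands in for Python 2's `str`, and `str` stands in for `unicode`.
- `to_str` and `to_unicode` only imitate `PyObject_Str` and `PyObject_Unicode`. The imitation of `PyObject_Unicode` prefers a `__unicode__` method when the type defines one.
- `auto_join` and the `Price` class are hypothetical.

For an object whose two conversions differ, converting via the string form loses information. Keeping the original items lets a late switch to Unicode convert each object directly.

```python
class Price:
    """Hypothetical object whose 8-bit and Unicode forms differ."""

    def __str__(self):
        return "EUR 5"       # ASCII-only form, as a Python 2 __str__ might return

    def __unicode__(self):
        return "\u20ac5"     # richer form; Python 3 itself never calls __unicode__


def to_str(obj):
    """Model of PyObject_Str: produce an ASCII byte string."""
    if isinstance(obj, bytes):
        return obj
    return str(obj).encode("ascii")


def to_unicode(obj):
    """Model of PyObject_Unicode: use __unicode__ when the type defines it."""
    if isinstance(obj, str):
        return obj
    if isinstance(obj, bytes):
        return obj.decode("ascii")
    method = getattr(type(obj), "__unicode__", None)
    return method(obj) if method is not None else str(obj)


def auto_join(sep, items):
    """Hypothetical auto-str join that keeps a pristine copy of the items."""
    pristine = list(items)  # the copy that is needed anyway for iterators
    if isinstance(sep, str):
        # Unicode separator: every item is converted directly to Unicode.
        return sep.join(to_unicode(x) for x in pristine)
    converted = []
    for item in pristine:
        if isinstance(item, str):
            # Unicode detected: convert the *original* objects, so the result
            # uses to_unicode(x) rather than to_unicode(to_str(x)).
            return to_unicode(sep).join(to_unicode(x) for x in pristine)
        converted.append(to_str(item))
    return sep.join(converted)
```

With `p = Price()`, `to_unicode(to_str(p))` gives `"EUR 5"` while `to_unicode(p)` gives `"\u20ac5"`. Because of the pristine copy, `auto_join(b"-", [p, "x"])` returns `"\u20ac5-x"` rather than `"EUR 5-x"`. Keeping `converted` separate from `pristine` also means the caller's sequence is never mutated.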