[Walter Dörwald]
> I'm working on it; however, I discovered that unicode.join()
> doesn't optimize this special case:
>
>     s = "foo"
>     assert "".join([s]) is s
>
>     u = u"foo"
>     assert u"".join([s]) is s
>
> The second assertion fails.

Well, in that example it *has* to fail, because the input (s) wasn't a
unicode string to begin with, but u"".join() must return a unicode
string. Maybe you intended to say that

    assert u"".join([u]) is u

fails (which is also true today, but doesn't need to be true tomorrow).

> I'd say that this test (joining a one-item sequence returns
> the item itself) should be removed because it tests an
> implementation detail.

Nevertheless, it's an important pragmatic detail. We should never throw
away a test just because rearrangement makes the test less convenient.

> I'm not sure whether the optimization should be added to
> unicode.find().

Believing you mean join(), yes. Doing common endcases efficiently in C
code is an important quality-of-implementation concern, lest people need
to add reams of optimization test-&-branch guesses in their own Python
code.

For example, the SpamBayes tokenizer has many passes that split input
strings on magical separators of one kind or another, pasting the
remaining pieces together again via string.join(). It's explicitly noted
in the code that special-casing the snot out of "separator wasn't found"
in Python is a lot slower than letting string.join(single_element_list)
just return the list element, so that simple, uniform Python code works
well in all cases. It's expected that *most* of these SpamBayes passes
won't find the separator they're looking for, and it's important not to
make endless copies of unboundedly large strings in the expected case.

The more heavily used unicode strings become, the more important it is
that they treat users kindly in such cases too.
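[Editor's note: to make the behavior under discussion concrete, here is
a minimal sketch, assuming a CPython interpreter; the identity results
are an implementation detail, never a language guarantee:]

    s = "foo"
    # str.join's one-element fast path: CPython hands back the element
    # itself instead of building a copy, so identity is preserved.
    assert "".join([s]) is s

    u = u"foo"
    # This is the optimization being requested for unicode.join().
    # It is False in the version discussed above, but nothing stops a
    # later version from making it True.
    same_object = u"".join([u]) is u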
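[Editor's note: and a sketch of the split-and-rejoin pattern the
SpamBayes tokenizer is described as relying on -- not SpamBayes' actual
code; the function name and separator are made up for illustration:]

    def drop_separator(text, sep="."):
        # split() returns a one-element list when sep isn't found --
        # the expected case -- and an optimized join() can then hand
        # the element back without copying an unboundedly large string.
        # No Python-level special-casing of "sep not found" is needed.
        pieces = text.split(sep)
        return "".join(pieces)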