Guido van Rossum, 26.08.2011 19:02: > On Fri, Aug 26, 2011 at 3:29 AM, Stefan Behnel wrote: >> Besides, what if these implementations provided indexing in, say, O(log N) >> instead of O(1) or O(N), e.g. by building a tree index into each string? You >> could have an index that simply marks runs of surrogate pairs and BMP >> substrings, thus providing a likely-to-be-somewhat-compact index. That index >> would obviously have to be built, but so do the different string >> representations in post-PEP-393 CPython, especially on Windows, as I have >> learned. > > Eek. No, please. I was mostly just confabulating. My main point was that this isn't a black-and-white thing - O(1) xor O(N) - and thus is orthogonal to the PEP. You can achieve compliant/acceptable behaviour at the code point level, the performance guarantees level or the platform integration level - choose any two. CPython is just lucky that there isn't really a platform integration level to take into account (if we leave the Windows environment aside for a moment). > Those platforms' native string types have length and > slicing operations that are O(1) and work in terms of 16-bit code > points. Python should use those. It would be awful if Java and Python > code doing the same manipulations on the same string would come to > different conclusions because Python tried to paper over surrogates. I fully agree. >> Would such a less severe violation of the strict O(1) rule still be "not >> ok"? I think this is not such a clear black-and-white issue. Both >> implementations have notably different performance characteristics than >> CPython in some more or less important areas, as does PyPy. At some point, >> the language compliance label has to account for that. > > Since you had to ask, I have to declare that, indeed, non-O(1) > behavior would not be okay for those platforms. I take it that you say that because you want strings to perform in the 'normal' platform specific way here (i.e. like Java/.NET strings), and not so much because you want to require the exact same (performance) characteristics across Python implementations. So your choice is platform integration over code points, leaving the identical performance as a side-effect of the platform integration. > All in all, I don't think we should legislate Python strings to be > able to support 21-bit code points using O(1) indexing. PEP 393 makes > this possible for CPython, and it's been said that PyPy can follow > suit. But it'll be a "quality-of-implementation" issue, not built into > the language spec. Makes sense to me. Most likely, Unicode heavy Python code will have to take platform specifics into account anyway, so there are limits as to what is suitable for a language spec. Stefan
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4