MAL wrote:

> Please note that unichr() is a low-level API which is part
> of the Unicode implementation.

well, I thought unichr() was a built-in Python function...

> To simplify the picture: the implementation itself only sees
> UCS-2 or UCS-4 depending on the compile time option and these
> do not treat surrogates in any special way except reserve
> code points for their usage. Accordingly, unichr() should not
> create UTF-16 but UCS-2 for narrow builds and UCS-4 on wide
> builds

you didn't answer my question: is there any reason why
unichr(0xXXXXXXXX) shouldn't return exactly the same thing
as "\UXXXXXXXX" ?

in 2.0 and 2.1, it doesn't.  in 2.2, it does.

> (unichr() is a constructor for code units, not code points).

really?  according to the documentation, it creates unicode
*characters*.  so does \U, according to the documentation.

imo, it makes more sense to let "characters" mean code points
than code units, but that's me.  the important thing here is to
figure out if \U and unichr are the same thing, and fix the code
and the documentation to do/say what we mean.

</F>
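
A minimal sketch of the code point versus code unit distinction under discussion. It assumes a modern Python 3 interpreter (3.3 or later), where chr() stands in for the 2.x unichr() and a str is always a sequence of code points, so it does not reproduce the 2.0/2.1 narrow-build behaviour described above. The helper name utf16_code_units is hypothetical and exists only for this illustration.

```python
# Assumption: Python 3.3+, where chr() replaces the 2.x unichr()
# and strings are indexed by code point, never by UTF-16 code unit.

def utf16_code_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

code_point = 0x10000            # first code point outside the BMP
from_chr = chr(code_point)      # the constructor form
from_escape = "\U00010000"      # the \U escape form

# Both spellings yield the same one-character string.
same = from_chr == from_escape
length = len(from_chr)

# Encoded as UTF-16, that single code point needs two code units
# (a high surrogate followed by a low surrogate).
units = utf16_code_units(from_chr)

print(same, length, [hex(u) for u in units])
```

Here the constructor and the escape agree, which is the behaviour the message argues for; the surrogate pair appears only once the string is encoded as UTF-16.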