On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote:

> You've got that wrong: Python lets you choose UCS-4 -
> UCS-2 is the default.
>
> Note that Python's Unicode codecs UTF-8 and UTF-16
> are surrogate aware and thus support non-BMP code points
> regardless of the build type: A UCS2-build of Python will
> store a non-BMP code point as UTF-16 surrogate pair in the
> Py_UNICODE buffer while a UCS4 build will store it as a
> single value. Decoding is surrogate aware too, so a UTF-16
> surrogate pair in a UCS2 build will get treated as single
> Unicode code point.

If this is the case, then we're clearly misleading users. If the configure script says UCS-2, then as a user I would assume that surrogate pairs would *not* be encoded, because I chose UCS-2, and it doesn't support that. I would assume that any UTF-16 string I would read would be transcoded into the internal type (UCS-2), and information would be lost. If this is not the case, then what does the configure option mean?

-- Nick
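
For reference, the surrogate-pair arithmetic under discussion can be sketched as follows. This is only an illustration, not code from either build: the helper names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical, and the sketch assumes a current Python 3 interpreter, where the UCS-2/UCS-4 build option discussed here no longer exists. It shows why a code point outside the BMP cannot fit in a single 16-bit unit and how UTF-16 represents it as two units.

```python
import struct


def to_surrogate_pair(code_point):
    """Split a non-BMP code point into a UTF-16 (high, low) surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = 0x1D11E  # MUSICAL SYMBOL G CLEF, a code point outside the BMP
high, low = to_surrogate_pair(cp)

# The UTF-16 codec produces the same two 16-bit code units.
units = struct.unpack(">2H", chr(cp).encode("utf-16-be"))

print(hex(high), hex(low))                 # the two surrogate code units
print(units == (high, low))                # codec agrees with the arithmetic
print(from_surrogate_pair(high, low) == cp)
```

A strict UCS-2 representation has no way to express `cp` at all, which is the loss of information the question above is concerned with; a buffer of 16-bit units holding the pair is UTF-16 in practice.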