From Andrew's new pass:

"""
Python's Unicode support has been enhanced a bit in 2.2. Unicode
strings are usually stored as UTF-16, as 16-bit unsigned integers.
"""

Please replace UTF-16 with UCS-2. Python's Unicode implementation does
not support UTF-16 in a surrogate-aware way; only some of the codecs
do this. As a result, the internal storage format of Python is more
precisely described as UCS-2.

"""
Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers,
as its internal encoding by supplying
\longprogramopt{enable-unicode=ucs4} to the configure script. When
built to use UCS-4 (a ``wide Python''), the interpreter can natively
handle Unicode characters from U+000000 to U+110000. The range of
legal values for the \function{unichr()} function has been expanded;
it used to only accept values up to 65535, but in 2.2 will accept
values from 0 to 0x110000. Using a ``narrow Python'', an interpreter
compiled to use UTF-16, values greater than 65535 will result in
\function{unichr()} returning a string of length 2:

\begin{verbatim}
>>> s = unichr(65536)
>>> s
u'\ud800\udc00'
>>> len(s)
2
\end{verbatim}
"""

Same here: UTF-16 -> UCS-2.

Note that I very much favour removing the surrogate generation in
unichr() for UCS2-builds. If I don't hear strong opposition, I'll
disable this feature, which was added as part of the UCS-4 patches.
unichr() will then raise an exception as it did in version 2.1.

"""
This possibly-confusing behaviour, breaking the intuitive invariant
that \function{chr()} and \function{unichr()} always return strings of
length 1, may be changed later in 2.2, depending on public reaction.
"""

Right.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
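
For readers following the unichr(65536) example quoted in the message, the two code units u'\ud800\udc00' can be reproduced by hand with the standard UTF-16 surrogate arithmetic. The sketch below is only an illustration, written for a current Python 3 interpreter; the helper names to_surrogate_pair and from_surrogate_pair are hypothetical and are not taken from the interpreter's source or from the message.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary range")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(65536)
print(hex(high), hex(low))  # 0xd800 0xdc00
```

Code that only counts 16-bit units, with no pairing logic of this kind, sees such a character as two items, which is why len(s) is 2 in the quoted session.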