Fredrik Lundh wrote:
>
> mal wrote:
> > Same here: UTF-16 -> UCS-2. Note that I very much favour
> > removing the surrogate generation in unichr() for UCS2-builds.
> >
> > If I don't hear strong opposition, I'll disable this feature
> > which was added as part of the UCS-4 patches. unichr()
> > will then raise an exception as it did in version 2.1.
>
> the rationale behind this change was that unichr() should
> behave like the \U escape.

Please note that unichr() is a low-level API which is part of the
Unicode implementation. The implementation itself does not handle
surrogates in any special way; only the codecs do (and after my
last checkin, unicode-escape and UTF-16 do handle surrogates
correctly).

To simplify the picture: the implementation itself only sees UCS-2
or UCS-4, depending on the compile-time option, and these do not
treat surrogates in any special way except to reserve code points
for their use.

Accordingly, unichr() should not create UTF-16, but UCS-2 on narrow
builds and UCS-4 on wide builds (unichr() is a constructor for code
units, not code points).

If an application needs a UTF-16 generating API, then it can
easily implement one using the UCS-2 generating unichr() API to
create Unicode code units representing isolated surrogates.

> (they both take a 32-bit character code, and turn it into
> a unicode string; see GvR's mails in the ucs4 thread for more
> on this topic).
>
> don't change one of them without considering if the other
> one really does the right thing.

<plug>
For those of you who are not too much into all these code unit
vs. code point vs. character discussions, a look at the slides
of the talk I gave at the European Python Meeting in Bordeaux
may provide some insights:

    http://www.lemburg.com/python/Unicode-Talk.pdf
</plug>

Cheers,
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
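
The message above says an application that needs a UTF-16 generating API can build one on top of a constructor that only produces single code units. A minimal sketch of that idea follows. It is written for Python 3, where unichr() no longer exists, so it works on plain integers instead of calling unichr(); the function name utf16_code_units is hypothetical and not part of any Python API.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units (as ints) for one code point.

    Hypothetical helper for illustration only. Code points in the
    BMP, including isolated surrogates, map to a single 16-bit unit.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range: %#x" % code_point)

    # BMP: one code unit, passed through unchanged.
    if code_point <= 0xFFFF:
        return [code_point]

    # Supplementary planes: split the 20-bit offset into a
    # high (lead) and low (trail) surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]
```

On a narrow (UCS-2) build of that era, each returned integer could presumably be passed to a code-unit-only unichr() and the results concatenated, which is the layering the message describes.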