Guido van Rossum wrote:
> > > Untrue: it supports range(0x110000) (in UCS-2 mode this returns a
> > > surrogate pair).  Now, maybe that's not what it *should* do...
> >
> > It should definitely not, unless you want to break code which assumes
> > that chr() and unichr() always return a single byte/code unit !
>
> Reasonable people can disagree about this.
>
> > This was part of the UCS-4 checkins which hadn't had time yet to
> > review. Should I remove the surrogate part for narrow builds ?
>
> Well, this snuck into the 2.2a1, so hopefully we'll get some comments
> ("love it" / "hate it") from the field to guide our decision.

Waiting for comments from the field :-)

> > > > and there's no \code{\e U} notation for embedding characters
> > > > greater than 65535 in a Unicode string literal.
> > >
> > > Not true either -- correct \U has been part of Python since 2.0.  It
> > > does the same thing as unichr() described above.
> >
> > Right.
> >
> > Note that in this case, the handling of surrogates is needed
> > to make the unicode-escape encoding roundtrip safe.
>
> I don't understand what this means.  Can you give an example?

It means that the roundtrip Unicode -> encoding -> Unicode is
a 1-1 mapping for all Unicode code points. Other examples of
roundtrip-safe encodings are UTF-8 and UTF-16.

Looking at the code, I found that the unicode-escape encoder does
not convert Unicode surrogates to \UXXXXXXXX escapes. I'll fix that.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
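
The roundtrip property described above (Unicode -> encoding -> Unicode
giving back the original string) can be checked with a short sketch. This
is only an illustration written for a current Python 3 interpreter, not the
2.2 narrow/wide builds under discussion; the helper name roundtrips() and
the SAMPLES list are made up for this example and are not part of Python.

```python
# Sketch only: assumes Python 3, where str is always a sequence of code points.
# SAMPLES and roundtrips() are hypothetical names used for illustration.
SAMPLES = ["abc", "\u00e9", "\u20ac", "\U00010000", "\U0010ffff"]


def roundtrips(text, encoding):
    """Return True if text survives encode followed by decode unchanged."""
    return text.encode(encoding).decode(encoding) == text


# The three encodings named in the message above.
for codec in ("unicode_escape", "utf-8", "utf-16"):
    for sample in SAMPLES:
        assert roundtrips(sample, codec), (codec, sample)

# A character above 65535 is written out as a single \UXXXXXXXX escape.
escaped = "\U00010000".encode("unicode_escape")
```

Here escaped holds the ten ASCII bytes of the escape sequence rather than
a pair of \uXXXX surrogate escapes, which is the behaviour the fix
mentioned above is meant to give the encoder.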