Tim Peters wrote:
>
> [/F]
> > would it be a good idea to add \UXXXXXXXX
> > (8 hex digits) to 2.0?
> >
> > (only characters in the 0000-ffff range would
> > be accepted in the current version, of course).
>
> [Tim agreed two msgs later; Guido agreed in private]
>
> [MAL]
> > I don't really get the point of adding \uXXXXXXXX
>
> No: Fredrik's suggestion is with an uppercase U. He is not proposing to
> extend the (lowercase) \u1234 notation.

Ah, ok. So there will be no incompatibility with Java et al.

> > when the internal format used is UTF-16 with support for surrogates.
> >
> > What should \u12341234 map to in a future implementation ?
> > Two Python (UTF-16) Unicode characters ?
>
> \U12345678 is C99's ISO 10646 notation; as such, it can't always be mapped
> to UTF-16.

Sure it can: you'd have to use surrogates. Whether this should
happen is another question, but not one which we'll have to
deal with now, since as Fredrik proposed, \UXXXXXXXX will only
work for 0-FFFF and raise an exception for all higher values.

> > See
> >
> > http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#100850
> >
> > for how Java defines \uXXXX...
>
> Which I pushed for from the start, and nobody is seeking to change.
>
> > We're following an industry standard here ;-)
>
> \U12345678 is also an industry standard, but in a more recent language (than
> Java) that had more time to consider the eventual implications of Unicode's
> limitations. We reserve the notation now so that it's possible to outgrow
> Unicode later.

Ok.

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
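
For reference, the surrogate mapping discussed above can be sketched roughly as follows. This is only an illustration of the standard UTF-16 arithmetic, not anything from the proposed implementation; the function name utf16_units is made up for the example. It also shows where the two positions in the thread meet: values up to 10FFFF can be expressed with surrogates, while a value such as 12345678 has no UTF-16 form at all.

```python
def utf16_units(code_point):
    """Return the UTF-16 code units for one code point, as a list of ints.

    Hypothetical helper for illustration only; it is not part of Python.
    """
    # UTF-16 can only address code points up to 0x10FFFF.
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not representable in UTF-16: %#x" % code_point)

    # The 0000-FFFF range fits in a single 16-bit unit.
    if code_point <= 0xFFFF:
        return [code_point]

    # Higher values are split into a high and a low surrogate,
    # each carrying 10 bits of the offset from 0x10000.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]
```

With this sketch, utf16_units(0x1234) gives a single unit, utf16_units(0x10000) gives the pair D800 DC00, and utf16_units(0x12345678) raises ValueError.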