Fredrik Lundh wrote:
>
> as tim pointed out in an earlier thread (on SRE), the
> \xnn escape code is something of a kludge.
>
> I just noted that the unicode string type supports \x
> as well as \u, with slightly different semantics:
>
> \u -- exactly four hexadecimal characters are read.
>
> \x -- 1 or more hexadecimal characters are read, and
> the result is casted to a Py_UNICODE character.

\x is there in Unicode for compatibility with the 8-bit string
implementation and in sync with ANSI C. Guido wanted these
semantics when I asked him about it during the implementation
phase.

> I'm pretty sure this isn't an optimal design, but I'm not sure
> how it should be changed:
>
> 1. treat \x as a hexadecimal byte, not a hexadecimal
> character. or in other words, make sure that
>
>     ord("\xabcd") == ord(u"\xabcd")
>
> fwiw, this is how it's done in SRE's parser (see the
> python-dev archives for more background).
>
> 2. ignore \x. after all, \u is much cleaner.
>
>     u"\xabcd" == "\\xabcd"
>     u"\u0061" == "\x61" == "\x0061" == "\x00000061"
>
> 3. treat \x as an encoding error.
>
> 4. read no more than 4 characters. (a comment in the
> code says that \x reads 0-4 characters, but the code
> doesn't match that comment)
>
>     u"\x0061bcd" == "abcd"
>
> 5. leave it as it is (just fix the comment).

I'd suggest 5 -- makes converting 8-bit strings using \x
to Unicode a tad easier.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
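To make the competing \x semantics in this thread concrete, here is a minimal sketch in modern Python. The function `decode_x_escapes` and its mode names are hypothetical and not part of any Python release. It decodes only \x escapes and copies every other character unchanged. The "greedy" mode models the current Unicode behaviour and assumes the cast to Py_UNICODE keeps the low 16 bits (a UCS-2 build); "max4" models option 4; "byte" models option 1 and assumes the 8-bit convention of keeping the low 8 bits. A \x with no hex digits is treated as an error, roughly in the spirit of option 3.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def decode_x_escapes(s: str, mode: str) -> str:
    """Decode only \\x escapes in s (hypothetical helper).

    mode:
      'greedy' -- read 1+ hex digits, keep the low 16 bits (assumed UCS-2 cast)
      'max4'   -- read 1 to 4 hex digits (option 4)
      'byte'   -- read 1+ hex digits, keep the low 8 bits (option 1)
    """
    if mode not in ("greedy", "max4", "byte"):
        raise ValueError(f"unknown mode: {mode!r}")
    out = []
    i, n = 0, len(s)
    while i < n:
        if s[i] == "\\" and i + 1 < n and s[i + 1] == "x":
            j = i + 2
            # 'max4' stops after four digits; the other modes read greedily.
            limit = min(j + 4, n) if mode == "max4" else n
            while j < limit and s[j] in HEX_DIGITS:
                j += 1
            if j == i + 2:
                raise ValueError(f"\\x with no hex digits at index {i}")
            value = int(s[i + 2:j], 16)
            if mode == "greedy":
                value &= 0xFFFF  # truncate to a 16-bit code unit
            elif mode == "byte":
                value &= 0xFF    # truncate to a single byte
            out.append(chr(value))
            i = j
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for m in ("greedy", "max4", "byte"):
        print(m, ascii(decode_x_escapes(r"\x0061bcd", m)))
```

Under "max4", `r"\x0061bcd"` decodes to `"abcd"` as in option 4's example, while "greedy" swallows all seven digits and keeps only `0x1bcd`. Note that none of these modes matches present-day Python, where \x takes exactly two hex digits, which is why the sketch parses raw strings by hand instead of relying on literals.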