as tim pointed out in an earlier thread (on SRE), the \xnn escape code is something of a kludge.

I just noted that the unicode string type supports \x as well as \u, with slightly different semantics:

\u -- exactly four hexadecimal digits are read.

\x -- 1 or more hexadecimal digits are read, and the result is cast to a Py_UNICODE character.

I'm pretty sure this isn't an optimal design, but I'm not sure how it should be changed:

1. treat \x as a hexadecimal byte, not a hexadecimal character. or in other words, make sure that

   ord("\xabcd") == ord(u"\xabcd")

   fwiw, this is how it's done in SRE's parser (see the python-dev archives for more background).

2. ignore \x. after all, \u is much cleaner.

   u"\xabcd" == "\\xabcd"
   u"\u0061" == "\x61" == "\x0061" == "\x00000061"

3. treat \x as an encoding error.

4. read no more than 4 hex digits. (a comment in the code says that \x reads 0-4 characters, but the code doesn't match that comment)

   u"\x0061bcd" == "abcd"

5. leave it as it is (just fix the comment).

comments?

</F>
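To make the five options easier to compare, here is a minimal sketch of a standalone decoder for text containing backslash escapes, with one hypothetical policy per option. The function `decode` and the policy names (`byte`, `ignore`, `error`, `max4`, `greedy16`) are illustrative only and are not CPython's implementation. Option 1 is modeled as "read every hex digit and keep the low 8 bits" (assumed to match the 8-bit string behaviour), and option 5 as "read every hex digit and keep the low 16 bits", assuming a 16-bit Py_UNICODE.

```python
HEXDIGITS = "0123456789abcdefABCDEF"

# Hypothetical policy names, one per option in the post:
#   "byte"     -> option 1: read all hex digits, keep the low 8 bits
#   "ignore"   -> option 2: leave \x in the output as literal text
#   "error"    -> option 3: reject \x outright
#   "max4"     -> option 4: read at most four hex digits
#   "greedy16" -> option 5: read all hex digits, keep the low 16 bits
#                 (assumes a 16-bit Py_UNICODE)
POLICIES = {"byte", "ignore", "error", "max4", "greedy16"}


def decode(text: str, policy: str = "greedy16") -> str:
    """Decode backslash-u, backslash-x and double-backslash escapes.

    Any other backslash sequence is copied through unchanged.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch != "\\" or i + 1 == n:
            # Ordinary character, or a lone trailing backslash.
            out.append(ch)
            i += 1
            continue
        kind = text[i + 1]
        if kind == "\\":
            out.append("\\")
            i += 2
        elif kind == "u":
            # \u always takes exactly four hex digits.
            digits = text[i + 2:i + 6]
            if len(digits) != 4 or any(c not in HEXDIGITS for c in digits):
                raise ValueError("\\u needs exactly four hex digits")
            out.append(chr(int(digits, 16)))
            i += 6
        elif kind == "x":
            if policy == "ignore":
                # Keep the backslash and the x; the digits follow as text.
                out.append("\\x")
                i += 2
                continue
            if policy == "error":
                raise ValueError("\\x escapes are not allowed")
            start = i + 2
            limit = min(start + 4, n) if policy == "max4" else n
            j = start
            while j < limit and text[j] in HEXDIGITS:
                j += 1
            if j == start:
                raise ValueError("\\x needs at least one hex digit")
            value = int(text[start:j], 16)
            if policy == "byte":
                value &= 0xFF
            elif policy == "greedy16":
                value &= 0xFFFF
            out.append(chr(value))
            i = j
        else:
            # Unknown escape: copy it through verbatim.
            out.append(ch + kind)
            i += 2
    return "".join(out)
```

With these assumptions, `decode("\\x0061bcd", "max4")` gives `"abcd"` (option 4), `decode("\\xabcd", "ignore")` leaves the text as `\xabcd` (option 2), and under `"byte"` the same input decodes to the character with ordinal 0xCD (option 1).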