Tim Peters wrote:
>
> [MAL]
> > ...demo script...
>
> It looks like
>
>     r'\\u0000'
>
> will get translated into a 2-character Unicode string.

Right...

> That's probably not
> good, if for no other reason than that Java would not do this (it would
> create the obvious 7-character Unicode string), and having something that
> looks like a Java escape that doesn't *work* like the Java escape will be
> confusing as heck for JPython users. Keeping track of even-vs-odd number of
> backslashes can't be done with a regexp search, but is easy if the code is
> simple <wink>:
> ...Tim's version of the demo...

Guido and I have decided to turn \uXXXX into a standard escape
sequence with no further magic applied. \uXXXX will only be expanded
in u"" strings.

Here's the new scheme, with the 'unicode-escape' encoding being
defined as:

· all non-escape characters represent themselves as a Unicode ordinal
  (e.g. 'a' -> U+0061).

· all existing defined Python escape sequences are interpreted as
  Unicode ordinals; note that \xXXXX can represent all Unicode
  ordinals, and \OOO (octal) can represent Unicode ordinals up to
  U+01FF.

· a new escape sequence, \uXXXX, represents U+XXXX; it is a syntax
  error to have fewer than 4 digits after \u.

Examples:

u'abc'          -> U+0061 U+0062 U+0063
u'\u1234'       -> U+1234
u'abc\u1234\n'  -> U+0061 U+0062 U+0063 U+1234 U+000A

Now how should we define ur"abc\u1234\n" ... ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 44 days left
Business:      http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/
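
The examples above can be checked with a small sketch. It assumes a modern Python 3 interpreter, where every string is Unicode and the u"" prefix is still accepted; the interpreter under discussion in this thread may have behaved differently, in particular for \x escapes and raw strings. The helper name ordinals() is made up for illustration, and the 'unicode_escape' codec is the present-day spelling of the encoding name.

```python
# Sketch only: modern Python 3 behaviour, not the 1999 implementation.

def ordinals(s):
    """Return the code points of s in U+XXXX notation (hypothetical helper)."""
    return " ".join("U+%04X" % ord(ch) for ch in s)

# The three examples from the scheme above.
print(ordinals(u'abc'))          # U+0061 U+0062 U+0063
print(ordinals(u'\u1234'))       # U+1234
print(ordinals(u'abc\u1234\n'))  # U+0061 U+0062 U+0063 U+1234 U+000A

# Even-vs-odd backslashes: in a normal literal, '\\u0000' is one escaped
# backslash followed by the five characters "u0000".
print(len('\\u0000'))            # 6

# In Python 3 a raw literal keeps both backslashes and expands nothing.
print(len(r'\\u0000'))           # 7

# The 'unicode_escape' codec applies the same escape rules at runtime.
decoded = b'abc\\u1234\\n'.decode('unicode_escape')
print(ordinals(decoded))         # U+0061 U+0062 U+0063 U+1234 U+000A
```

Note that Python 3 has no ur"" prefix at all, so the sketch says nothing about the open question of how ur"abc\u1234\n" should be defined.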