[MAL]
> Guido and I have decided to turn \uXXXX into a standard
> escape sequence with no further magic applied. \uXXXX will
> only be expanded in u"" strings.

Does that exclude ur"" strings? Not arguing either way, just don't know what all this means.

> Here's the new scheme:
>
> With the 'unicode-escape' encoding being defined as:
>
> · all non-escape characters represent themselves as a Unicode ordinal
>   (e.g. 'a' -> U+0061).

Same as before (scream if that's wrong).

> · all existing defined Python escape sequences are interpreted as
>   Unicode ordinals;

Same as before (ditto).

>   note that \xXXXX can represent all Unicode ordinals,

This means that the definition of \xXXXX has changed, then -- as you pointed out just yesterday <wink>, \xABCDq currently acts like \xCDq. Does the new \x definition apply only in u"" strings, or in "" strings too? What is the new \x definition?

>   and \OOO (octal) can represent Unicode ordinals up to U+01FF.

Same as before (ditto).

> · a new escape sequence, \uXXXX, represents U+XXXX; it is a syntax
>   error to have fewer than 4 digits after \u.

Same as before (ditto).

IOW, I don't see anything that's changed other than an unspecified new treatment of \x escapes, and possibly that ur"" strings don't expand \u escapes.

> Examples:
>
> u'abc' -> U+0061 U+0062 U+0063
> u'\u1234' -> U+1234
> u'abc\u1234\n' -> U+0061 U+0062 U+0063 U+1234 U+05c

The last example is damaged (U+05c isn't legit). Other than that, these look the same as before.

> Now how should we define ur"abc\u1234\n" ... ?

If strings carried an encoding tag with them, the obvious answer is that this acts exactly like r"abc\u1234\n" acts today except gets a "unicode-escaped" encoding tag instead of a "[whatever the default is today]" encoding tag.

If strings don't carry an encoding tag with them, you're in a bit of a pickle: you'll have to convert it to a regular string or a Unicode string, but in either case have no way to communicate that it may need further processing; i.e., no way to distinguish it from a regular or Unicode string produced by any other mechanism. The code I posted yesterday remains my best answer to that unpleasant puzzle (i.e., produce a Unicode string, fiddling with backslashes just enough to get the \u escapes expanded, in the same way Java's (conceptual) preprocessor does it).
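
To make the "fiddle with backslashes just enough" idea concrete, here is a minimal sketch. It is not the code referred to above (that isn't reproduced in this message); it is an illustration written for a modern Python 3, where every str is already Unicode. The function name expand_u_escapes is hypothetical, and the eligibility rule is an assumption borrowed from Java: a backslash starts a \u escape only when it is preceded by an even number of contiguous backslashes. Everything else, including \n, is left exactly as a raw string would leave it.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def expand_u_escapes(raw):
    # Hypothetical helper: expand \uXXXX in already-raw text and leave
    # every other backslash sequence untouched.
    # Assumed rule (Java-style): the backslash of \u is eligible only if
    # it sits at the end of an odd-length run of backslashes.
    out = []
    i = 0
    n = len(raw)
    while i < n:
        if raw[i] != '\\':
            out.append(raw[i])
            i += 1
            continue
        # Measure the run of contiguous backslashes starting at i.
        j = i
        while j < n and raw[j] == '\\':
            j += 1
        run = j - i
        if run % 2 == 1 and j < n and raw[j] == 'u':
            digits = raw[j + 1:j + 5]
            # Fewer than 4 hex digits after \u is treated as an error.
            if len(digits) != 4 or any(c not in HEX_DIGITS for c in digits):
                raise ValueError('truncated \\uXXXX escape at index %d' % (j - 1))
            out.append('\\' * (run - 1))      # earlier backslashes stay literal
            out.append(chr(int(digits, 16)))  # the escape becomes one character
            i = j + 5
        else:
            out.append('\\' * run)            # not a \u escape: copy verbatim
            i = j
    return ''.join(out)


if __name__ == '__main__':
    result = expand_u_escapes(r'abc\u1234\n')
    print(['U+%04X' % ord(c) for c in result])
```

With this rule the backslash-n in the input survives as two characters (a backslash and an "n"), while \u1234 collapses to a single character; a doubled backslash in front of the "u" suppresses the expansion.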