Guido van Rossum <guido@python.org> writes:

> But the treatment of k under phase 2 will be, um, interesting, and I'm
> not sure what it should do!!! Since in phase 2 the entire file will
> be decoded from KOI8-R to Unicode before it's parsed, maybe the best
> thing would be to encode 8-bit string literals back using KOI8-R (in
> general, the encoding given in the encoding cookie).

The meaning of the string literals will not change: they continue to
denote byte strings, and they continue to denote the same byte strings
that they denote today (by accident).

What will change is this:

- it will be official that you can put KOI8-R into a string literal,
  and that the interpreter will produce the byte string "as-is"

- it will be an error if the byte string does not follow the encoding,
  e.g. if you declare UTF-8, but have some string literal that
  violates the UTF-8 structure

- Python will determine token boundaries only after decoding the
  input, so a byte value of 34 does not necessarily indicate the end
  of a string anymore (if the decoder consumes the byte as the second
  byte of some character)

In general, the implementation strategy will indeed be that string
literals are encoded back into their original encoding. It is not
clear to me when this should happen, though; in particular, whether
the AST should have Py_UNICODE* everywhere.

Regards,
Martin
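
The three points above can be illustrated with a rough sketch. It is written in present-day Python 3, where byte strings are `bytes`, and it is not the parser implementation under discussion. The helper `literal_bytes` is hypothetical: it only mimics the "decode the whole file, find the token boundaries, then encode the literal back" strategy on a one-line source fragment. ISO-2022-JP is used purely as an assumed example of an encoding in which byte 34 can occur inside a character.

```python
def literal_bytes(source_bytes: bytes, encoding: str) -> bytes:
    """Hypothetical helper: decode a one-line source fragment, take the text
    between the first and last double quote, and encode it back."""
    # Decoding the whole fragment first raises UnicodeDecodeError if the
    # bytes do not follow the declared encoding.
    text = source_bytes.decode(encoding)
    start = text.index('"') + 1
    end = text.rindex('"')
    # Encode the literal back into the original source encoding.
    return text[start:end].encode(encoding)


# 1. A KOI8-R literal denotes the same byte string after the round trip.
koi8_payload = "привет".encode("koi8_r")
koi8_source = b'k = "' + koi8_payload + b'"'
roundtrip_ok = literal_bytes(koi8_source, "koi8_r") == koi8_payload

# 2. Declaring UTF-8 for the same bytes is an error, because they
#    violate the UTF-8 structure.
try:
    literal_bytes(koi8_source, "utf-8")
except UnicodeDecodeError:
    rejected = True
else:
    rejected = False

# 3. Byte 34 (the ASCII double quote) can be part of a character, so token
#    boundaries have to be found after decoding, not on the raw bytes.
jp_payload = "あ".encode("iso2022_jp")
jp_source = b's = "' + jp_payload + b'"'
raw_quote_bytes = jp_source.count(b'"')                      # counted on raw bytes
decoded_quotes = jp_source.decode("iso2022_jp").count('"')   # counted after decoding
jp_roundtrip_ok = literal_bytes(jp_source, "iso2022_jp") == jp_payload
```

Counting quotes on the raw bytes of `jp_source` finds one more than counting them in the decoded text, which is why a tokenizer that scans raw bytes for byte 34 would end the string literal too early.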