Random832 writes:
 > "Stephen J. Turnbull" <stephen at xemacs.org> writes:
 > > I don't see any good reason for allowing non-ASCII-compatible
 > > encodings in the reference CPython interpreter.
 >
 > There might be a case for having the tokenizer not care about
 > encodings at all and just operate on a stream of unicode characters
 > provided by a different layer.

That's exactly what the PEP 263 implementation does in Python 2 (with
the caveat that Python 2 doesn't know anything about Unicode: the
source is a UTF-8 stream, and the non-ASCII characters are treated as
bytes of unknown semantics, so they can't be used in syntax).

I don't know about Python 3; I haven't looked at the decoding of
source programs. But I would assume it still implements PEP 263,
except that since str is now either widechars or the PEP 393
representation (i.e., flexible widechars), that representation is used
instead of UTF-8.

I'm sure that there are plenty of ASCII-isms in the tokenizer, in the
sense that it assumes the ASCII *character* (not byte) repertoire.
But I'm not sure why Serhiy thinks the tokenizer cares about the
on-disk representation. As I say, though, I haven't looked at the
code, so he might be right.

Steve
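P.S. For what it's worth, Python 3's stdlib already exposes the two
layers separately: tokenize.detect_encoding() finds the PEP 263 coding
cookie in the raw bytes, and the tokenizer proper consumes decoded
text. A minimal sketch (the sample source is made up for
illustration):

    import io
    import tokenize

    # Hypothetical sample source with a PEP 263 coding cookie;
    # '\xe9' is e-acute in latin-1.
    source = b"# -*- coding: latin-1 -*-\nname = '\xe9'\n"

    # Layer 1: detect the declared encoding from the raw byte stream.
    encoding, lines_read = tokenize.detect_encoding(
        io.BytesIO(source).readline)
    print(encoding)  # 'iso-8859-1' (normalized from 'latin-1')

    # Layer 2: the tokenizer operates on decoded characters, not bytes.
    text = source.decode(encoding)
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        print(tok.type, tok.string)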