"Fredrik Lundh" <fredrik@pythonware.com> writes: > hmm. I'm tempted to think that there's a major > flaw in the PEP, caused by the fact that > > compile(unicode(script, extract_encoding(script))) > > will, from what I can tell, not compile to the same > thing as: > > compile(script) Can you elaborate what you think the difference is? I believe the PEP is silent on this specific aspect, but I think what should happen is (in the Unicode case): - compile will convert the script to UTF-8, which is then tokenized. - in the process of parsing, the encoding declaration (that presumably extract_encoding was looking at as well) is recognized, if any. - Unicode literals are left as-is; byte string literals are converted back to the original encoding. So if there is an encoding declaration in script, then I cannot see a difference. If there is none, the PEP does not elaborate what should happen. Leaving the byte strings as UTF-8 seems safest, since the only way to get "correct" non-ASCII strings without the encoding comment is to use the UTF-8 signature. In any case, this can't cause backwards compatibility problems. compile accepts Unicode strings today only if they can be converted to a byte string. In the standard installation, this will fail today if there is non-ASCII in script. So allowing Unicode in compile is a pure extension. If its precise meaning is underspecified, it should be clarified before stage 2 is implemented. Regards, Martin