Martin v. Loewis wrote:
> "M.-A. Lemburg" <mal@lemburg.com> writes:
>>Of course, we no longer need to convert the tokenizer to
>>work on Py_UNICODE, so the updated text should mention
>>that compile() encodes Unicode input to UTF-8 to then continue
>>with the usual processing.
>
> The PEP currently does not say that.

I know, it should be updated to the solution found by Hisao.

>>>2. convert to byte string using "utf-8" encoding,
>>
> [...]
>
>>Option 2.
>
> I think this contradicts the current wording of the PEP. It says
>
> "5. ... and creating string objects from the Unicode literal data by
> first reencoding the UTF-8 data into 8-bit string data using the given
> file encoding"
>
> The phrasing "the given file encoding" is a bit lax, but given the
> string
>
> u"""
> # -*- coding: iso-8859-1 -*-
> s = 'some latin-1 text'
> """
>
> I would expect that the encoding "given" is iso-8859-1, not utf-8.
> Now, I interpret your message to mean that s will be encoded in
> utf-8. Correct?

Hmm, good point. 8-bit string literals will have to be reencoded
using the encoding stated in the coding comment... skipping that
comment for a Unicode argument to compile() would break this.

> If so, I think Fredrik is right, and
>
> compile(unicode(script, extract_encoding(script)))
>
> does indeed something different than
>
> compile(script)
>
> as the latter would give the string value assigned to s in its
> original encoding, i.e. latin-1.

Right. We don't want that.

   compile(unicode(script, extract_encoding(script)))

should be the same as

   compile(script)

>>Ideal would be to have the tokenizer skip the encoding declaration
>>detection and start directly with the UTF-8 string
>
> "skip the encoding declaration" can't really work; you have to parse
> the source code line by line. You can tell the implementation to
> ignore the encoding declaration, if desired.

No, this wouldn't be right. I withdraw that comment :-)

>>(this also solves the problems you'd run into in case the Unicode
>>source code has a source code encoding comment).
>
> Well, that is precisely the issue that I'm trying to address here. I
> still believe that the resulting behaviour is not specified in the PEP
> at the moment (which is no big deal, since the current implementation
> does not touch compile() at all).

I'll try to come up with a proper wording tomorrow.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/
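
The requirement stated above, that compile(unicode(script, extract_encoding(script))) give the same result as compile(script), can be illustrated with a minimal sketch. It is written for present-day Python 3, not the implementation under discussion in this thread, so unicode() becomes bytes.decode() and string literals are already Unicode. The helper extract_encoding() is hypothetical (only its name appears in the thread); the body shown here, the "ascii" default, and the sample source are assumptions made for illustration. The regular expression is the one PEP 263 gives for the coding comment.

```python
import re

# Coding-comment pattern as given in PEP 263; it is only honoured on the
# first or second line of a source file.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def extract_encoding(script: bytes, default: str = "ascii") -> str:
    """Hypothetical helper: return the encoding named in the coding comment.

    The "ascii" fallback is an assumption made for this sketch.
    """
    for line in script.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default


# Latin-1 encoded source with a non-ASCII character inside a string literal.
script = "# -*- coding: iso-8859-1 -*-\ns = 'caf\xe9'\n".encode("iso-8859-1")

encoding = extract_encoding(script)
text = script.decode(encoding)  # Python 3 spelling of unicode(script, encoding)

# Compile the byte source and the decoded text, then compare what each
# assigns to s.
ns_bytes, ns_text = {}, {}
exec(compile(script, "<bytes>", "exec"), ns_bytes)
exec(compile(text, "<text>", "exec"), ns_text)
same_value = ns_bytes["s"] == ns_text["s"]

# Reencoding the literal with the declared encoding gives back the 8-bit
# data as it appeared in the original file.
literal_bytes = ns_text["s"].encode(encoding)
```

Here both compile() calls should yield the same value for s, and reencoding that value with the declared encoding recovers the Latin-1 bytes from the file, which is the behaviour the reply asks for in the case of 8-bit string literals.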