Paul Prescod wrote:
>
> "M.-A. Lemburg" wrote:
> > Paul suggested adding encoding directives for 8-bit
> > strings and comments, but these cannot be used by the Python
> > compiler in any way and would only be for the benefit of an
> > editor, so I don't really see the need for them.
>
> Sorry I wasn't clear. Like \F, I think that the best model is that of
> XML, Java and (I've learned recently) Perl. There should be a single
> encoding for the file. Logically speaking it should be decoded before
> tokenization or parsing. Practically speaking it may be simpler to fake
> this logical decoding in the implementation. I don't care how it is
> implemented. Logically the model should be that any encoding declaration
> affects the interpretation of the *file*, not some particular construct
> in the file.
>
> If this is too difficult to implement today then maybe we should wait on
> the whole feature until someone has time to do it right.

Hmm, I guess you have something like this in mind...

1. read the file
2. decode it into Unicode assuming some fixed per-file encoding
3. tokenize the Unicode content
4. compile it, creating Unicode objects from the given Unicode data
   and creating string objects from the Unicode literal data by first
   re-encoding the Unicode data into 8-bit string data

To make this backwards compatible, the implementation would have to
assume Latin-1 as the original file encoding if not given (otherwise,
binary data currently stored in 8-bit strings wouldn't make the
roundtrip).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                       http://www.lemburg.com/python/
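
A minimal sketch of the decode/re-encode roundtrip in steps 1-4 above,
in Python 2-era code; the helper load_source and the DEFAULT_ENCODING
constant are hypothetical names for illustration, not an existing API:

    DEFAULT_ENCODING = 'latin-1'    # assumed default, per the above

    def load_source(path, encoding=DEFAULT_ENCODING):
        # Step 1: read the raw bytes of the source file.
        f = open(path, 'rb')
        data = f.read()
        f.close()
        # Step 2: decode into Unicode using the per-file encoding.
        text = unicode(data, encoding)
        # Steps 3-4 would tokenize and compile this Unicode text,
        # producing 8-bit string literals by re-encoding their
        # Unicode form back into the file's encoding.
        return text

    # Why Latin-1 keeps the roundtrip lossless: each byte value
    # 0..255 maps to exactly one code point U+0000..U+00FF and back,
    # so binary data stored in an 8-bit string literal survives a
    # decode followed by a re-encode unchanged.
    data = ''.join(map(chr, range(256)))
    assert unicode(data, 'latin-1').encode('latin-1') == data

With a UTF-8 default, by contrast, isolated bytes in the range
0x80-0xFF would raise a decoding error, which is exactly the
backwards-compatibility concern raised above.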