Guido van Rossum wrote:
>
> > Hmm, I guess you have something like this in mind...
> >
> > 1. read the file
> > 2. decode it into Unicode assuming some fixed per-file encoding
> > 3. tokenize the Unicode content
> > 4. compile it, creating Unicode objects from the given Unicode data
> >    and creating string objects from the Unicode literal data
> >    by first reencoding the Unicode data into 8-bit string data
> >
> > To make this backwards compatible, the implementation would have to
> > assume Latin-1 as the original file encoding if not given (otherwise,
> > binary data currently stored in 8-bit strings wouldn't make the
> > roundtrip).
>
> To be compatible with the current default encoding, I would use ASCII
> as the default encoding and issue an error if any non-ASCII characters
> are found. One should always use hex/oct escapes to enter binary data
> in literals!

Hmm, Latin-1 and other locale-specific encodings are currently being
used in 8-bit strings by far too many people in Europe and elsewhere...
people won't feel good about it.

Note that the reason for using Latin-1 is that Latin-1 decoded into
Unicode and then reencoded into Latin-1 is a 1-1 mapping for all 8-bit
values -- this gives us binary backward compatibility.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:    http://www.egenix.com/
Python Software:         http://www.lemburg.com/python/
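
[Editor's note: a minimal sketch, not part of the original thread,
illustrating the round-trip property Lemburg describes, in modern
Python 3 terms. Latin-1 maps every byte value 0x00-0xFF to the Unicode
code point with the same number, so a decode/encode cycle is lossless;
ASCII, as in Guido's stricter proposal, rejects anything above 0x7F.]

    raw = bytes(range(256))               # every possible 8-bit value
    text = raw.decode("latin-1")          # byte i -> code point U+00i, 1-1
    assert text.encode("latin-1") == raw  # lossless round trip for all 256 values

    # ASCII as the default instead would turn non-ASCII bytes into an error:
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        pass  # bytes >= 0x80 are rejected, so binary data must use hex/oct escapes

The assert never fires, which is exactly the "binary backward
compatibility" argument: 8-bit string literals pushed through the
decode/reencode pipeline above come out byte-for-byte identical.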