Andy Robinson wrote:
>
> > See my other post on the subject...
> >
> > Note that if we make UTF-8 the standard encoding, nearly all
> > special Latin-1 characters will produce UTF-8 errors on input
> > and unreadable garbage on output. That will probably be unacceptable
> > in Europe. To remedy this, one would *always* have to use
> > u.encode('latin-1') to get readable output for Latin-1 strings
> > represented in Unicode.
>
> You beat me to it - a colleague and I were just
> discussing this verbally. Specifically we Brits will
> get annoyed as soon as we read in a text file with
> pound (sterling) signs.
>
> We concluded that the only reasonable default (if you
> have one at all) is pure ASCII. At least that way I
> will get a clear and intelligible warning when I load
> in such a file, and will remember to specify
> ISO-Latin-1.

Well, Guido's post made me rethink the approach...

1. Setting <default encoding> to any non-UTF encoding
   will result in data lossage due to the encoding limits
   imposed by the other formats -- this is dangerous and
   will result in errors (some of which may not even be noticed
   due to the interpreter ignoring them) in case your strings
   use non-encodable characters.

2. You basically only want to set <default encoding> to
   anything other than UTF-8 for stream input and output.
   This can be done using the unicodec stream wrapper without
   too much inconvenience. (We'll have to extend the wrapper a little,
   though, because it currently only accepts Unicode objects for
   writing and always returns Unicode objects when reading.)

3. We should leave the issue open until some code is there
   to be tested... I have a feeling that there will be quite a few
   strange effects when APIs expecting strings are fed with
   Unicode objects returning UTF-8.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 50 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
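
For illustration only, here is a minimal sketch of the two failure modes discussed in this thread: a Latin-1 pound sign read under an ASCII or UTF-8 default, and data loss when encoding to a non-UTF encoding (point 1). It assumes present-day Python 3 `bytes`/`str` semantics, which postdate this post, and the names `try_decode`, `results` and `lossy` are hypothetical, not part of any proposal made here.

```python
# Byte 0xA3 is the pound (sterling) sign in ISO-8859-1 (Latin-1).
latin1_bytes = b"Price: \xa3 5"


def try_decode(data, encoding):
    """Return the decoded text, or the exception class name on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# ASCII and UTF-8 both reject the lone 0xA3 byte; only Latin-1 reads it.
results = {
    enc: try_decode(latin1_bytes, enc)
    for enc in ("ascii", "utf-8", "latin-1")
}

# Point 1: a character outside Latin-1 (here the euro sign, U+20AC)
# cannot be encoded. Strict mode raises; errors="replace" loses data.
try:
    "\u20ac 5".encode("latin-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

lossy = "\u20ac 5".encode("latin-1", errors="replace")

if __name__ == "__main__":
    for enc, outcome in results.items():
        print(enc, "->", outcome)
    print("strict encode failed:", strict_failed)
    print("lossy encode:", lossy)
```

With an ASCII or UTF-8 default the reader gets an explicit error rather than silently wrong text, which is the "clear and intelligible warning" Andy asks for. The `errors="replace"` case shows the kind of quiet loss that point 1 warns about.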