Guido van Rossum wrote:
>
> > u"..." currently interprets the characters it finds as Latin-1
> > (this is by design, since the first 256 Unicode ordinals map to
> > the Latin-1 characters).
>
> Nice, except that now we seem to be ambiguous about the source
> character encoding: it's Latin-1 for Unicode strings and UTF-8 for
> 8-bit strings...!

Noo... there is no definition for non-ASCII 8-bit strings in
Python source code using the ordinal range 128-255. If you were
to define Latin-1 as the source code encoding, then we would have
to change auto-coercion to make a Latin-1 assumption instead, but...

I see the picture: people are getting pretty confused about what
is going on.

If you write u"xyz" then the ordinals of those characters are
taken and stored directly as Unicode characters. If you live
in a Latin-1 world, then you happen to be lucky: the Unicode
characters match your input. If not, some totally different
characters are likely to show up if the string is written
to a file and displayed using a Unicode-aware editor.

The same will happen to your normal 8-bit string literals.
Nothing unusual so far... if you use Latin-1 strings and
write them to a file, you get Latin-1. If you happen to
program on DOS, you'll get the DOS ANSI encoding for the
German umlauts.

Now the key point where all this started was that
u'ä' in 'äöü' will raise an error due to 'äöü' being
*interpreted* as UTF-8 -- this doesn't mean that 'äöü'
will be interpreted as UTF-8 elsewhere in your application.

The UTF-8 assumption had to be made in order to get the two
worlds to interoperate. We could just as well have chosen
Latin-1, but then people currently using, say, a Russian
encoding would get upset for the same reason.

One way or another somebody is not going to like whatever
we choose, I'm afraid... the simplest solution is to use
Unicode for all strings which contain non-ASCII characters
and then call .encode() as necessary.
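
To make the points above concrete, here is a minimal sketch. It is
written for present-day Python 3, not the interpreter discussed in this
thread, so the `bytes` type stands in for the 8-bit string and explicit
`.decode()` calls stand in for auto-coercion; the variable names are
made up for illustration only.

```python
# Assumption: Python 3. bytes plays the role of the 8-bit string, and
# .decode() plays the role of the implicit coercion to Unicode.

# The three bytes an editor in a Latin-1 world would store for 'äöü'.
raw = bytes([0xE4, 0xF6, 0xFC])

# Latin-1 reading: every byte becomes the Unicode character with the
# same ordinal, so the text survives unchanged.
as_latin1 = raw.decode("latin-1")
ordinals_match = [ord(ch) for ch in as_latin1] == list(raw)

# The same bytes read under a different 8-bit encoding (here the DOS
# code page 437) give different characters.
as_cp437 = raw.decode("cp437")

# UTF-8 reading: these bytes are not a valid UTF-8 sequence, so the
# conversion fails instead of silently guessing.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The suggested approach: keep non-ASCII text as Unicode and call
# .encode() only where bytes are actually needed.
text = "\u00e4\u00f6\u00fc"          # 'äöü' as Unicode
found = "\u00e4" in text             # containment test on Unicode only
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
```

With everything held as Unicode the containment test needs no coercion
at all, and the choice of byte encoding is made explicitly at the point
of output.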
--
Marc-Andre Lemburg
______________________________________________________________________
Business:      http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/