Just van Rossum:
> Exactly. By saying "(wide) strings are not tied to Unicode" the question
> whether wide strings should or should not be sorted according to the
> Unicode spec is answered by a simple "no", instead of "hmm, maybe, but it's
> too hard anyway"...

I personally like the idea of speaking of "wide strings" containing wide
character codes instead of Unicode objects.

Unfortunately there are many methods which need to interpret the content
of strings according to some encoding knowledge: for example, 'upper()',
'lower()', 'swapcase()', 'lstrip()' and so on need to know which class
certain characters belong to.

This problem was already somewhat visible in 1.5.2, since these methods
were available as library functions from the string module, and they
worked with a global state maintained by the 'setlocale()' C library
function. Quoting from the C library man pages:

"""
The details of what constitutes an uppercase or lowercase letter depend
on the current locale. For example, the default "C" locale does not know
about umlauts, so no conversion is done for them.

In some non-English locales, there are lowercase letters with no
corresponding uppercase equivalent; the German sharp s is one example.
"""

I guess applying 'upper' to a Chinese character will not make much sense.

Now these former string module functions have been moved into the Python
object core. So the current Python string and Unicode object API is
somewhat "western centric". ;-) At least Marc's implementation in
'unicodectype.c' contains the hard-coded assumption that wide strings
really contain Unicode characters.

    print u"äöü".upper().encode("latin1")

shows "ÄÖÜ" independent of the locale setting. This makes sense.
The output from

    print u"äöü".upper().encode()

however looks ugly here on my screen... UTF-8 ... blech: Ã ÃÃ

Regards and have a nice weekend, Peter
-- 
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)
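
For anyone who wants to reproduce the two print examples from the message above, here is a minimal sketch. It assumes present-day Python 3, where 'str' is the Unicode type and takes the place of the "wide string" under discussion; the post itself uses the u"..." literal and the print statement. The variable names are made up for illustration. It shows that the case mapping does not consult the locale, and that the "ugly" output is just the UTF-8 bytes being displayed by a Latin-1 terminal.

```python
# Sketch for Python 3 (an assumption; the original post predates it).
text = "äöü"

# The case mapping comes from the Unicode character database,
# so no setlocale() call is involved.
upper = text.upper()

latin1_bytes = upper.encode("latin-1")  # one byte per character
utf8_bytes = upper.encode("utf-8")      # two bytes per character here

# Simulate a Latin-1 terminal showing the UTF-8 bytes: each character
# becomes "Ã" followed by a C1 control character, which is usually invisible.
seen_on_latin1_screen = utf8_bytes.decode("latin-1")

# A character without case, such as a Chinese one, comes back unchanged.
chinese_upper = "中".upper()

print(latin1_bytes)                   # b'\xc4\xd6\xdc'
print(utf8_bytes)                     # b'\xc3\x84\xc3\x96\xc3\x9c'
print(ascii(seen_on_latin1_screen))   # '\xc3\x84\xc3\x96\xc3\x9c'
print(chinese_upper == "中")          # True
```

The Latin-1 bytes are what a Latin-1 screen renders as "ÄÖÜ"; the UTF-8 bytes are what produced the garbled line quoted in the message.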