I stand corrected about the behavior of Unicode in the presence of
locales.

On Mon, Dec 29, 2003 at 09:47:39AM -0800, Guido van Rossum wrote:
> > >>> locale.setlocale(locale.LC_CTYPE, "tr_TR.UTF-8")
> > 'tr_TR.UTF-8'
> > >>> "I".lower() # C library bug? (should be "\xc4\xb1")*
> > 'I'
> > >>> locale.setlocale(locale.LC_CTYPE, "en_US.UTF-8")
> > 'en_US.UTF-8'
> > >>> "I".lower() # (UTF-8 locale works properly in english)
> > 'i'
>
> I have no idea what adding UTF8 to the local means.  Is this something
> that Python's locale-awareness does or is it simply recognized by the
> C library?

"A locale name is typically of the form
language[_territory][.code-set][@modifier]"
    -- man setlocale() on my system

RedHat 9 made a halfhearted attempt to use UTF-8 as the encoding for
all locales, so it sets LANG=en_US.UTF-8 by default. In theory,
tr_TR.UTF-8 should be the Turkish locale with UTF-8 characters, but it
behaves incorrectly: "I".lower() == "I".

Well, since my earlier post combined a misunderstanding of how Python
works with a possible C library bug, I guess I raised two non-issues.
Sorry for wasting everyone's time.

Jeff
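To make the locale-name form quoted from the man page concrete, here is a
minimal sketch that splits a name such as tr_TR.UTF-8 into its parts. The
helper parse_locale_name and the constant DOTLESS_I_UTF8 are hypothetical
names invented for this illustration; they are not part of Python's locale
module or of the C library, and the regular expression is only an
approximation of what real C libraries accept.

```python
import re

# Approximate pattern for language[_territory][.code-set][@modifier].
# This is an illustrative assumption, not the C library's actual grammar.
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9_-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9_-]+))?$"
)


def parse_locale_name(name):
    """Split a locale name into its four parts; absent parts are None."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError("unrecognized locale name: %r" % (name,))
    return match.groupdict()


# U+0131 (LATIN SMALL LETTER DOTLESS I) is the Turkish lowercase of "I".
# Encoded as UTF-8 it is the two-byte sequence from the quoted session.
DOTLESS_I_UTF8 = "\u0131".encode("utf-8")
```

With this sketch, parse_locale_name("tr_TR.UTF-8") gives language "tr",
territory "TR", and code-set "UTF-8" with no modifier, so the ".UTF-8"
suffix is just the code-set part of the name that the C library
interprets.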