On Mon, Dec 29, 2003 at 06:24:57PM +0100, Martin v. Loewis wrote:
> Looking at python.org/sf/866982, I find it troubling that
> there are languages where "I".lower() != "i"
> (for those of you not familiar with Turkish: the lower-case
> letter of "I" is U+0131, LATIN SMALL LETTER DOTLESS I,
> which is \xfd in iso-8859-9).

This post caused me to notice the following behavior. Is it "right"?

>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, "tr_TR")
'tr_TR'
>>> locale.getlocale()[1]  # Expected charset
'ISO8859-9'
>>> "I".lower()  # Expected behavior
'\xfd'
>>> u"I".lower()  # Python bug? (should be u'\u0131')
u'i'
>>> locale.setlocale(locale.LC_CTYPE, "tr_TR.UTF-8")
'tr_TR.UTF-8'
>>> "I".lower()  # C library bug? (should be "\xc4\xb1")*
'I'
>>> locale.setlocale(locale.LC_CTYPE, "en_US.UTF-8")
'en_US.UTF-8'
>>> "I".lower()  # (UTF-8 locale works properly in english)
'i'

Jeff

* RedHat 9, glibc-2.3.2-11.9
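
For anyone who wants the Turkish result for Unicode strings without depending on the locale, here is a minimal sketch. It assumes Python 3, where str is always Unicode and str.lower() does not consult LC_CTYPE. The name turkish_lower is hypothetical (it is not a standard library function), and only the two Turkish "I" pairs are special-cased.

```python
def turkish_lower(text):
    """Hypothetical helper: lower-case text using the Turkish rules for I.

    Not part of the standard library; a sketch only.
    """
    # U+0130 (capital I with dot above) lower-cases to plain "i" in Turkish.
    # Handle it first, because str.lower() would turn it into "i" + U+0307.
    text = text.replace("\u0130", "i")
    # U+0049 (capital I) lower-cases to U+0131 (dotless i) in Turkish.
    text = text.replace("I", "\u0131")
    # Everything else follows the default, locale-independent mapping.
    return text.lower()


if __name__ == "__main__":
    print(ascii("I".lower()))          # default mapping, whatever the locale
    print(ascii(turkish_lower("I")))   # Turkish mapping via the helper
```

Doing the two replacements before calling lower() keeps the helper independent of setlocale(), so it behaves the same under tr_TR, tr_TR.UTF-8, or en_US.UTF-8.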