Mon Dec 29 12:40:08 EST 2003 · https://mail.python.org/pipermail/python-dev/2003-December/041468.html

On Mon, Dec 29, 2003 at 06:24:57PM +0100, Martin v. Loewis wrote:
> Looking at python.org/sf/866982, I find it troubling that
> there are languages where "I".lower() != "i"
> (for those of you not familiar with Turkish: the lower-case
> letter of "I" is U+0131, LATIN SMALL LETTER DOTLESS I,
> which is \xfd in iso-8859-9).
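A short check of the byte values involved, for readers following the transcript below. It is written in Python 3 rather than the Python 2 of the original session. It confirms that U+0131 is "\xfd" in ISO-8859-9, as quoted, and "\xc4\xb1" in UTF-8, which is the value the UTF-8 comment in the session expects:

```python
import unicodedata

# U+0131, the Turkish lower-case form of "I"
dotless_i = "\u0131"
name = unicodedata.name(dotless_i)

# Encode it in the two charsets used in the session below
latin5 = dotless_i.encode("iso-8859-9")
utf8 = dotless_i.encode("utf-8")

print(name, latin5, utf8)
```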

This post caused me to notice the following behavior.  Is it "right"?

>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, "tr_TR")
'tr_TR'
>>> locale.getlocale()[1]  # Expected charset
'ISO8859-9'
>>> "I".lower()   # Expected behavior
'\xfd'
>>> u"I".lower()  # Python bug? (should be u'\u0131')
u'i'
>>> locale.setlocale(locale.LC_CTYPE, "tr_TR.UTF-8")
'tr_TR.UTF-8'
>>> "I".lower()   # C library bug? (should be "\xc4\xb1")*
'I'
>>> locale.setlocale(locale.LC_CTYPE, "en_US.UTF-8")
'en_US.UTF-8'
>>> "I".lower()   # (UTF-8 locale works properly in english)
'i'
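For comparison, a Python 3 sketch. Here str.lower() ignores the C locale and applies the default Unicode case mapping, so "I" always lowers to "i". Turkish-aware lowering therefore has to be done explicitly. `turkish_lower` is a hypothetical helper written for illustration and is not a standard library function. It pre-maps the two letters whose Turkish case mapping differs from the default:

```python
def turkish_lower(s: str) -> str:
    # Hypothetical helper: map dotless capital I -> U+0131 and dotted
    # capital I (U+0130) -> "i", then let str.lower() handle the rest.
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

default = "I".lower()              # default Unicode mapping, locale ignored
turkish = turkish_lower("I")       # Turkish mapping
dotted_default = "\u0130".lower()  # default gives "i" + U+0307 COMBINING DOT ABOVE

print(repr(default), repr(turkish), repr(dotted_default))
```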

Jeff
* RedHat 9, glibc-2.3.2-11.9
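The thread's subject, str.ascii_lower, suggests a lowercasing operation that touches only A-Z and so behaves the same under every locale. The following is a minimal Python 3 sketch of that idea, assuming an ASCII-only mapping is what is wanted. `ascii_lower` is a hypothetical name here and not an existing str method:

```python
import string

# Translation table covering only the 26 ASCII capitals
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_lower(s: str) -> str:
    # Hypothetical helper: lowercase A-Z only; every other character,
    # including non-ASCII letters such as U+0130, is left unchanged.
    return s.translate(_ASCII_LOWER)

print(ascii_lower("I"), ascii_lower("\u0130STANBUL"))
```

In Python 3, bytes.lower() already behaves this way on byte strings (for example b"I".lower() is b"i"), independent of the locale.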

[Python-Dev] str.ascii_lower