Jeff Epler wrote:
> >>> u"I".lower()   # Python bug? (should be u'\u0131')
> u'i'

As Guido says: unicode.lower is locale-unaware; it uses the Unicode
Consortium's character properties, not the locale, to determine the
lower-case character.

> >>> "I".lower()   # C library bug? (should be "\xc4\xb1")
> 'I'

This is really a limitation of the C language, not of the C library.
The interface is int tolower(int c); it accepts a single char (passed
as an int) and returns a single char. Multi-byte characters are not
supported by that interface. Traditionally, for characters that cannot
be converted, tolower returns its argument unchanged.

> >>> "I".lower()   # (UTF-8 locale works properly in English)
> 'i'

This is because "i" is a single byte in UTF-8.

Regards,
Martin
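A small sketch of the two points above, written for Python 3 (where str is the Unicode type, unlike the Python 2 u"" strings in the thread). It shows that str.lower() maps "I" to "i" regardless of locale, and that U+0131 (dotless i) needs two bytes in UTF-8, so a byte-at-a-time tolower() cannot produce it. The turkish_lower helper is hypothetical, not a standard API; it applies only the I/İ mappings and ignores other language-specific rules.

```python
import unicodedata

# str.lower() uses Unicode character properties, not the current locale,
# so "I" always lowercases to "i".
assert "I".lower() == "i"

# U+0131 is what a Turkish locale expects as the lowercase of "I".
dotless_i = "\u0131"
print(unicodedata.name(dotless_i))  # LATIN SMALL LETTER DOTLESS I

# In UTF-8 it occupies two bytes, which a single-char tolower() cannot return.
print(dotless_i.encode("utf-8"))  # b'\xc4\xb1'
print("i".encode("utf-8"))        # b'i'


def turkish_lower(s: str) -> str:
    """Hypothetical helper: apply Turkish I/İ mappings, then str.lower()."""
    # Map İ (U+0130) to plain i first; str.lower() alone would give "i\u0307".
    s = s.replace("\u0130", "i")
    # Map ASCII I to dotless ı (U+0131), which str.lower() leaves as is.
    s = s.replace("I", "\u0131")
    return s.lower()


print(turkish_lower("DİYARBAKIR"))  # diyarbakır
```

Doing the replacements before lower() keeps the locale-independent behaviour of str.lower() for every other character.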