Paul Prescod wrote:
>
> Combining characters are a whole 'nother level of complexity. Character
> sets are hard. I don't accept the argument that "Unicode itself has
> complexities so that gives us license to introduce even more
> complexities at the character representation level."
>
> > FYI: Normalization is needed to make comparing Unicode
> > strings robust, e.g. u"é" should compare equal to u"e\u0301".
>
> That's a whole 'nother debate at a whole 'nother level of abstraction. I
> think we need to get the bytes/characters level right and then we can
> worry about display-equivalent characters (or leave that to the Python
> programmer to figure out...).

I just wanted to point out that the argument "slicing doesn't work
with UTF-8" is moot.

I do see a point against UTF-8 auto-conversion, though, given the example
that Guido mailed me:

"""
s = 'ab\341\210\264def'    # == str(u"ab\u1234def")
s.find(u"def")

This prints 3 -- the wrong result, since "def" is found at
s[5:8], not at s[3:6].
"""

--
Marc-Andre Lemburg

______________________________________________________________________
Business:                                       http://www.lemburg.com/
Python Pages:                            http://www.lemburg.com/python/
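[Editor's note: a minimal sketch, in modern Python rather than the 1.6/2.0
vintage discussed in this thread, of the offset mismatch behind Guido's
example. Character indices into the Unicode string and byte indices into
its UTF-8 encoding diverge as soon as a multi-byte character appears,
which is why an auto-converting find() returns an index that is useless
against the original byte string.]

u = u"ab\u1234def"          # 6 characters; U+1234 is one character
b = u.encode("utf-8")       # 8 bytes; U+1234 encodes to 3 bytes

print(u.find(u"def"))       # 3 -- character offset in the Unicode string
print(b.find(b"def"))       # 5 -- byte offset in the UTF-8 data

# The character offset slices the Unicode string correctly ...
assert u[3:6] == u"def"
# ... but the same offset applied to the UTF-8 bytes picks out the wrong
# data; the byte string needs s[5:8], exactly as the quoted example says.
assert b[3:6] != b"def"
assert b[5:8] == b"def"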