Combining characters are a whole 'nother level of complexity. Character
sets are hard. I don't accept the argument that "Unicode itself has
complexities so that gives us license to introduce even more
complexities at the character representation level."

> FYI: Normalization is needed to make comparing Unicode
> strings robust, e.g. u"é" should compare equal to u"e\u0301".

That's a whole 'nother debate at a whole 'nother level of abstraction. I
think we need to get the bytes/characters level right and then we can
worry about display-equivalent characters (or leave that to the Python
programmer to figure out...).

--
 Paul Prescod - ISOGEN Consulting Engineer speaking for himself
It's difficult to extract sense from strings, but they're the only
communication coin we can count on.
    - http://www.cs.yale.edu/~perlis-alan/quotes.html
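
To illustrate the quoted normalization point, here is a minimal sketch, assuming a modern Python 3 interpreter and the standard unicodedata module. The helper name nfc_equal is hypothetical and is not something proposed in this thread. It only shows that the precomposed and the combining-character spellings of "é" differ as code point sequences but compare equal once both are normalized to NFC.

```python
import unicodedata

# Two spellings of the same displayed character.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points


def nfc_equal(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain comparison works at the code point level and sees different strings.
print(composed == decomposed)

# Comparing after normalization treats the two spellings as equivalent.
print(nfc_equal(composed, decomposed))
```

The plain comparison is the "bytes/characters level" question, and the normalized comparison is the separate, higher-level question of display-equivalent characters.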