M.-A. Lemburg writes:
> The details are on the www.unicode.org web-site buried
> in some of the tech reports on normalization and
> collation.

This is described in the Unicode standard itself, and in UTR #15 and
UTR #10. Normalization is an issue with wider implications than just
handling glyph variants: indeed, glyph variants are beside the point.

The question is this: should

    U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS

compare equal to

    U+0055 LATIN CAPITAL LETTER U
    U+0308 COMBINING DIAERESIS

or not? It depends on the application. Certainly in a database system
I would want these to compare equal.

Perhaps normalization form needs to be an option of the string
comparator?

    -tree

-- 
Tom Emerson                                  Basis Technology Corp.
Language Hacker                            http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"
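
To make the comparator idea concrete, here is a minimal sketch in Python. It assumes the standard library's unicodedata.normalize() is available; the helper name equal_under() is hypothetical and only illustrates what "normalization form as an option of the comparator" might look like, not an existing or proposed API.

```python
import unicodedata


def equal_under(form, a, b):
    """Compare two strings after normalizing both to the same form.

    `form` is one of "NFC", "NFD", "NFKC", "NFKD"; this helper is a
    hypothetical illustration, not part of any library.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00dc"   # U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS
decomposed = "U\u0308"   # U+0055 followed by U+0308 COMBINING DIAERESIS

# A plain code-point comparison treats the two spellings as different.
print(precomposed == decomposed)

# Normalizing both sides to the same form makes them compare equal.
print(equal_under("NFC", precomposed, decomposed))
print(equal_under("NFD", precomposed, decomposed))
```

Whether the plain comparison or the normalized one is the right default is exactly the application-dependent choice raised above; a database would likely want the normalized comparison.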