Guido van Rossum <guido@python.org> writes:

> > It could be that Apple is decomposing the filenames before comparing
> > them. Either way works.
>
> Hm, that sucks (either way) -- because you get unnormalized Unicode
> out of directory listings, which is harder to turn into local
> encodings.

Notice that, most likely, Apple *does* normalize them - they just use
Normalization Form D (NFD, which favours decomposition instead of
precomposed characters) - this is what Apple apparently calls
"canonical".

That choice is not surprising - NFD is "more logical", as precomposed
characters exist only for an arbitrary selection of combinations
(e.g. the WITH TILDE combinations exist for a, i, e, n, o, u, v, y,
but not for, say, x).

The Unicode FAQ
(http://www.unicode.org/unicode/faq/normalization.html) says

Q: Which forms of normalization should I support?

A: The choice of which to use depends on the particular program or
system. The most commonly supported form is NFC, since it is more
compatible with strings converted from legacy encodings. This is also
the choice for the web, as per the recommendations in "Character Model
for the World Wide Web" from the W3C. The other normalization forms
are useful for other domains.

So I guess Python should at least provide NFC - precisely because of
the legacy encodings.

Regards,
Martin
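
To make the NFC/NFD point above concrete, here is a minimal sketch. It assumes the unicodedata.normalize() function found in the standard library of current Python versions; the helper name to_local_bytes and the choice of Latin-1 as the "local encoding" are purely illustrative and not part of the discussion above.

```python
import unicodedata

composed = "\u00f1"      # LATIN SMALL LETTER N WITH TILDE (precomposed)
decomposed = "n\u0303"   # "n" followed by COMBINING TILDE

# NFD splits the precomposed character; NFC puts it back together.
nfd_of_composed = unicodedata.normalize("NFD", composed)
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)

# There is no precomposed "x with tilde", so NFC has nothing to
# compose and leaves the two-code-point sequence as it is.
x_tilde = "x\u0303"
nfc_of_x_tilde = unicodedata.normalize("NFC", x_tilde)


def to_local_bytes(name, encoding="latin-1"):
    """Hypothetical helper: normalize to NFC, then encode.

    A legacy encoding such as Latin-1 has a byte for the precomposed
    character but none for the combining tilde, so a decomposed
    filename only encodes after it has been composed.
    """
    return unicodedata.normalize("NFC", name).encode(encoding)


def encodes_directly(name, encoding="latin-1"):
    """Report whether the string encodes without normalization."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True
```

With this, a decomposed filename as it might come out of a directory listing fails to encode to Latin-1 directly, but goes through once it has been passed through NFC.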