Jack Jansen <Jack.Jansen@oratrix.com> writes:

> If I understand the unicode standard (according to unicode.org)
> correctly this means that MacOS stores filenames in NFD normalized
> form, with all combining characters split out, and this is the
> preferred normalized form. Am I correct here?

You are correct that this is likely the form that OS X uses on disk and at the APIs. It is not really the preferred form, though. The W3C favours and advocates NFC, precisely because NFC is easier to transform into legacy encodings (as you just observed).

> But, even if NFC is the preferred normalized form (the documents I saw
> hinted that this may have been the case in previous Unicode
> standards:-): both NFC and NFD renditions of this string are legal
> unicode, aren't they? And if they are then both should be converted to
> the same latin-1 string, shouldn't they?

Yes, and yes.

> Do I misunderstand something, or is this a bug (limitation?) in the
> unicode->latin-1 decoder?

It's a limitation, and it applies to all codecs: they convert code point by code point and do not normalize their input first. Contributions of normalization code are welcome. Since this is hard work, it is unlikely to be fixed in Python 2.3, unless somebody has a really good incentive for fixing it.

Regards,
Martin
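The limitation discussed in this thread (an NFD string cannot be encoded to latin-1 even though its NFC equivalent can) can be illustrated with a minimal sketch in modern Python 3, which is an assumption here and not the Python 2.3 environment of the thread. It uses the standard library's `unicodedata.normalize`; `to_latin1` is a hypothetical helper that composes to NFC before encoding. This only works around the problem at the call site: the latin-1 codec itself still does not normalize, and characters with no precomposed latin-1 form still fail.

```python
import unicodedata


def to_latin1(text: str) -> bytes:
    """Hypothetical helper: compose to NFC, then encode as latin-1.

    The latin-1 codec maps code points one at a time and does not
    normalize, so a decomposed "u" + U+0308 is rejected even though it is
    canonically equivalent to the encodable U+00FC. Composing first fixes
    that, but only for characters that have a precomposed latin-1 form.
    """
    return unicodedata.normalize("NFC", text).encode("latin-1")


# A filename as a decomposing filesystem might return it:
# "u" followed by U+0308 COMBINING DIAERESIS.
name_nfd = "Gru\u0308n.txt"
name_nfc = unicodedata.normalize("NFC", name_nfd)

# Both forms are legal Unicode and canonically equivalent,
# but they differ code point by code point.
print(len(name_nfd), len(name_nfc))  # 9 8

# Encoding the NFD form directly fails on the combining mark.
try:
    name_nfd.encode("latin-1")
except UnicodeEncodeError as exc:
    print("direct encode of NFD fails:", exc.reason)

# After composition, both forms yield the same latin-1 bytes.
print(to_latin1(name_nfd))  # b'Gr\xfcn.txt'
print(to_latin1(name_nfc))  # b'Gr\xfcn.txt'
```

Normalizing at the call site keeps the codec simple and leaves the choice of normalization form to the caller, which is the trade-off the limitation above implies.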