Showing content from https://mail.python.org/pipermail/python-dev/2002-August/027788.html below:

[Python-Dev] PEP 277 (unicode filenames): please review

Martin v. Loewis martin@v.loewis.de
14 Aug 2002 08:33:13 +0200
Jack Jansen <Jack.Jansen@oratrix.com> writes:

> If I understand the unicode standard (according to unicode.org)
> correctly this means that MacOS stores filenames in NFD normalized
> form, with all combining characters split out, and this is the
> preferred normalized form. Am I correct here?

You are correct that this is likely the form that OS X uses on-disk,
and at the APIs. This is not really the preferred form - W3C favours
and advocates NFC - precisely because it is easier to transform into
legacy encodings (as you just observed).

> But, even if NFC is the preferred normalized form (the documents I saw
> hinted that this may have been the case in previous Unicode
> standards:-): both NFC and NFD renditions of this string are legal
> unicode, aren't they? And if they are then both should be converted to
> the same latin-1 string, shouldn't they?

Yes, and yes.

> Do I misunderstand something, or is this a bug (limitation?) in the
> unicode->latin-1 decoder?

It's a limitation, in all codecs. Contributions of normalization code
are welcome. Since this is hard work, this is unlikely to be fixed in
Python 2.3 - unless somebody has a really good incentive for fixing
it.
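
To illustrate the limitation, here is a minimal sketch in present-day Python 3. It is not the code available when this message was written. It assumes the standard-library function unicodedata.normalize; the helper name to_latin1 is hypothetical. The NFD spelling of "é" cannot be encoded to Latin-1 directly, because the codec does not compose the combining accent itself, while composing to NFC first gives the same byte for both spellings.

```python
import unicodedata


def to_latin1(text):
    """Compose to NFC first, then encode to Latin-1 (hypothetical helper)."""
    return unicodedata.normalize("NFC", text).encode("latin-1")


nfc = "\u00e9"    # e-acute as one precomposed code point
nfd = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Both spellings are legal Unicode and are canonically equivalent.
assert unicodedata.normalize("NFD", nfc) == nfd

# After composing, both map to the same Latin-1 byte.
assert to_latin1(nfc) == to_latin1(nfd) == b"\xe9"

# Without normalization the codec rejects the decomposed form,
# since U+0301 has no Latin-1 equivalent on its own.
try:
    nfd.encode("latin-1")
    nfd_encodes_directly = True
except UnicodeEncodeError:
    nfd_encodes_directly = False
```

Normalizing explicitly before encoding keeps the codec simple; the caller decides when composition is wanted.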

Regards,
Martin


