A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from http://mail.python.org/pipermail/python-dev/2002-August/027786.html below:

[Python-Dev] PEP 277 (unicode filenames): please review

[Python-Dev] PEP 277 (unicode filenames): please reviewMartin v. Loewis martin@v.loewis.de
14 Aug 2002 08:23:42 +0200
Skip Montanaro <skip@pobox.com> writes:

> What's the current behavior?  If my program receives an input in utf-8
> (let's say it comes from a form on a website), what form will it be in, or
> can't I tell?  

In general, you cannot tell in advance - it will depend on the data
source.

W3C advocates "early normalization" towards "NFC", meaning that in the
Internet, you should always see NFC data - unless you are primary data
source, e.g. by reading from a terminal, or after decoding some legacy
encoding. It turns out that most Python codecs will produce NFC
already, so normalization to NFC would be required only for user input,
and - as it turns out - when reading file names on OS X.

> Is it possible I will get spurious inequalities today if I compare
> two different unicode objects which were created from different
> sources and in different normal forms?

If they are in different normal forms, you *will* get inequalities
reliably. In the real world, inequalities will be spurious.

> What about a string and a unicode object?  Where can I read all
> about it (Python and unicode normalization)?

Python does no normalization, so there is nothing to read. For
Unicode, you may want to start with the Normalization FAQ

http://www.unicode.org/unicode/faq/normalization.html

Regards,
Martin



RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4