Showing content from https://mail.python.org/pipermail/python-dev/2002-May/024645.html below:

[Python-Dev] Re: String module

Martin v. Loewis martin@v.loewis.de
30 May 2002 08:43:48 +0200

Guido van Rossum <guido@python.org> writes:

> > This reminds me that I often miss, in the standard `ctype.h' and related,
> > a function that would un-combine a character into its base character and
> > its diacritic, and the complementary re-combining function.
[...]
> I bet the Unicode standard has a standard way to do this.  

This is called "Unicode normalization forms". Each "pre-combined"
character can also be represented as a base character, and a
"combining diacritic". There are symmetric normalization forms: NFC
favours pre-combined characters, NFD favours combining characters.
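
As a rough illustration of NFC and NFD, here is a minimal sketch using the `unicodedata` module found in current Python releases (an assumption: that API postdates this message). The helper name `split_base_and_marks` is made up for the example.

```python
import unicodedata

precombined = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# NFD favours combining characters: base letter followed by the diacritic.
decomposed = unicodedata.normalize("NFD", precombined)
# NFC favours pre-combined characters: the pair is joined again.
recomposed = unicodedata.normalize("NFC", decomposed)

print([unicodedata.name(c) for c in decomposed])
print(recomposed == precombined)


def split_base_and_marks(ch):
    """Hypothetical helper: split one character into (base, combining marks)."""
    nfd = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in nfd if not unicodedata.combining(c))
    marks = "".join(c for c in nfd if unicodedata.combining(c))
    return base, marks
```

`split_base_and_marks` is roughly the "un-combine" function asked for above, and `normalize("NFC", base + marks)` would be its re-combining counterpart.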

There is also a "compatibility decomposition" (K), where e.g. ANGSTROM
SIGN decomposes to LATIN CAPITAL LETTER A WITH RING ABOVE.
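
A small sketch of how the K forms differ from the canonical ones, again assuming the `unicodedata` module of current Python releases. The ligature is chosen here as a character that only the K forms change. In the Unicode data shipped with current Python, ANGSTROM SIGN carries a canonical mapping, so plain NFC rewrites it as well.

```python
import unicodedata

ligature = "\ufb01"  # LATIN SMALL LIGATURE FI
angstrom = "\u212b"  # ANGSTROM SIGN

# Canonical composition leaves the ligature alone ...
ligature_nfc = unicodedata.normalize("NFC", ligature)
# ... while the compatibility form replaces it with two plain letters.
ligature_nfkc = unicodedata.normalize("NFKC", ligature)

# ANGSTROM SIGN becomes LATIN CAPITAL LETTER A WITH RING ABOVE under NFC,
# and a base letter plus COMBINING RING ABOVE under NFKD.
angstrom_nfc = unicodedata.normalize("NFC", angstrom)
angstrom_nfkd = unicodedata.normalize("NFKD", angstrom)

print(repr(ligature_nfc), repr(ligature_nfkc))
print(unicodedata.name(angstrom_nfc))
```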

> Maybe we can implement that, and then project the same interface on
> 8-bit characters?

Not really. Needing to know the character set is one issue; the other
issue is that the stand-alone diacritic characters in ASCII are *not*
combining. We could certainly provide a mapping between the Unicode
combining diacritics and the stand-alone diacritics, say as a codec,
but that would be quite special-purpose.
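
To make the idea of such a mapping concrete, here is a minimal sketch written as plain functions rather than a real codec. The table and the names `to_spacing` and `to_combining` are hypothetical and cover only three marks. It assumes the `unicodedata` module of current Python releases.

```python
import unicodedata

# Hypothetical, hand-picked table: combining mark -> stand-alone ASCII character.
COMBINING_TO_SPACING = {
    "\u0300": "`",  # COMBINING GRAVE ACCENT -> GRAVE ACCENT
    "\u0302": "^",  # COMBINING CIRCUMFLEX ACCENT -> CIRCUMFLEX ACCENT
    "\u0303": "~",  # COMBINING TILDE -> TILDE
}
SPACING_TO_COMBINING = {v: k for k, v in COMBINING_TO_SPACING.items()}


def to_spacing(text):
    """Decompose, then swap each known combining mark for its ASCII look-alike."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(COMBINING_TO_SPACING.get(c, c) for c in decomposed)


def to_combining(text):
    """Swap ASCII look-alikes for combining marks, then recompose with NFC."""
    replaced = "".join(SPACING_TO_COMBINING.get(c, c) for c in text)
    return unicodedata.normalize("NFC", replaced)


print(to_spacing("ma\u00f1ana"))
print(to_combining("man~ana") == "ma\u00f1ana")
```

The reverse direction treats every `~`, `^` and backquote as a diacritic, so it is lossy for ordinary text, which is part of why this stays special-purpose.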

Providing a good normalization library is necessary, though, since
many other algorithms (both from W3C and IETF) require Unicode
normalization as part of the processing (usually to NFKC).

Regards,
Martin