Showing content from https://mail.python.org/pipermail/python-dev/2000-May/003846.html below:

[I18n-sig] Re: [Python-Dev] Unicode debate

M.-A. Lemburg mal@lemburg.com
Tue, 02 May 2000 11:56:21 +0200
Fredrik Lundh wrote:
> 
> M.-A. Lemburg <mal@lemburg.com> wrote:
> > Just a small note on the subject of a character being atomic
> > which seems to have been forgotten by the discussing parties:
> >
> > Unicode itself can be understood as a multi-word character
> > encoding, just like UTF-8. The reason is that Unicode entities
> > can be combined to produce single display characters (e.g.
> > u"e"+u"\u0301" will print "é" in a Unicode aware renderer).
> > Slicing such a combined Unicode string will have the same
> > effect as slicing UTF-8 data.
> 
> really?  does it result in a decoder error?  or does it just result
> in a rendering error, just as if you slice off any trailing character
> without looking...

In the example, if you cut off the u"\u0301", the "e" would
appear without the acute accent. Cutting off the u"e" would
probably result in a rendering error or, worse, put the accent
over the next character to the left.

UTF-8 is better in this respect: it warns you about
the error by raising an exception when the truncated data
is converted to Unicode.
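
The difference between the two kinds of slicing can be sketched as
follows. This is only an illustration and assumes a modern Python 3
interpreter, where plain string literals are Unicode; the thread itself
concerns Python 1.6, and the variable names are made up for the example.

```python
# Sketch assuming Python 3 (not the Python 1.6 discussed in the thread).
combined = "e\u0301"            # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Slicing the Unicode string never raises; it only changes what gets rendered.
base_only = combined[:1]        # the "e" without its accent
accent_only = combined[1:]      # a lone combining accent with no base character

# Slicing the UTF-8 bytes inside a multi-byte sequence is caught on decoding.
utf8 = combined.encode("utf-8")     # three bytes: "e" plus a two-byte sequence
truncated = utf8[:2]                # cuts the two-byte sequence in half

try:
    truncated.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True            # the decoder reports the broken sequence
```

Note that the exception only appears when the cut falls inside a
multi-byte sequence; cutting the UTF-8 data between the "e" and the
accent decodes without complaint, just like the Unicode slice.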
 
> > It seems that most Latin-1 proponents have single
> > display characters in mind. While the same is true for
> > many Unicode entities, there are quite a few cases of
> > combining characters in Unicode 3.0 and the Unicode
> > normalization algorithm uses these as the basis for its
> > work.
> 
> do we support automatic normalization in 1.6?

No, but it is likely to appear in 1.7... not sure about
the "automatic" though.

FYI: Normalization is needed to make comparing Unicode
strings robust, e.g. u"é" should compare equal to u"e\u0301".
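
A minimal sketch of such a comparison, assuming a modern Python 3 and
the standard unicodedata.normalize function, which postdates this
thread. The helper name nfc_equal is hypothetical, and the choice of
the NFC form is an assumption made for the example.

```python
import unicodedata

def nfc_equal(a, b):
    """Compare two strings after normalizing both to NFC.

    Hypothetical helper for illustration; NFC is one of several
    normalization forms and is chosen here only as an example.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"      # "e" followed by COMBINING ACUTE ACCENT

raw_equal = precomposed == decomposed              # code points differ
normalized_equal = nfc_equal(precomposed, decomposed)
```

Without normalization the two spellings compare unequal because their
code point sequences differ; after normalization they compare equal.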

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/




