Showing content from https://mail.python.org/pipermail/python-dev/2000-May/003914.html below:

[I18n-sig] Re: [Python-Dev] Unicode debate

Tim Peters tim_one@email.msn.com
Wed, 3 May 2000 01:47:37 -0400

[Moshe Zadka]
> ...
> I'd much prefer Python to reflect a fundamental truth about Unicode,
> which at least makes sure binary-goop can pass through Unicode and
> remain unharmed, than to reflect a nasty problem with UTF-8 (not
> everything is legal).

Then you don't want Unicode at all, Moshe.  All the official encoding
schemes for Unicode 3.0 suffer illegal byte sequences.  For example, 0xffff
is illegal in UTF-16 (whether BE or LE); this isn't merely a matter of
Unicode not yet having assigned a character to this position, it's that the
standard explicitly makes this sequence illegal and guarantees it will
always be illegal!  The other place this comes up is with surrogates, where
what's legal depends on both parts of a character pair; and, again, the
illegalities here are guaranteed illegal for all time.  UCS-4 is the
closest thing to binary-transparent Unicode encodings get, but even there
the length of a thing is constrained to be a multiple of 4 bytes.  Unicode
and binary goop will never coexist peacefully.
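
The point above can be tried out with a minimal sketch in present-day
Python 3, which is not part of the original message.  The helper name
is_decodable is hypothetical, and the codec names ('utf-16-le',
'utf-32-le', 'utf-8') are the standard library's own, with UTF-32 standing
in for UCS-4.  As far as I know, current CPython codecs do not reject
0xffff in UTF-16, so the sketch uses an unpaired surrogate to show an
illegal UTF-16 sequence instead.

```python
def is_decodable(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a legal byte sequence in `encoding`."""
    try:
        # Strict error handling is the default: illegal input raises.
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# A proper surrogate pair (U+10000 as D800 DC00) is legal UTF-16.
paired = b"\x00\xd8\x00\xdc"
# A high surrogate followed by an ordinary character is not.
unpaired = b"\x00\xd8\x41\x00"

# UTF-32 input must be a multiple of 4 bytes long ...
short = b"\x41\x00\x00"
# ... and even 4 well-aligned bytes must not exceed U+10FFFF.
too_big = b"\x00\x00\x11\x00"

if __name__ == "__main__":
    for label, data, enc in [
        ("paired surrogates", paired, "utf-16-le"),
        ("unpaired surrogate", unpaired, "utf-16-le"),
        ("3-byte UTF-32", short, "utf-32-le"),
        ("UTF-32 above U+10FFFF", too_big, "utf-32-le"),
        ("lone 0xff in UTF-8", b"\xff", "utf-8"),
    ]:
        print(label, "->", is_decodable(data, enc))
```

Arbitrary bytes pushed through any of these codecs can fail to decode, so
none of them is a safe container for binary data.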
