Showing content from https://mail.python.org/pipermail/python-dev/2009-May/089420.html below:

[Python-Dev] PEP 383 update: utf8b is now the error handler

"Martin v. Löwis" martin at v.loewis.de
Tue May 5 22:46:26 CEST 2009
>  > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
>  > > it's an algorithm based on 16-bit or 32-bit code points.
> 
> I don't understand this phrasing.  The algorithm is only applicable to
> ASCII-compatible octet streams.  It results in code points by a simple
> displacement of octet -> octet + 0xDC00.  It cannot be used on (say)
> UTF-32 to deal with embedded surrogates.
> 
> Certainly, the computation requires (at least) 16 bit numbers, but the
> input must be restricted to a stream of 8-bit code points, while the
> output is 16- or 32-bit code points.

Right - the algorithm maps between bytes and 16/32-bit code units.
It works, in particular, for UTF-8, and was originally proposed to apply
to UTF-8 - but it can work in any other place that converts bytes to
16/32-bit code units as well.

Regards,
Martin
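
For illustration only, here is a minimal Python 3 sketch of the displacement discussed in this message (octet -> octet + 0xDC00 for bytes that cannot be decoded). It assumes the error handler as it eventually shipped in Python 3 under the name "surrogateescape"; the thread itself still calls it utf8b. The helper names escape_bytes and unescape_text are hypothetical and are not part of the thread or of PEP 383.

```python
def escape_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw; each undecodable byte b becomes code point 0xDC00 + b."""
    return raw.decode(encoding, errors="surrogateescape")


def unescape_text(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text; each lone surrogate 0xDC80..0xDCFF becomes its original byte."""
    return text.encode(encoding, errors="surrogateescape")


# 0xE9 is "é" in Latin-1, but on its own it is not valid UTF-8.
raw = b"caf\xe9"
text = escape_bytes(raw)

print(ascii(text))                 # 'caf\udce9'
print(hex(ord(text[-1])))          # 0xdce9, i.e. 0xDC00 + 0xE9
print(unescape_text(text) == raw)  # True: the original bytes round-trip

# The same handler works with another ASCII-compatible codec.
print(ascii(escape_bytes(raw, "ascii")))  # 'caf\udce9'
```

The last call suggests that the mapping is not tied to UTF-8 as such: the codec only decides which bytes count as undecodable, and the handler then applies the same displacement.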