Showing content from https://mail.python.org/pipermail/python-dev/2014-August/136048.html below:

[Python-Dev] surrogatepass - she's a witch, burn 'er! [was: Cleaning up ...]

Greg Ewing greg.ewing at canterbury.ac.nz
Sat Aug 30 01:37:18 CEST 2014
M.-A. Lemburg wrote:
> we needed
> a way to make sure that Python 3 also optionally supports working
> with lone surrogates in such UTF-8 streams (nowadays called CESU-8:
> http://en.wikipedia.org/wiki/CESU-8).

I don't think CESU-8 is the same thing. According to the wiki
page, CESU-8 *requires* all code points above 0xffff to be split
into surrogate pairs before encoding. It also doesn't say that
lone surrogates are valid -- it doesn't mention lone surrogates
at all, only pairs. Neither does the linked technical report.

The technical report also says that CESU-8 forbids any UTF-8
sequences of more than three bytes, so it's definitely not
"UTF-8 plus lone surrogates".
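
To make the distinction concrete, here is a rough Python sketch. Python has no built-in CESU-8 codec as far as I know, so `cesu8_encode` below is a hypothetical helper written only for illustration, following the description above: every code point above 0xffff is split into a surrogate pair, and each half is written as a three-byte sequence. Refusing lone surrogates in the input is an assumption of this sketch, since the report doesn't say what to do with them.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical CESU-8 encoder, for illustration only."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split the supplementary code point into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            # Each surrogate becomes its own three-byte sequence.
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
        else:
            # Strict UTF-8: a lone surrogate in the input raises
            # UnicodeEncodeError (an assumption of this sketch).
            out += ch.encode("utf-8")
    return bytes(out)


smiley = "\U0001F600"

# Plain UTF-8: one four-byte sequence.
utf8_bytes = smiley.encode("utf-8")

# CESU-8 as sketched: two three-byte sequences, never four bytes.
cesu8_bytes = cesu8_encode(smiley)

# "UTF-8 plus lone surrogates": what the surrogatepass handler permits.
lone_bytes = "\ud800".encode("utf-8", "surrogatepass")
```

For the one supplementary character, plain UTF-8 yields four bytes and the sketch yields six, while `surrogatepass` lets a single unpaired surrogate through as three bytes. Those are three different behaviours.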

-- 
Greg