A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://mail.python.org/pipermail/python-dev/2009-May/089467.html below:

[Python-Dev] PEP 383 update: utf8b is now the error handler

[Python-Dev] PEP 383 update: utf8b is now the error handler"Martin v. Löwis" martin at v.loewis.de
Thu May 7 01:16:18 CEST 2009
> I qualify with a). I believe I understand c) but, as explained in my
> other post, I do not think your reason applies.  In fact, I think
> concern for naming rights might suggest that you *not* reuse the name
> for something different.  I would have to learn more about the existing
> 'surrogates' handler to judge Antione's suggestion 'surrogates-pass'.
> 'Surrogates-escape' is pretty good for the new handler since, to my
> understanding, it 'escapes' 'bad bytes' by prefixing them with bits that
> push them to the surrogates plane.

See issue 3672. In essence, in python 2.5:

py> u"\ud800".encode("utf-8")
'\xed\xa0\x80'
py> '\xed\xa0\x80'.decode("utf-8")
u'\ud800'

In 3.1,

py> "\ud800".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in
position 0: surrogates not allowed
py> "\ud800".encode("utf-8","surrogates")
b'\xed\xa0\x80'
py> b'\xed\xa0\x80'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2:
illegal encoding
py> b'\xed\xa0\x80'.decode("utf-8","surrogates")
'\ud800'

Regards,
Martin
More information about the Python-Dev mailing list

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4