Showing content from https://mail.python.org/pipermail/python-dev/2009-April/089019.html below:

[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

"Martin v. Löwis" martin at v.loewis.de
Sat Apr 25 17:05:23 CEST 2009
> The only drawback I can see is if the UTF-8 bytes actually decode to a
> half surrogate. However, half surrogates should really only occur in
> UTF-16 (as I understand it), so they shouldn't be encoded in UTF-8
> anyway!

Right: that's the rationale for UTF-8b. Encoding half surrogates
violates parts of the Unicode spec, so UTF-8b is "safe".
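
A small sketch of that point, assuming the strict `utf-8` codec of a current Python 3 interpreter (an assumption; the message itself does not name an implementation). A lone surrogate such as U+DC80 is rejected in both directions, so that code point range is free to carry escaped bytes:

```python
# Assumption: Python 3, strict "utf-8" codec (the default error handler).
lone = "\udc80"  # a lone low ("half") surrogate

try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    # Strict UTF-8 refuses to encode a lone surrogate.
    encodable = False

try:
    # ED B2 80 is what U+DC80 would look like if it were encoded as UTF-8.
    b"\xed\xb2\x80".decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    # Strict UTF-8 also refuses to decode that byte sequence.
    decodable = False
```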

> As for handling this case, you could either:
> 
> 1. Raise an exception (which is what you're trying to avoid)
> 
> or:
> 
> 2. Treat it as invalid UTF-8 and map the bytes to half surrogates
> (encoding would produce the original bytes).
> 
> I'd prefer option 2.

I hadn't thought of this case, but you are right - they *are*
illegal bytes, after all. Raising an exception would be useless
since the whole point of this codec is to never raise Unicode
errors.
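
Option 2 can be sketched as follows, assuming the `surrogateescape` error handler that Python 3.1 and later provide for this PEP (the handler name is not mentioned in this message). Bytes that would spell a surrogate in UTF-8 are treated as invalid, each one is mapped to its own half surrogate, and encoding restores the original bytes:

```python
# Assumption: Python 3.1+ with the "surrogateescape" error handler.
# ED B2 80 would be the UTF-8 form of U+DC80, which is not valid UTF-8.
raw = b"abc\xed\xb2\x80"

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler turns the surrogates back into bytes.
back = text.encode("utf-8", "surrogateescape")
```

Here `text` holds three escaped bytes after `abc` rather than a single U+DC80, so no exception is raised and no information is lost.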

Regards,
Martin