Fri May 1 22:21:36 CEST 2009 · https://mail.python.org/pipermail/python-dev/2009-May/089337.html

Zooko O'Whielacronx wrote:
> Following-up to my own post to correct a major error:

> Is it true that
> srcbytes.encode(srcencoding, 'python-escape').decode('utf-8',
> 'python-escape') will always produce srcbytes ?  That is my Requirement

If you start with bytes, decode with utf-8b to unicode (possibly 
'invalid'), and encode the result back to bytes with utf-8b, you should 
get the original bytes, regardless of what they were.  That is the point 
of PEP 383 -- to reliably roundtrip file 'names' that start as bytes and 
must end as the same bytes but which may not otherwise have a unicode 
decoding.
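A minimal sketch of that bytes -> text -> bytes round trip. It assumes the 
error handler discussed here as 'utf-8b' / 'python-escape' is the one 
exposed as 'surrogateescape' in Python 3.1 and later; roundtrip() and the 
sample byte strings are hypothetical names made up for illustration.

```python
def roundtrip(raw: bytes) -> bytes:
    """Decode bytes to str and encode back, using the PEP 383 handler.

    Assumption: 'surrogateescape' is the released name of the handler
    that this thread calls 'utf-8b' / 'python-escape'.
    """
    # Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF.
    text = raw.decode("utf-8", "surrogateescape")
    # Encoding with the same handler turns those surrogates back into bytes.
    return text.encode("utf-8", "surrogateescape")


# Hypothetical file 'names': ASCII, valid UTF-8, and bytes that are not UTF-8.
samples = [
    b"plain.txt",
    "caf\u00e9".encode("utf-8"),
    b"\xff\xfe-latin1-\xe9",
    bytes(range(256)),
]

for raw in samples:
    assert roundtrip(raw) == raw
```

The intermediate str may contain lone surrogates, so it is 'invalid' 
unicode in the sense used above, but the bytes that come back out are 
the bytes that went in.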

If you start with invalid unicode text, encode to bytes with utf-8b, and 
decode back to unicode, you might instead get a different and valid 
unicode text.  An example was given in the discussion.  I believe this 
would be hard to avoid.  In any case, it does not matter for the use 
case of starting with bytes that one wants to work with, temporarily 
but reliably, as text.
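One such case, sketched under the same assumption that 'surrogateescape' 
in Python 3.1+ is the handler called 'utf-8b' here. The variable names are 
hypothetical, and this is not necessarily the example given in the 
discussion: two lone surrogates encode to a pair of bytes that happens to 
be valid UTF-8, so decoding yields a different, valid string.

```python
# 'Invalid' unicode text: two lone surrogates, standing for bytes 0xC3 and 0xA9.
invalid_text = "\udcc3\udca9"

# Encoding maps each surrogate U+DCxx back to the single byte 0xxx.
raw = invalid_text.encode("utf-8", "surrogateescape")
assert raw == b"\xc3\xa9"

# Those two bytes are the valid UTF-8 encoding of U+00E9, so decoding
# produces that character instead of the original surrogates.
back = raw.decode("utf-8", "surrogateescape")
assert back == "\u00e9"
assert back != invalid_text
```

The text -> bytes -> text direction is therefore not guaranteed to be the 
identity, while the bytes -> text -> bytes direction is.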

Terry Jan Reedy

[Python-Dev] PEP 383 and GUI libraries