
Thu May 7 04:35:52 CEST 2009 · https://mail.python.org/pipermail/python-dev/2009-May/089474.html

"Martin v. Löwis" writes:

 > > Now, with Python's file system encoding == UTF-8 or any packed EUC,
 > > and more than a handful of Shift JIS or Big5 characters in file names,
 > > one is *almost certain* to encounter ASCII as the second byte of a
 > > multibyte sequence.  PEP 383 can't handle this

Ah, I see.  Of course, the algorithm not only has to handle the ASCII
octet which is erroneous because it can't be a trailing byte, but
*also the leading byte that signalled to expect a trailing byte >127*.
So the algorithm backs up to the character boundary (which is
well-defined for all the "sane" encodings), encodes the high byte(s) in
the character with lone surrogates, and encodes the ASCII octet as
itself (promoted to a Unicode code point).
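
To make that concrete, here is a minimal sketch of the behaviour. It
assumes a Python 3 release in which the PEP 383 handler is spelled
"surrogateescape" (the name it ended up with; this thread calls it
utf8b). The file name bytes are made up for illustration: 0xE3 is a
lead byte in both Shift JIS and UTF-8, and 0x41 is an ASCII octet that
falls in the Shift JIS trail-byte range but is not a UTF-8
continuation byte.

```python
# Hypothetical file name bytes: 0xE3 announces a multibyte sequence,
# but the next octet is plain ASCII "A" (0x41).
raw = b"\xe3A.txt"

# Decode as UTF-8 with the PEP 383 error handler. The stranded lead
# byte becomes the lone surrogate U+DCE3; the ASCII octets decode as
# themselves.
decoded = raw.decode("utf-8", errors="surrogateescape")
print(ascii(decoded))  # '\udce3A.txt'

# Encoding with the same handler restores the original bytes exactly.
roundtrip = decoded.encode("utf-8", errors="surrogateescape")
print(roundtrip == raw)  # True
```

The point is that only the undecodable high byte is escaped, so the
ASCII octet never has to be represented as a surrogate.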

Sorry, you're right, I was just confused.  I withdraw the objection as
completely mistaken, and apologize for not thinking more carefully in
the first place.



[Python-Dev] PEP 383 update: utf8b is now the error handler