A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://mail.python.org/pipermail/python-dev/2009-May/089474.html below:

[Python-Dev] PEP 383 update: utf8b is now the error handler

[Python-Dev] PEP 383 update: utf8b is now the error handlerStephen J. Turnbull stephen at xemacs.org
Thu May 7 04:35:52 CEST 2009
"Martin v. Löwis" writes:

 > > Now, with Python's file system encoding == UTF-8 or any packed EUC,
 > > and more than a handful of Shift JIS or Big5 characters in file names,
 > > one is *almost certain* to encounter ASCII as the second byte of a
 > > multibyte sequence.  PEP 383 can't handle this

Ah, I see.  Of course, the algorithm not only has to handle the ASCII
octet which is erroneous because it can't be a trailing byte, but
*also the leading byte that signalled to expect a trailing byte >127*.
So the algorithm backs up to the character boundary (which is
well-defined for all the "sane" encodings), encode the high byte(s) in
the character with lone surrogates, and encode the ASCII as itself
(promoted to a Unicode code point).

Sorry, you're right, I was just confused.  I withdraw the objection as
completely mistaken, and apologize for not thinking more carefully in
the first place.
More information about the Python-Dev mailing list

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4