"Martin v. Löwis" writes:

 > > Now, with Python's file system encoding == UTF-8 or any packed EUC,
 > > and more than a handful of Shift JIS or Big5 characters in file names,
 > > one is *almost certain* to encounter ASCII as the second byte of a
 > > multibyte sequence.  PEP 383 can't handle this

Ah, I see.  Of course, the algorithm has to handle not only the ASCII
octet, which is erroneous because it can't be a trailing byte, but
*also the leading byte that signalled to expect a trailing byte >127*.
So the algorithm backs up to the character boundary (which is
well-defined for all the "sane" encodings), encodes the high byte(s) in
the character as lone surrogates, and encodes the ASCII as itself
(promoted to a Unicode code point).

Sorry, you're right, I was just confused.  I withdraw the objection as
completely mistaken, and apologize for not thinking more carefully in
the first place.
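
For illustration, the behaviour described above can be sketched with the
"surrogateescape" error handler, which is how PEP 383 is exposed in
Python 3.1 and later.  The byte strings are assumptions chosen for the
example (a Shift JIS name whose second byte is ASCII, and a byte pair
whose first byte looks like a UTF-8 lead byte); they stand in for file
names and are not taken from any real file system.

```python
# Assumed example: the Shift JIS encoding of "表" is 0x95 0x5C, so its
# second byte is the ASCII backslash.
sjis_name = "表".encode("shift_jis")

# Decode as if the file system encoding were UTF-8.  The high byte 0x95
# cannot be decoded and becomes the lone surrogate U+DC95; the ASCII
# byte is kept as an ordinary code point.
decoded = sjis_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes.
restored = decoded.encode("utf-8", errors="surrogateescape")

# Second assumed case: 0xE0 announces a multibyte UTF-8 sequence, but
# the next byte (0x40, "@") is ASCII.  Only the lead byte is escaped.
lead_case = b"\xe0\x40".decode("utf-8", errors="surrogateescape")

print(ascii(decoded), restored == sjis_name, ascii(lead_case))
```

In both cases only the bytes >127 are turned into lone surrogates, and
the round trip back to bytes is lossless.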