On Apr 27, 2009, at 11:35 PM, Martin v. Löwis wrote:
> No. You seem to assume that all bytes < 128 decode successfully
> always.
> I believe this assumption is wrong, in general:
>
> py> "\x1b$B' \x1b(B".decode("iso-2022-jp") #2.x syntax
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'iso2022_jp' codec can't decode bytes in position
> 3-4: illegal multibyte sequence
>
> All bytes are below 128, yet it fails to decode.

Surely nobody uses iso2022 as an LC_CTYPE encoding. That's expressly forbidden by POSIX, if I'm not mistaken...and I can't see how it would work, considering that it uses all the bytes from 0x20-0x7f, including 0x2f ("/"), to represent non-ascii characters.

Hopefully it can be assumed that your locale encoding really is a non-overlapping superset of ASCII, as is required by POSIX...

I'm a bit scared at the prospect that U+DCAF could turn into "/", that just screams security vulnerability to me. So I'd like to propose that only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be encoded/decoded via the error handler.

James
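
For illustration, here is a minimal sketch of the restriction proposed above, written against the `surrogateescape` error handler that later shipped in Python 3 (the handler did not exist when this message was written, so treating it as the handler under discussion is an assumption). The helper names `roundtrip` and `encodes_to_slash` are hypothetical, chosen only for this example.

```python
def roundtrip(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode with surrogateescape and check the bytes survive re-encoding."""
    text = raw.decode(encoding, "surrogateescape")
    # Undecodable bytes 0x80-0xFF come back as lone surrogates U+DC80-U+DCFF.
    assert text.encode(encoding, "surrogateescape") == raw
    return text


def encodes_to_slash(ch: str) -> bool:
    """Return True if the error handler would turn `ch` into the byte '/'."""
    try:
        return ch.encode("ascii", "surrogateescape") == b"/"
    except UnicodeEncodeError:
        # Surrogates outside U+DC80-U+DCFF are rejected by the handler.
        return False


# 0xE9 is not valid UTF-8 here, so it is escaped; the ASCII "/" is untouched.
escaped = roundtrip(b"caf\xe9/menu")
print(ascii(escaped))

# Neither a low-range surrogate nor U+DCAF can smuggle in a "/".
print(encodes_to_slash("\udc2f"), encodes_to_slash("\udcaf"))
```

With this range limit, an escaped code point can only ever stand for a byte at or above 0x80, so a path separator cannot appear on encoding that was not already present as a real "/" in the string.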