Stephen J. Turnbull wrote:
> MRAB writes:
>
> > > I don't think "people shouldn't be using non-ASCII-compatible
> > > encodings for locale encodings" is a sufficient rationale for a hard
> > > error here. I mean, of course they *should* be using UTF-8. Maybe
> > > Python 3.1 should just go ahead and error on any other encoding on
> > > POSIX platforms? <wink>
> > >
> > I don't see why the error handler couldn't in principle be used with
> > encodings other than UTF-8, although in that case all of the low
> > surrogates should be open to use.
>
> I should have been more clear here, I guess. The error handler *can*,
> and in the PEP *will be* by default, used with all "sane" locale
> encodings on POSIX.
>
> It occurs to me that the PEP maybe should say that it is an error
> to have your POSIX locale set to UTF-16 or something like that.
>
> What "sane" means in this context is
>
> 1. ASCII NUL is the bytearray terminator, and can't be used as a byte
>    in a file name. This rules out UTF-16, UTF-32, and widechar EUC
>    encodings, as well as some very rare ones.
>
[snip]

It might be slightly OT, but sometimes strict UTF-8 encoding is violated
by encoding U+0000 using 2 bytes (0xC0 0x80) so that 0x00 can be used as
a terminator. I think I read that Microsoft sometimes does this.
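
To make the 0xC0 0x80 trick concrete, here is a minimal Python sketch. The helper names encode_nul_overlong and decode_nul_overlong are hypothetical, invented for this illustration; they are not part of Python or of the PEP. The sketch covers only the U+0000 substitution described above. Variants seen in practice (Java's "modified UTF-8" is a well-known one) also treat supplementary characters differently, which is not handled here.

```python
def encode_nul_overlong(text: str) -> bytes:
    """Encode as UTF-8, but write U+0000 as the overlong pair 0xC0 0x80.

    Hypothetical helper for illustration. In valid UTF-8 the byte 0x00
    only ever encodes U+0000, so a plain byte replacement is safe.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_nul_overlong(data: bytes) -> str:
    """Reverse of encode_nul_overlong (also a hypothetical helper).

    The byte 0xC0 never occurs in valid UTF-8, so the pair 0xC0 0x80
    cannot be confused with any legitimately encoded character.
    """
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


def strict_utf8_accepts(data: bytes) -> bool:
    """Report whether Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "a\x00b"
encoded = encode_nul_overlong(name)

# The encoded form contains no 0x00 byte, so 0x00 stays free to act
# as a terminator for the whole string.
print(encoded)
print(b"\x00" in encoded)

# A strict decoder rejects the overlong form; the helper restores it.
print(strict_utf8_accepts(encoded))
print(decode_nul_overlong(encoded) == name)
```

A strict decoder refuses these bytes, which is the kind of undecodable input the error handler under discussion is meant to cope with. In Python versions that ship the handler under the name "surrogateescape", b"\xc0\x80".decode("utf-8", "surrogateescape") should map each offending byte to a lone low surrogate, and encoding the result with the same handler gives the original bytes back.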