> PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode
> strings in a reversible way.

That isn't really true; it is not, inherently, about UTF-8. Instead, it
tries to represent non-filesystem-encoding byte sequences in Unicode
strings in a reversible way.

> Quietly escaping a bad UTF-8 encoding with private Unicode characters is
> unlikely to be the right thing

And indeed, the PEP stopped using PUA characters.

> Therefore, when Python encounters path names on a file system
> that are not consistent with the (assumed) encoding for that file
> system, Python should raise an error.

This is what happens currently, and users are quite unhappy about it.

> If you really don't care what the string looks like and you just want an
> encoding that round-trips without loss, you can probably just set your
> encoding to one of the 8 bit encodings, like ISO 8859-15. Decoding
> arbitrary byte sequences to unicode strings as ISO 8859-15 is no less
> correct than decoding them as the proposed "utf-8b". In fact, the most
> likely source of non-UTF-8 sequences is ISO 8859 encodings.

Yes, users can do that (to a degree), but they are still unhappy about
it. The approach actually fails for command line arguments.

> As for what the byte-oriented interfaces should do, they are simply
> platform dependent. On UNIX, they should do the obvious thing. On
> Windows, they can either hook up to the low-level byte-oriented system
> calls that the systems supply, or Windows could fake it and have the
> byte-oriented interfaces use UTF-8 encodings always and reject non-UTF-8
> sequences as illegal (there are already many illegal byte sequences
> anyway).

As is, these interfaces are incomplete - they don't support command
line arguments, or environment variables. If you want to complete them,
you should write a PEP.

Regards,
Martin
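
To make the trade-off in this exchange concrete, here is a minimal sketch, assuming Python 3 and UTF-8 as the assumed filesystem encoding. The file name bytes are made up for illustration. It compares strict decoding, the `surrogateescape` error handler that PEP 383 specifies, and the ISO 8859-15 workaround suggested in the quoted text.

```python
# Hypothetical file name: "café.txt" encoded as Latin-1, so not valid UTF-8.
raw = b"caf\xe9.txt"

# Strict decoding raises an error for such a name.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# PEP 383: each undecodable byte 0xNN becomes the lone surrogate U+DCNN,
# and encoding with the same error handler restores the original bytes.
escaped = raw.decode("utf-8", errors="surrogateescape")
round_tripped = escaped.encode("utf-8", errors="surrogateescape")

# The 8-bit workaround also round-trips arbitrary bytes ...
as_8bit = raw.decode("iso8859-15")
round_tripped_8bit = as_8bit.encode("iso8859-15")

# ... but a name that really is UTF-8 no longer decodes to the intended text.
utf8_raw = "café.txt".encode("utf-8")
mojibake = utf8_raw.decode("iso8859-15")
correct = utf8_raw.decode("utf-8", errors="surrogateescape")
```

Both approaches are reversible at the byte level. The difference is that `surrogateescape` still decodes well-formed names correctly and only escapes the bytes that do not fit the assumed encoding.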