> [Guido van Rossum]
> > Hm, but isn't there a way to encode a NUL that doesn't produce a NUL?
> > In some variant?

[François]
> There is also a rule about the shortest coding. It is invalid UTF-8
> to use more bytes than required, and a given UCS character has a
> unique UTF-8 representation. Moreover, decoders should raise an
> exception on non-minimal UTF-8 codings, and I do not know how Python
> behaves with this. The Gambit author once told me he found a way to
> implement the test very efficiently.
>
> One could use multi-byte sequences, that is, a sequence having no NULs,
> that would fool a lazy UTF-8 decoder into producing a NUL. But for this,
> one has to break the shortest coding rule, and start from invalid UTF-8.

I knew all that, but I thought I'd read about a hack to encode NUL
using c0 80, specifically to get around the limitation on encoded
strings containing a NUL. But I can't find the reference so I'll shut
up.

--Guido van Rossum (home page: http://www.python.org/~guido/)
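The shortest-coding rule discussed in this message can be illustrated with a small sketch. It assumes a current Python 3 interpreter, which is not necessarily how Python behaved when this thread was written. The helper names `strict_decode_rejects` and `lazy_decode_two_byte` are hypothetical, introduced only for this illustration. The "lazy" decoder is a deliberately naive model of a decoder that skips the minimality check; it is not taken from any real library.

```python
# The two-byte sequence mentioned in the message: an overlong form of NUL.
OVERLONG_NUL = b"\xc0\x80"


def strict_decode_rejects(data: bytes) -> bool:
    """Return True if Python's built-in UTF-8 decoder refuses the input."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def lazy_decode_two_byte(data: bytes) -> int:
    """Hypothetical lazy decoder for a two-byte sequence.

    It merges the 5 payload bits of the lead byte with the 6 payload bits
    of the continuation byte and performs no shortest-form check.
    """
    lead, cont = data[0], data[1]
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


if __name__ == "__main__":
    # The strict decoder raises on the overlong form.
    print(strict_decode_rejects(OVERLONG_NUL))
    # The lazy decoder yields code point 0 from bytes that contain no NUL.
    print(lazy_decode_two_byte(OVERLONG_NUL))
    # The only valid UTF-8 encoding of U+0000 is a single zero byte.
    print("\x00".encode("utf-8"))
```

The sequence c0 80 contains no zero byte, yet a decoder without the minimality check maps it to U+0000, which is exactly the situation François describes.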