> I qualify with a). I believe I understand c) but, as explained in my
> other post, I do not think your reason applies. In fact, I think
> concern for naming rights might suggest that you *not* reuse the name
> for something different. I would have to learn more about the existing
> 'surrogates' handler to judge Antione's suggestion 'surrogates-pass'.
> 'Surrogates-escape' is pretty good for the new handler since, to my
> understanding, it 'escapes' 'bad bytes' by prefixing them with bits that
> push them to the surrogates plane.

See issue 3672. In essence, in python 2.5:

py> u"\ud800".encode("utf-8")
'\xed\xa0\x80'
py> '\xed\xa0\x80'.decode("utf-8")
u'\ud800'

In 3.1,

py> "\ud800".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
py> "\ud800".encode("utf-8","surrogates")
b'\xed\xa0\x80'
py> b'\xed\xa0\x80'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: illegal encoding
py> b'\xed\xa0\x80'.decode("utf-8","surrogates")
'\ud800'

Regards,
Martin
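
For readers trying this on a released Python 3, the sessions above can be sketched as a script. This is only an illustration under one assumption: it uses the handler names that shipped in the final 3.1 release ('surrogatepass' for the handler called 'surrogates' in this post, and 'surrogateescape' for the 'bad bytes' handler discussed in the quote), not the working names used in this thread.

```python
# Sketch only: handler names are those of released Python 3 ("surrogatepass",
# "surrogateescape"), an assumption relative to the working name "surrogates"
# used in the post above.

lone = "\ud800"  # a lone high surrogate

# Strict UTF-8 encoding refuses lone surrogates in Python 3.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" lets the surrogate through in both directions,
# reproducing the Python 2.5 behaviour shown above.
encoded = lone.encode("utf-8", "surrogatepass")
decoded = encoded.decode("utf-8", "surrogatepass")

# "surrogateescape" is the other handler under discussion: it maps each
# undecodable byte to a code point in U+DC80..U+DCFF and back again.
bad = b"\xff"
escaped = bad.decode("utf-8", "surrogateescape")
restored = escaped.encode("utf-8", "surrogateescape")

print(strict_ok, encoded, ascii(decoded), ascii(escaped), restored)
```

The two handlers solve different problems: the first round-trips surrogates that are already in a str, while the second round-trips arbitrary bytes that are not valid in the codec.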