M.-A. Lemburg wrote:
> we needed
> a way to make sure that Python 3 also optionally supports working
> with lone surrogates in such UTF-8 streams (nowadays called CESU-8:
> http://en.wikipedia.org/wiki/CESU-8).

I don't think CESU-8 is the same thing. According to the wiki page,
CESU-8 *requires* all code points above 0xffff to be split into
surrogate pairs before encoding. It also doesn't say that lone
surrogates are valid -- it doesn't mention lone surrogates at all,
only pairs. Neither does the linked technical report.

The technical report also says that CESU-8 forbids any UTF-8
sequences of more than three bytes, so it's definitely not "UTF-8
plus lone surrogates".

--
Greg
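
To make the difference concrete, here is a rough sketch of a CESU-8 style encoder next to Python 3's ordinary UTF-8 codec. The function name cesu8_encode is made up for illustration (as far as I know, Python does not ship a CESU-8 codec), and rejecting lone surrogates is my own assumption, since the report only talks about pairs.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text in the CESU-8 style.

    Code points above 0xFFFF are split into a surrogate pair, and each
    half is written as a three-byte sequence. No four-byte sequences
    are ever produced.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split into a UTF-16 style surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            # "surrogatepass" lets the UTF-8 codec emit the three-byte
            # form of each surrogate half.
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
        else:
            # Assumption: a lone surrogate in the input is an error here,
            # so the strict codec raises UnicodeEncodeError for it.
            out += ch.encode("utf-8")
    return bytes(out)


smiley = "\U0001F600"

# Plain UTF-8 uses one four-byte sequence for a non-BMP code point.
utf8_bytes = smiley.encode("utf-8")

# The CESU-8 style output uses two three-byte sequences instead.
cesu8_bytes = cesu8_encode(smiley)

# "UTF-8 plus lone surrogates" is a third thing: one three-byte
# sequence for an unpaired surrogate.
lone_bytes = "\ud83d".encode("utf-8", "surrogatepass")
```

With this sketch, utf8_bytes is four bytes long, cesu8_bytes is six, and lone_bytes is three, so the three schemes disagree on the same kind of input.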