> It does solve this issue, because (unlike e.g. U+F01FF) '\udcff' is
> not a valid Unicode character (not a character at all, really) and the
> only way you can put this in a POSIX filename is if you use a very
> lenient UTF-8 encoder that gives you b'\xed\xb3\xbf'.
>
> Since this byte sequence doesn't represent a valid character when
> decoded with UTF-8, it should simply be considered an invalid UTF-8
> sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
> '\udcff').
>
> Martin: maybe the PEP should say this explicitly?

Sure, will do.

Regards,
Martin
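
For illustration, the behaviour described in the quoted text can be checked with a short sketch. It assumes a current CPython 3, where the handler under discussion is available as the 'surrogateescape' error handler; 'surrogatepass' is used here only as a stand-in for the "very lenient UTF-8 encoder" and is not something the message itself names.

```python
# Stand-in for a "very lenient" encoder: surrogatepass lets the lone
# surrogate U+DCFF through and emits its three-byte form.
lenient = '\udcff'.encode('utf-8', 'surrogatepass')

# A strict UTF-8 decoder rejects those bytes; with surrogateescape each
# undecodable byte is escaped separately as U+DC00 + byte value.
decoded = lenient.decode('utf-8', 'surrogateescape')

# Encoding with the same handler restores the original three bytes.
roundtrip = decoded.encode('utf-8', 'surrogateescape')

# By contrast, '\udcff' itself stands for the single raw byte 0xFF.
single = '\udcff'.encode('utf-8', 'surrogateescape')

# ascii() is used because lone surrogates cannot be printed directly.
print(lenient, ascii(decoded), roundtrip, single)
```

Under these assumptions the three bytes come back as three separate escapes and round-trip unchanged, so they never collide with '\udcff', which is reserved for the lone byte 0xFF.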