Lino Mastrodomenico wrote:
> Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid character when
> decoded with UTF-8, it should simply be considered an invalid UTF-8
> sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
> '\udcff').

"Should be considered" or "will be considered"? Python 3.0's UTF-8 decoder happily accepts it and returns u'\udcff':

    >>> b'\xed\xb3\xbf'.decode('utf-8')
    '\udcff'

If the PEP depends on this being changed, it should be mentioned in the PEP.
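
For comparison, here is a minimal sketch of how the same three bytes can be checked on a current CPython 3 release. It assumes that the error handler discussed in the PEP corresponds to the one available today as `surrogateescape`, and it uses `surrogatepass` only to reproduce the '\udcff' result shown above; the variable names are illustrative, and behavior on other versions or implementations may differ.

```python
raw = b'\xed\xb3\xbf'  # the bytes a lenient decoder would read as the lone surrogate U+DCFF

# Strict decoding: record whether the decoder rejects the sequence.
try:
    raw.decode('utf-8')
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

# surrogateescape: each undecodable byte 0xNN is mapped to U+DCNN.
escaped = raw.decode('utf-8', errors='surrogateescape')

# surrogatepass: the bytes are accepted as an encoded surrogate code point.
passed = raw.decode('utf-8', errors='surrogatepass')

# Encoding with the same handler should restore the original bytes.
roundtrip = escaped.encode('utf-8', errors='surrogateescape')

# ascii() avoids errors when printing lone surrogates to a UTF-8 terminal.
print(strict_rejects, ascii(escaped), ascii(passed), roundtrip == raw)
```

Treating the sequence as three separate invalid bytes is what keeps the round trip lossless: if the decoder accepted it as '\udcff', encoding that string back with the escaping handler would yield the single byte b'\xff' instead of the original three.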