On September 11, 2014, Jeff Allen wrote:

> ... the area of code point
> space used for the smuggling of bytes under PEP-383 is not a
> "Unicode Private Use Area", but a portion of the trailing surrogate
> range. This is a code violation, which I imagine is why
> "surrogateescape" is an error handler, not a codec.

True, but I believe that is a CPython implementation detail.
Other implementations (including Jython) should implement the
surrogateescape API, but I don't think it is important to use the
same internal representation for the invalid bytes.

(Well, unless you want to communicate with external tools (GUIs?)
that are trying to directly use data (effectively bytes rather than
strings) in that particular internal encoding when communicating
with Python.)

> lone surrogates preclude a naive use of the platform string library

Invalid input often causes problems. Are you saying that there are
situations where the platform string library could easily handle
invalid characters in general, but has a problem with the specific
case of lone surrogates?

-jJ

-- 
If there are still threading problems with my replies,
please email me with details, so that I can try to resolve them. -jJ
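
For anyone following along, here is a minimal sketch of the behaviour under discussion, as observed on CPython 3. The sample bytes and the helper name `strict_encode_fails` are made up for illustration. The round trip is what the surrogateescape API promises; the specific code point checked below is the representation Jeff describes, which another implementation might or might not expose in the same way.

```python
# Illustrative input: Latin-1 bytes for "café", which are not valid UTF-8.
raw = b"caf\xe9"

# Decode with the PEP 383 error handler; the undecodable byte is smuggled
# into the resulting str instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# On CPython the byte 0xE9 shows up as the lone trailing surrogate U+DCE9.
smuggled = text[-1]

# Encoding with the same handler restores the original bytes.
round_tripped = text.encode("utf-8", errors="surrogateescape")


def strict_encode_fails(s):
    """Hypothetical helper: True if a strict UTF-8 encode rejects s."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


print(hex(ord(smuggled)), round_tripped == raw, strict_encode_fails(text))
```

The last check is the "naive use" problem in miniature: a consumer that insists on well-formed text rejects the string unless it also knows about the error handler.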