Hi,

The unicode_internal decoder doesn't decode surrogate pairs, so test_unicode.UnicodeTest.test_codecs() is failing on Windows (16-bit wchar_t).

I don't know if this codec is still relevant with PEP 393, because the internal representation now depends on the maximum character (Py_UCS1*, Py_UCS2* or Py_UCS4*), whereas it was a fixed size with Python <= 3.2 (Py_UNICODE*).

Should we:

 * Drop this codec (public and documented, but I don't know if it is used)
 * Use wchar_t* (Py_UNICODE*) to provide a result similar to Python 3.2, and so fix the decoder to handle surrogate pairs
 * Use the real representation (Py_UCS1*, Py_UCS2* or Py_UCS4* string) ?

The failure on Windows:

FAIL: test_codecs (test.test_unicode.UnicodeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\Buildslave\3.x.moore-windows\build\lib\test\test_unicode.py", line 1408, in test_codecs
    self.assertEqual(str(u.encode(encoding),encoding), u)
AssertionError: '\ud800\udc01\ud840\udc02\ud880\udc03\ud8c0\udc04\ud900\udc05' != '\U00030003\U00040004\U00050005'

Victor
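
To illustrate what "handle surrogate pairs" would mean for the second option, here is a minimal Python sketch. It is not the codec's C implementation: join_surrogates is a hypothetical helper, and the input is assumed to be a list of 16-bit code units such as a 16-bit wchar_t buffer would hold. The code units are the ones on the left-hand side of the assertion failure above.

```python
def join_surrogates(units):
    """Combine 16-bit code units into a str, pairing high/low surrogates.

    Hypothetical helper for illustration only; lone surrogates are
    passed through unchanged rather than rejected.
    """
    out = []
    i = 0
    while i < len(units):
        hi = units[i]
        # A high surrogate followed by a low surrogate encodes one
        # code point above U+FFFF.
        if 0xD800 <= hi <= 0xDBFF and i + 1 < len(units):
            lo = units[i + 1]
            if 0xDC00 <= lo <= 0xDFFF:
                cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
                out.append(chr(cp))
                i += 2
                continue
        out.append(chr(hi))
        i += 1
    return "".join(out)


# Code units taken from the failing assertion message.
units = [0xD800, 0xDC01, 0xD840, 0xDC02, 0xD880, 0xDC03,
         0xD8C0, 0xDC04, 0xD900, 0xDC05]
decoded = join_surrogates(units)
print([hex(ord(c)) for c in decoded])
```

With pairing applied, the ten code units collapse to five code points (U+10001 through U+50005) instead of ten separate surrogates, which is the difference the test is detecting.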