> That said, I don't think smallest-format is actually enforced with
> anything stronger than comments (such as in unicodeobject.h struct
> PyASCIIObject) and asserts (mostly calling
> _PyUnicode_CheckConsistency). I don't have any insight on how
> prevalent non-conforming strings will be in practice, or whether
> supporting their equality will be required as a bugfix.

If you only use Python, you cannot create a string in a non-canonical form.

If you use the C API, you can create a string in a non-canonical form using PyUnicode_New() + PyUnicode_WRITE, or PyUnicode_FromUnicode(NULL, length) (or PyUnicode_FromStringAndSize(NULL, length)) + direct access to the Py_UNICODE* string.

If you create strings in a non-canonical form, it is a bug in your application and Python doesn't help you. But how could Python help you? Expose a function to check your newly created string? There is already _PyUnicode_CheckConsistency(), but it is slow (O(n)) because it checks each character, so it is only used in debug mode.

Victor
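
To illustrate why such a check is O(n), here is a rough pure-Python model of the idea. It is not CPython's implementation: the names smallest_kind and is_canonical are hypothetical, and the "kind" is simplified to the per-character width in bytes (1, 2 or 4), ignoring the separate ASCII flag.

```python
def smallest_kind(code_points):
    """Return the narrowest width in bytes (1, 2 or 4) that fits every code point.

    Hypothetical helper: a simplified model, not a CPython function.
    """
    # Finding the maximum visits every character, hence the O(n) cost.
    max_cp = max(code_points, default=0)
    if max_cp < 0x100:
        return 1
    if max_cp < 0x10000:
        return 2
    return 4


def is_canonical(declared_kind, code_points):
    """True if the declared width is the narrowest one that fits the data."""
    return declared_kind == smallest_kind(code_points)


text = [ord(c) for c in "abc"]
print(is_canonical(1, text))  # True: 1 byte per character is enough
print(is_canonical(2, text))  # False: stored wider than necessary
```

In this model, a string built through the C API with a wider kind than its contents need corresponds to the second call: the data is valid, but the form is not canonical.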