In http://mail.python.org/pipermail/python-dev/2012-January/115368.html
Stefan Behnel wrote:

> Admittedly, this may require some adaptation for the PEP393 unicode memory
> layout in order to produce identical hashes for all three representations
> if they represent the same content.

They SHOULD NOT represent the same content; comparing two strings
currently requires converting them to canonical form, which means
the smallest format (of those three) that works.

If it can be represented in PyUnicode_1BYTE_KIND, then representations
using PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND don't count as
canonical, won't be created by Python itself, and already compare
unequal according to both PyUnicode_RichCompare and stringlib/eq.h
(a shortcut used by dicts).

That said, I don't think smallest-format is actually enforced with
anything stronger than comments (such as in unicodeobject.h struct
PyASCIIObject) and asserts (mostly calling _PyUnicode_CheckConsistency).
I don't have any insight on how prevalent non-conforming strings will
be in practice, or whether supporting their equality will be required
as a bugfix.

-jJ
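
For illustration only, the smallest-format rule described above can be
sketched in Python. The helper name canonical_kind is hypothetical and is
not part of CPython; it assumes the PEP 393 thresholds (code points below
0x100 fit the 1-byte kind, below 0x10000 the 2-byte kind, anything larger
needs the 4-byte kind) and says nothing about how the C code checks or
enforces this.

```python
# Hypothetical helper, not a CPython API: returns the width in bytes of
# the narrowest PEP 393 kind able to hold every code point in `text`.
def canonical_kind(text: str) -> int:
    # An empty string has no code points; treat it as fitting the 1-byte kind.
    largest = max(map(ord, text), default=0)
    if largest < 0x100:
        return 1  # corresponds to PyUnicode_1BYTE_KIND
    if largest < 0x10000:
        return 2  # corresponds to PyUnicode_2BYTE_KIND
    return 4      # corresponds to PyUnicode_4BYTE_KIND
```

Under this sketch, a string whose canonical_kind is 1 but which is stored
with 2 or 4 bytes per character would be one of the non-conforming strings
discussed above.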