Scott David Daniels <Scott.Daniels at Acm.Org> writes:

> Still not disgusting, _but_ unicode strings must hash equal to
> the corresponding "plain" string. I am not certain about this
> requirement for non-ASCII characters, but I expect we are stuck
> with matching hashes in the range ord(' ') through ord('~') and
> probably for all character values from 0 through 127.

Strictly speaking, unicode and byte strings must hash equal if they
convert to each other through the system encoding. As a side effect,
this really means that the system encoding currently must be ASCII,
as the current hash function won't otherwise hash equal (latin-1 may
also be correct as the system encoding).

Fortunately, plain ASCII strings are in all applicable normal forms
(NFC, NFKC, NFD, NFKD).

Regards,
Martin
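
The claim about normal forms can be checked with the standard unicodedata module. What follows is only a minimal sketch, assuming Python 3; the helper name is_stable_under_normalization is made up for illustration and does not come from any library.

```python
import unicodedata

# The four normal forms named in the message.
FORMS = ("NFC", "NFKC", "NFD", "NFKD")


def is_stable_under_normalization(text):
    """Return True if text is unchanged by every form in FORMS.

    Hypothetical helper, written only for this illustration.
    """
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


# All 128 seven-bit characters, i.e. character values 0 through 127.
ascii_text = "".join(chr(code) for code in range(128))
print(is_stable_under_normalization(ascii_text))  # True

# A precomposed non-ASCII character (U+00E9) decomposes under NFD and
# NFKD, so it is not in all four normal forms at once.
print(is_stable_under_normalization("\u00e9"))  # False
```

Note that this only covers the normalization point. In Python 3 a str never compares equal to a bytes object, so the hash-equality requirement discussed above applies to the unicode/str pairing of the Python versions the thread was written against.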