On Thu, 6 Jul 2000, M.-A. Lemburg wrote:
> Previously, Unicode used UTF-8 as basis for calculating the
> hash value

Right, and i was trying to suggest (in a previous message) that the hash value should be calculated from the actual Unicode character values themselves. Then for any case where it's possible for an 8-bit string to be == to a Unicode string, they will have the same hash. Doesn't this solve the problem? Have i misunderstood?

> How serious is the need for objects which compare equal to
> have the same hash value ?

For basic, immutable types like strings -- quite serious indeed, i would imagine.

> 2. In some locales 'äöü' == u'äöü' is true, while in others this is
> not the case. If they do compare equal, the hash values
> must match.

This sounds very bad. I thought we agreed that attempting to compare (or add) a Unicode string and an 8-bit string containing non-ASCII characters (as in your example) should raise an exception. Such an attempt constitutes an ambiguous request -- you haven't specified how to turn the 8-bit bytes into Unicode, and it's better to be explicit than to have the interpreter guess (and guess differently depending on the environment!!)

-- ?!ng
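The two ideas in this message, hashing on character values rather than on an encoding and refusing ambiguous mixed comparisons, can be sketched together. This is a minimal illustration, not CPython's actual implementation: it uses modern Python 3 `bytes` and `str` as stand-ins for the 2000-era 8-bit and Unicode strings, and the names `code_point_hash` and `strict_equal` are hypothetical. Because a mixed comparison only succeeds when the 8-bit side is pure ASCII (where byte values and code points coincide), any two objects that compare equal also produce the same hash.

```python
from typing import Tuple, Union

# Stand-ins (assumption): bytes plays the 8-bit string, str the Unicode string.
Text = Union[bytes, str]


def code_points(s: Text) -> Tuple[int, ...]:
    """Return the character values of s, independent of any encoding."""
    if isinstance(s, bytes):
        # Iterating over bytes yields ints (the raw byte values).
        return tuple(s)
    # Iterating over str yields 1-character strings; ord() gives the code point.
    return tuple(ord(ch) for ch in s)


def code_point_hash(s: Text) -> int:
    """Hypothetical hash computed from character values, not from UTF-8."""
    return hash(code_points(s))


def strict_equal(a: Text, b: Text) -> bool:
    """Hypothetical comparison: mixing 8-bit and Unicode is allowed only for ASCII."""
    if isinstance(a, bytes) != isinstance(b, bytes):
        raw = a if isinstance(a, bytes) else b
        if any(byte >= 0x80 for byte in raw):
            # No encoding was specified, so refuse to guess.
            raise ValueError("ambiguous comparison: 8-bit string contains non-ASCII bytes")
        return code_points(a) == code_points(b)
    return a == b


if __name__ == "__main__":
    print(strict_equal(b"spam", "spam"))                             # True
    print(code_point_hash(b"spam") == code_point_hash("spam"))      # True
    try:
        strict_equal("äöü".encode("latin-1"), "äöü")
    except ValueError as exc:
        print(exc)  # ambiguous comparison: 8-bit string contains non-ASCII bytes
```

Raising instead of consulting a locale keeps the result of `==` (and therefore the required hash agreement) the same in every environment.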