Guido van Rossum wrote:
>
> > Instead of tossing things we should be *constructive* and come
> > up with a solution to the hash value problem, e.g. I would
> > like to make the hash value be calculated from the UTF-16
> > value in a way that is compatible with ASCII strings.
>
> I think you are proposing to drop the following rule:
>
>   if a == b then hash(a) == hash(b)
>
> or also
>
>   if hash(a) != hash(b) then a != b
>
> This is very fundamental for dictionaries!

The rule is fine for situations where a and b have the same
type, but you can't expect coercion to be consistent with it.

> Note that it is currently
> broken:
>
>   >>> d = {'\200':1}
>   >>> d['\200']
>   1
>   >>> u'\200' == '\200'
>   1
>   >>> d[u'\200']
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in ?
>   KeyError: ?
>   >>>

That's because hash(unicode) currently gets calculated
using the UTF-8 encoding as basis, while the compare
uses the default encoding -- this needs to be changed,
of course.

> While you could fix this with a variable encoding, it would be very
> hard, probably involving converting the string to Unicode before
> taking its hash, and this would slow down the hash calculation for
> 8-bit strings considerably (and these are fundamental for the speed
> of the language!).
>
> So I am for restoring ASCII as the one and only fixed encoding.  (Then
> you can fix your hash much easier!)
>
> Side note: the KeyError handling is broken.  The bad key should be run
> through repr() (probably when the error is raised rather than when it
> is displayed).

Agreed.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
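
For readers following the thread, the rule under discussion (a == b
implies hash(a) == hash(b)) can be illustrated with a minimal sketch
in present-day Python 3. This is not the interpreter code the thread
is about: BrokenText and ConsistentText are hypothetical classes made
up for the illustration. The first compares equal to a plain str but
hashes its UTF-8 bytes, which roughly mirrors the mismatch described
above; the second hashes the same value that it compares.

```python
class BrokenText:
    """Hypothetical wrapper: compares equal to a plain str,
    but takes its hash from the UTF-8 encoded bytes."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, BrokenText):
            return self.value == other.value
        if isinstance(other, str):
            return self.value == other
        return NotImplemented

    def __hash__(self):
        # Hash comes from a different representation than the one
        # __eq__ compares, so equal objects may hash differently.
        return hash(self.value.encode("utf-8"))


class ConsistentText(BrokenText):
    """Hypothetical fix: hash the same value that __eq__ compares."""

    def __hash__(self):
        return hash(self.value)


d = {"\x80": 1}

broken = BrokenText("\x80")
print(broken == "\x80")   # equality holds across the two types
print(d.get(broken))      # lookup misses: the hashes disagree

fixed = ConsistentText("\x80")
print(d[fixed])           # lookup succeeds: hash and == agree
```

The dictionary compares hash values before it ever calls ==, so an
object that is equal to a stored key but hashes differently is
effectively invisible to the lookup.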