[Guido]
> I have to admit I have no clue about the details of this debate any
> more,

Na, there's no debate here.  I believe I confused things by
misunderstanding what Trent's original claim was (sorry, Trent!), but we
bumped into real flaws in the current hash anyway (even on 32-bit
machines).  I don't think there's any actual disagreement about anything
here.

> and I'm cowardly awaiting a patch submission that Tim approves
> of.

As am I <wink>.

> (I'm hoping a day will come when Tim can check it in himself. :-)

Well, all you have to do to make that happen is get a real job and then
hire me <wink>.

> In the mean time, I'd like to emphasize the key invariant here: we
> must ensure that (a==b) => (hash(a)==hash(b)).

Absolutely.  That's already true, and is so non-controversial that Trent
elided ("...") the code for that in his last post.

> One quick way to deal with this could be the following pseudo C:
>
>     PyObject *double_hash(double x)
>     {
>         long l = (long)x;
>         if ((double)l == x)
>             return long_hash(l);
>         ...double-specific code...
>     }
>
> This code makes one assumption: that if there exists a long l equal to
> a double x, the cast (long)x should yield l...

No, that fails on two counts:

1.  If x is "too big" to fit in a long (and a great many doubles are),
    the cast to long is undefined.  Don't know about all current
    platforms, but on the KSR platform such casts raised a fatal hardware
    exception.  The current code already accomplishes this part in a safe
    way (which Trent's patch improves by using a symbol instead of the
    current hard-coded hex constant).

2.  The key invariant needs to be preserved also when x is an exact
    integral value that happens to be (possibly very!) much bigger than a
    C long; e.g.,

    >>> long(1.23e300)   # 1.23e300 is an integer! albeit not the one you think
    12299999999999999456195024356787918820614965027709909500456844293279
    60298864608335541984218516600989160291306221939122973741400364055485
    57167627474369519296563706976894811817595986395177079943535811102573
    51951343133141138298152217970719263233891682157645730823560232757272
    73837119288529943287157489664L
    >>> hash(1.23e300) == hash(_)
    1
    >>>

The current code already handles that correctly too.  All the problems
occur when the double has a non-zero fractional part, and Trent knows how
to fix that now.

hash(x) may differ across platforms because sizeof(long) differs across
platforms, but that's just as true of strings as floats (i.e., Python has
never computed platform-independent hashes -- if that bothers *you*
(doesn't bother me), that's the part you should chime in on).
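
For anyone who wants to poke at the two failure modes above, here is a rough sketch in present-day Python (not the C code under discussion, and not Trent's patch). The name `sketch_float_hash` and the mixing step for non-integral values are purely illustrative assumptions. It tests integrality without ever casting to a fixed-width integer, so a huge integral double such as 1.23e300 takes the integer path and the invariant (a==b) => (hash(a)==hash(b)) holds for it.

```python
import math


def sketch_float_hash(x: float) -> int:
    """Toy float hash that agrees with the int hash on integral values.

    Hypothetical helper for illustration only; it is not CPython's algorithm.
    """
    if math.isinf(x) or math.isnan(x):
        # Non-finite values have no integer counterpart; defer to the builtin.
        return hash(x)

    fractional, _integral = math.modf(x)
    if fractional == 0.0:
        # Integral value, however large: int() yields an arbitrary-precision
        # integer, so there is no overflow like a C (long) cast could have.
        return hash(int(x))

    # Non-zero fractional part: split into mantissa and exponent and hash
    # the pair. This mixing scheme is an arbitrary choice for the sketch.
    mantissa, exponent = math.frexp(x)
    scaled = int(math.ldexp(mantissa, 53))  # exact for IEEE-754 doubles
    return hash((scaled, exponent))
```

With this sketch, `sketch_float_hash(1.23e300) == hash(int(1.23e300))` holds by construction, and `sketch_float_hash(2.0) == hash(2)` likewise; only values with a fractional part go through the float-specific branch.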