[Raymond] > Does anyone have any issues with changing the hash multiplier for the > string and Unicode hash functions? Don't touch it unless you can prove major benefits -- it's a remarkable fact of life that the current multiplier hasn't resulted in any real-life (but non-contrived) pathological cases. > Instead of 1000003, I would like to use Aho's 65599, a prime near 2**16 > that is efficiently expressible as (x << 6) + (x << 16) - x. This > replaces a multiply with fast adds and shifts (with the two shifts > potentially being executed in parallel). It's unclear why that's a good thing. Perhaps you think shifts and adds are faster? I wouldn't -- the imul instruction on modern Pentiums is very fast. It is clear why it may be a bad thing: that it *can* be expressed as just a couple shifts and adds makes it suspect as a good scrambler (read Knuth). > Googling for "hash 65599" shows a long history of widespread use and > testing without any problems. Testing in the context of Python string hashing? I didn't think so <wink>. The current multiplier has been studied extensively *in* Python, both via real-life use and via large-scale focused statistical testing (we got some money to do that during BeOpen.com's brief life -- a surprise that came out of that was that CRC made a terrible string hash function as the number of input strings got large). The right thing to compare Python's string hash to is "the standard" Fowler-Noll-Vo string hash, which was developed independently, but which also ended up using a large multiplier.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4