I benchmarked this on a Pentium 4 and Raymond's hash does indeed come out ahead, regardless of compiler flags. Pentium IV, 2.4GHz: -O2 -mcpu=i386 -DMUL=100003 1.56 -O2 -mcpu=i386 -DMUL=65599 0.78 -O2 -mcpu=i386 -DCLEVER 0.54 * -O2 -mcpu=pentium3 -DMUL=100003 0.78 -O2 -mcpu=pentium3 -DMUL=65599 0.79 -O2 -mcpu=pentium3 -DCLEVER 0.53 * -O2 -mcpu=pentium4 -DMUL=100003 0.63 -O2 -mcpu=pentium4 -DMUL=65599 0.65 -O2 -mcpu=pentium4 -DCLEVER 0.50 * With AMD CPUs, the current multiplier beats both the new multipler and the version expressed as shifts and adds/subtracts: On a Duron, 1GHz: -O2 -mcpu=i386 -DMUL=100003 2.03 -O2 -mcpu=i386 -DMUL=65599 1.04 -O2 -mcpu=i386 -DCLEVER 0.95 * -O2 -mcpu=athlon -DMUL=100003 0.92 * -O2 -mcpu=athlon -DMUL=65599 1.03 -O2 -mcpu=athlon -DCLEVER 0.94 On an Athlon XP 2600+: -O2 -mcpu=i386 -DMUL=100003 0.95 -O2 -mcpu=i386 -DMUL=65599 0.49 -O2 -mcpu=i386 -DCLEVER 0.44 * -O2 -mcpu=athlon-xp -DMUL=100003 0.43 * -O2 -mcpu=athlon-xp -DMUL=65599 0.48 -O2 -mcpu=athlon-xp -DCLEVER 0.45 If we want a fast hash function, then we should write one that works a "long" at a time. This probably can't be done if hashes must be equal on different machines, but aren't hashes already different on LP64 machines (because they're 64 bits instead of 32)? (I guess the low-order bits would be identical) Long-at-a-time hash, Duron, 1GHz: -O2 -march=athlon-tbird -DMUL=100003 0.35 -O2 -march=athlon-tbird -DMUL=65599 0.41 -O2 -march=athlon-tbird -DCLEVER 0.42 With this code, it would be necessary to allocate strings "rounded up" to 4 bytes, and zero the unused bytes. Or we could do as Tim and Guido suggest: nothing. Jeff /* long-at-a-time hashing */ typedef struct { long ob_shash; unsigned long ob_size; union { unsigned char ob_sval[16]; long ob_svall[4]; }; } PyStringObject; static long string_hash(PyStringObject *a) { register int len; register long *p; register long x; if (a->ob_shash != -1) return a->ob_shash; len = (a->ob_size+3) / 4; p = a->ob_svall; x = *p << 7; while (--len >= 0) #ifdef CLEVER x = (x << 6) + (x << 16) - x + *p++; #else x = (MUL*x) ^ *p++; #endif x ^= a->ob_size; if (x == -1) x = -2; a->ob_shash = x; return x; }
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4