Hello, I'd still prefer to see a randomized hash()-function (at least for 3.3). But to protect against the attacks it would be sufficient to use randomization for collision resolution in dicts (and sets). What if we use a second (randomized) hash-function in case there are many collisions in ONE lookup. This hash-function is used only for collision resolution and is not cached. The benefits: * protection against the known attacks * hash(X) stays stable and the same * dict order is only changed when there are many collisions * doctests will not break * enhanced collision resolution * RNG doesn't have to be initialized in smaller programs * nearly no slowdown of most dicts * second hash-function is only used for keys with higher collision-rate * lower probability to leak secrets * possibility to use different secrets for each dict The drawback: * need to add a second hash-function * slower than using one hash-function only, when > 20 collisions * need to add this to container-types? (if used for py3.3) * need to expose this to the user? (if used for py3.3) * works only for datatypes with this new function * possible to implement without breaking ABI? The following code is meant for explanation purpose only: for(perturb = hash; ; perturb >>= 5) { i = (i << 2) + i + perturb + 1; if((collisions++) == 20) { // perturb is already zero after 13 rounds. // 20 collisions are rare. // you can add && (ma_mask > 256) to make 100% sure // that it's not used for smaller dicts. if(Py_TYPE(key)->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH) { // If type has a randomized hash, use this now for lookup i = perturb = PyObject_RandomizedHash(key)); } ..... If I got this right we could add a new function "tp_randomized_hash" to 3.3 release. But can we also add functions in older releases, without breaking ABI? If not, can we implement this somehow using a flag? FOR OLDER RELEASE < 3.3: Py_hash_t PyObject_RandomizedHash(PyVarObject *o) { PyTypeObject *tp = Py_TYPE(v); if(! (tp->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH)) return -1; global_flags_somewhere->USE_RANDOMIZED_HASH = 1; return (*tp->tp_hash)(v); } .... and in unicodeobject.c: (and wherever we need randomization) static Py_hash_t unicode_hash(PyUnicodeObject *self) { Py_ssize_t len; Py_UNICODE *p; Py_hash_t x; Py_hash_t prefix=0; Py_hash_t suffix=0; if(global_flags_somewhere->USE_RANDOMIZED_HASH) { global_flags_somewhere->USE_RANDOMIZED_HASH = 0; initialize_rng_if_not_already_done_and_return_seed(&prefix, &suffix); ..... (and don't cache in this case) ..... It's ugly, but if I understand this correctly, the GIL will protect us against race-conditions, right? Hello, internals experts: Would this work or is there a better way to do this without breaking the ABI? Frank
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4