Le 01/01/2012 04:29, Paul McMillan a écrit : > This is incorrect. Once an attacker has guessed the random seed, any > operation which reveals the ordering of hashed objects can be used to > verify the answer. JSON responses would be ideal. In fact, an attacker > can do a brute-force attack of the random seed offline. Once they have > the seed, generating collisions is a fast process. If we want to protect a website against this attack for example, we must suppose that the attacker can inject arbitrary data and can get (indirectly) the result of hash(str) (e.g. with the representation of a dict in a traceback, with a JSON output, etc.). > The goal isn't perfection, but we need to do better than a simple > salt. I disagree. I don't want to break backward compatibility and have a hash() function different for each process, if the change is not an effective protection against the "hash vulnerability". It's really hard to write a good (secure) hash function: see for example the recent NIST competition (started in 2008, will end this year). Even good security researcher are unable to write a strong and fast hash function. It's easy to add a weakness in the function if you don't have a good background in cryptography. The NIST competition gives 4 years to analyze new hash functions. We should not rush to add a quick "hack" if it doesn't solve correctly the problem (protect against a collision attack and preimage attack). http://en.wikipedia.org/wiki/NIST_hash_function_competition http://en.wikipedia.org/wiki/Collision_attack Runtime performance does matter, I'm not completly sure that changing Python is the best place to add a countermeasure against a vulnerability. I don't want to slow down numpy for a web vulnerability. Because there are different use cases, a better compromise is maybe to add a runtime option to use a secure hash function, and keep the unsafe but fast hash function by default. > I propose we modify the string hash function like this: > > https://gist.github.com/0a91e52efa74f61858b5 Always allocate 2**21 bytes just to workaround one specific kind of attack is not acceptable. I suppose that the maximum acceptable is 4096 bytes (or better 256 bytes). Crytographic hash functions don't need random data, why would Python need 2 MB (!) for its hash function? Victor
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4