Chris,

I entirely agree. The same questioner also asked about the fastest data type to use as a key in a dictionary, and which data structure is fastest. I get the impression the person is very into micro-optimization, without profiling their application. It seems every choice is made based on the speed of that operation, without consideration of how often that operation is used.

On 17/05/18 09:16, Chris Angelico wrote:
> On Thu, May 17, 2018 at 5:21 PM, Anthony Flury via Python-Dev
> <python-dev at python.org> wrote:
>> Victor,
>> Thanks for the link, but to be honest it will just confuse people - neither
>> the link nor the related bpo entries state that the fix is only limited to
>> strings. They simply talk about hash randomization - which in my opinion
>> implies ALL hash algorithms; which is why I asked the question.
>>
>> I am not sure how much should be exposed about the scope of security fixes
>> but you can understand my (and others') confusion.
>>
>> I am aware that applications shouldn't make assumptions about the value of
>> any given hash value - apart from some simple assumptions based on hash value
>> equality (i.e. if two objects have different hash values they can't be the
>> same value).
> The hash values of Python objects are calculated by the __hash__
> method, so arbitrary objects can do what they like, including
> degenerate algorithms such as:
>
> class X:
>     def __hash__(self): return 7

Agreed - I should have said the default hash algorithm. Hashes for custom objects are entirely application dependent.

>
> So it's impossible to randomize ALL hashes at the language level. Only
> str and bytes hashes are randomized, because they're the ones most
> likely to be exploitable - for instance, a web server will receive a
> query like "http://spam.example/target?a=1&b=2&c=3" and provide a
> dictionary {"a":1, "b":2, "c":3}. Similarly, a JSON decoder is always
> going to create string keys in its dictionaries (JSON objects). Do you
> know of any situation in which an attacker can provide the keys for a
> dict/set as integers?

I was just asking the question - rather than critiquing the fault-fix. I am actually more concerned that the documentation relating to the fix doesn't make it clear that only strings have their hashes randomised.

>> BTW:
>>
>> This question was prompted by a question on a social media platform about
>> whether hash values are transferable across platforms.
>> Everything I could find stated that after Python 3.3 ALL hash values were
>> randomized - but that clearly isn't the case; and the original questioner
>> identified that some hash values are randomized and others aren't.
>>
> That's actually immaterial. Even if the hashes weren't actually
> randomized, you shouldn't be making assumptions about anything
> specific in the hash, save that *within one Python process*, two equal
> values will have equal hashes (and therefore two objects with unequal
> hashes will not be equal).

Entirely agree - I was just trying to get to the bottom of the difference - especially considering that the documentation I could find implied that all hash algorithms had been randomized.

>> I did suggest strongly to the original questioner that relying on the same
>> hash value across different platforms wasn't a clever solution - their
>> original plan was to store hash values in a cross system database to enable
>> quick retrieval of data (!!!). I did remind the OP that a hash value wasn't
>> guaranteed to be unique anyway - and they might come across two different
>> values with the same hash - and no way to distinguish between them if all
>> they have is the hash. Hopefully their revised design will store the key,
>> not the hash.
> Uhh.... if you're using a database, let the database do the work of
> being a database. I don't know what this "cross system database" would
> be implemented in, but if it's a proper multi-user relational database
> engine like PostgreSQL, it's already going to have way better indexing
> than anything you'd do manually. I think there are WAY better
> solutions than worrying about Python's inbuilt hashing.

Agreed

> If you MUST hash your data for sharing and storage, the easiest
> solution is to just use a cryptographic hash straight out of
> hashlib.py.

As stated before - I think the original questioner was intent on micro optimizations - and they had hit on the idea that storing an integer would be quicker than storing a string - entirely ignoring both the practicality of trying to code all strings into a value (since hashes aren't guaranteed not to collide), and the issues of trying to reverse that translation once the stored key had been retrieved.

> ChrisA
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/anthony.flury%40btinternet.com

Thanks for your comments :-)

--
Anthony Flury
email : *Anthony.flury at btinternet.com*
Twitter : *@TonyFlury <https://twitter.com/TonyFlury/>*
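
For anyone following the thread, the two technical points discussed above (only str and bytes hashes are randomized per process, and hashlib is the safer route if a hash really must be stored or shared) can be sketched roughly as below. This is only an illustration, assuming CPython 3.7 or later; `builtin_hash_in_child` and `stable_key` are hypothetical helper names made up for this example, and SHA-256 is just one reasonable choice of digest.

```python
import hashlib
import os
import subprocess
import sys


def builtin_hash_in_child(expr: str, seed: str) -> int:
    """Hypothetical helper: evaluate hash(<expr>) in a fresh interpreter
    started with the given PYTHONHASHSEED value."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


def stable_key(text: str) -> str:
    """Hypothetical helper: a digest that does not depend on the process,
    the platform, or the hash seed (unlike the built-in hash of a str)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # int hashes are not affected by the seed ...
    print(builtin_hash_in_child("12345", "1"), builtin_hash_in_child("12345", "2"))
    # ... whereas str hashes change with it.
    print(builtin_hash_in_child("'spam'", "1"), builtin_hash_in_child("'spam'", "2"))
    # A hashlib digest is the same on every run and every machine.
    print(stable_key("spam"))
```

Even a cryptographic digest is a lookup aid rather than a replacement for the key itself, so the original key would still need to be stored alongside it, as noted in the thread.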