RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://mail.python.org/pipermail/python-dev/2012-January.txt below:

From guido at python.org Sun Jan 1 00:56:00 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 31 Dec 2011 16:56:00 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: ISTM the only reasonable thing is to have a random seed picked very early in the process, to be used to change the hash() function of str/bytes/unicode (in a way that they are still compatible with each other). The seed should be unique per process except it should survive fork() (but not exec()). I'm not worried about unrelated processes needing to have the same hash(), but I'm not against offering an env variable or command line flag to force the seed. I'm not too concerned about a 3rd party being able to guess the random seed -- this would require much more effort on their part, since they would have to generate a new set of colliding keys each time they think they have guessed the hash (as long as they can't force the seed -- this actually argues slightly *against* offering a way to force the seed, except that we have strong backwards compatibility requirements). We need to fix this as far back as Python 2.6, and it would be nice if a source patch was available that works on Python 2.5 -- personally I do have a need for a 2.5 fix and if nobody creates one I will probably end up backporting the fix from 2.6 to 2.5. Is there a tracker issue yet? The discussion should probably move there. PS. I would propose a specific fix but I can't seem to build a working CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). I get this error late in the build: ./python.exe -SE -m sysconfig --generate-posix-vars Fatal Python error: Py_Initialize: can't initialize sys standard streams Traceback (most recent call last): File "/Users/guido/cpython/Lib/io.py", line 60, in make: *** [Lib/_sysconfigdata.py] Abort trap -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jan 1 01:11:12 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 31 Dec 2011 17:11:12 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On Sat, Dec 31, 2011 at 4:56 PM, Guido van Rossum wrote: > PS. I would propose a specific fix but I can't seem to build a working > CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). I get this > error late in the build: > > ./python.exe -SE -m sysconfig --generate-posix-vars > Fatal Python error: Py_Initialize: can't initialize sys standard streams > Traceback (most recent call last): > File "/Users/guido/cpython/Lib/io.py", line 60, in > make: *** [Lib/_sysconfigdata.py] Abort trap > FWIW I managed to build Python 2.6, and a trivial mutation of the string/unicode hash function (add 1 initially) made only three tests fail; test_symtable and test_json both have a dependency on dictionary order, test_ctypes I can't quite figure out what's going on. Oh, and an unrelated failure in test_sqlite: File "/Users/guido/pythons/p26/Lib/sqlite3/test/types.py", line 355, in CheckSqlTimestamp self.failUnlessEqual(ts.year, now.year) AssertionError: 2012 != 2011 I betcha that's because it's still 2011 here in Texas but already 2012 in UTC-land. Happy New Year everyone! :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at mcmillan.ws Sun Jan 1 04:29:59 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sat, 31 Dec 2011 19:29:59 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: > I'm not too concerned about a 3rd party being able to guess the random seed > -- this would require much more effort on their part, since they would have > to generate a new set of colliding keys each time they think they have > guessed the hash This is incorrect. Once an attacker has guessed the random seed, any operation which reveals the ordering of hashed objects can be used to verify the answer. JSON responses would be ideal. In fact, an attacker can do a brute-force attack of the random seed offline. Once they have the seed, generating collisions is a fast process. The goal isn't perfection, but we need to do better than a simple salt. I propose we modify the string hash function like this: https://gist.github.com/0a91e52efa74f61858b5 This code is based on PyPy's implementation, but the concept is universal. Rather than choosing a single short random seed per process, we generate a much larger random seed (r). As we hash, we deterministically choose a portion of that seed and incorporate it into the hash process. This modification is a minimally intrusive change to the existing hash function, and so should not introduce unexpected side effects which might come from switching to a different class of hash functions. I've worked through this code with Alex Gaynor, Antoine Pitrou, and Victor Stinner, and have asked several mathematicians and security experts to review the concept. The reviewers who have gotten back to me thus far have agreed that if the initial random seed is not flawed, this should not overly change the properties of the hash function, but should make it quite difficult for an attacker to deduce the necessary information to predictably cause hash collisions. This function is not designed to protect against timing attacks, but should be nontrivial to reverse even with access to timing data. Empirical testing shows that this unoptimized python implementation produces ~10% slowdown in the hashing of ~20 character strings. This is probably an acceptable trade off, and actually provides better performance in the case of short strings than a high-entropy fixed-length seed prefix. -Paul From martin at v.loewis.de Sun Jan 1 04:36:37 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sun, 01 Jan 2012 04:36:37 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFE71E0.2000505@haypocalc.com> <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120101043637.Horde.a7_NDLuWis5O-9TFmFkWl9A@webmail.df.eu> > (Well, technically, you could use trees or some other O log n data > structure as a fallback once you have too many collisions, for some value > of "too many". Seems a bit wasteful for the purpose, though.) I don't think that would be wasteful. You wouldn't just use the tree for the case of too many collisions, but for any collision. You might special-case the case of a single key, i.e. start using the tree only if there is a collision. The issue is not the effort, but the need to support ordering if you want to use trees. So you might restrict this to dicts that have only str keys (which in practice should never have any collision, unless it's a deliberate setup). I'd use the tagged-pointer trick to determine whether a key is an object pointer (tag 0) or an AVL tree (tag 1). So in the common case of interned strings, the comparison for pointer equality (which is the normal case if the keys are interned) will succeed quickly; if pointer comparison fails, check the tag bit. Regards, Martin From solipsis at pitrou.net Sun Jan 1 05:11:03 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 1 Jan 2012 05:11:03 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <20120101051103.67448343@pitrou.net> On Sat, 31 Dec 2011 16:56:00 -0700 Guido van Rossum wrote: > ISTM the only reasonable thing is to have a random seed picked very early > in the process, to be used to change the hash() function of > str/bytes/unicode (in a way that they are still compatible with each other). Do str and bytes still have to be compatible with each other in 3.x? Merry hashes, weakrefs and thread-local memoryviews to everyone! cheers Antoine. From guido at python.org Sun Jan 1 05:22:47 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 31 Dec 2011 21:22:47 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120101051103.67448343@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <20120101051103.67448343@pitrou.net> Message-ID: On Sat, Dec 31, 2011 at 9:11 PM, Antoine Pitrou wrote: > On Sat, 31 Dec 2011 16:56:00 -0700 > Guido van Rossum wrote: > > ISTM the only reasonable thing is to have a random seed picked very early > > in the process, to be used to change the hash() function of > > str/bytes/unicode (in a way that they are still compatible with each > other). > > Do str and bytes still have to be compatible with each other in 3.x? > Hm, you're right, that's no longer a concern. (Though ATM the hashes still *are* compatible.) > Merry hashes, weakrefs and thread-local memoryviews to everyone! > :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jan 1 05:31:50 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 31 Dec 2011 21:31:50 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On Sat, Dec 31, 2011 at 8:29 PM, Paul McMillan wrote: > > I'm not too concerned about a 3rd party being able to guess the random > seed > > -- this would require much more effort on their part, since they would > have > > to generate a new set of colliding keys each time they think they have > > guessed the hash > > This is incorrect. Once an attacker has guessed the random seed, any > operation which reveals the ordering of hashed objects can be used to > verify the answer. JSON responses would be ideal. In fact, an attacker > can do a brute-force attack of the random seed offline. Once they have > the seed, generating collisions is a fast process. > Still, it would represent an effort for the attacker of a much greater magnitude than the current attack. It's all a trade-off -- at some point it'll just be easier for the attacker to use some other vulnerability. Also the class of vulnerable servers would be greatly reduced. > The goal isn't perfection, but we need to do better than a simple > salt. Perhaps. > I propose we modify the string hash function like this: > > https://gist.github.com/0a91e52efa74f61858b5 > > This code is based on PyPy's implementation, but the concept is > universal. Rather than choosing a single short random seed per > process, we generate a much larger random seed (r). As we hash, we > deterministically choose a portion of that seed and incorporate it > into the hash process. This modification is a minimally intrusive > change to the existing hash function, and so should not introduce > unexpected side effects which might come from switching to a different > class of hash functions. > I'm not sure I understand this. What's the worry about "a different class of hash functions"? (It may be clear that I do not have a deep mathematical understanding of hash functions.) > I've worked through this code with Alex Gaynor, Antoine Pitrou, and > Victor Stinner, and have asked several mathematicians and security > experts to review the concept. The reviewers who have gotten back to > me thus far have agreed that if the initial random seed is not flawed, I forget -- what do we do on systems without urandom()? (E.g. Windows?) > this should not overly change the properties of the hash function, but > should make it quite difficult for an attacker to deduce the necessary > information to predictably cause hash collisions. This function is not > designed to protect against timing attacks, but should be nontrivial > to reverse even with access to timing data. > Let's worry about timing attacks another time okay? > Empirical testing shows that this unoptimized python implementation > produces ~10% slowdown in the hashing of ~20 character strings. This > is probably an acceptable trade off, and actually provides better > performance in the case of short strings than a high-entropy > fixed-length seed prefix. > Hm. I'm not sure I like the idea of extra arithmetic for every character being hashed. But I like the idea of a bigger random seed from which we deterministically pick some part. How about just initializing x to some subsequence of the seed determined by e.g. the length of the hashed string plus a few characters from it? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at mcmillan.ws Sun Jan 1 06:57:09 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sat, 31 Dec 2011 21:57:09 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: > Still, it would represent an effort for the attacker of a much greater > magnitude than the current attack. It's all a trade-off -- at some point > it'll just be easier for the attacker to use some other vulnerability. Also > the class of vulnerable servers would be greatly reduced. I agree that doing anything is better than doing nothing. If we use the earlier suggestion and prepend everything with a fixed-length seed, we need quite a bit of entropy (and so a fairly long string) to make a lasting difference. > I'm not sure I understand this. What's the worry about "a different class of > hash functions"? (It may be clear that I do not have a deep mathematical > understanding of hash functions.) This was mostly in reference to earlier suggestions of switching to cityhash, or using btrees, or other more invasive changes. Python 2.X is pretty stable and making large changes like that to the codebase can have unpredictable effects. We know that the current hash function works well (except for this specific problem), so it seems like the best fix will be as minimal a modification as possible, to avoid introducing bugs. > I forget -- what do we do on systems without urandom()? (E.g. Windows?) Windows has CryptGenRandom which is approximately equivalent. > Let's worry about timing attacks another time okay? Agreed. As long as there isn't a gaping hole, I'm fine with that. > Hm. I'm not sure I like the idea of extra arithmetic for every character > being hashed. >From a performance standpoint, this may still be better than adding 8 or 10 characters to every single hash operation, since most hashes are over short strings. It is important that this function touches every character - if it only interacts with a subset of them, an attacker can fix that subset and vary the rest. > But I like the idea of a bigger random seed from which we > deterministically pick some part. Yeah. This makes it much harder to attack, since it very solidly places the attacker outside the realm of "just brute force the key". > How about just initializing x to some > subsequence of the seed determined by e.g. the length of the hashed string > plus a few characters from it? We did consider this, and if performance is absolutely the prime directive, this (or a variant) may be the best option. Unfortunately, the collision generator doesn't necessarily vary the length of the string. Additionally, if we don't vary based on all the letters in the string, an attacker can fix the characters that we do use and generate colliding strings around them. Another option to consider would be to apply this change to some but not all of the rounds. If we added the seed lookup xor operation for only the first and last 5 values of x, we would still retain much of the benefit without adding much computational overhead for very long strings. We could also consider a less computationally expensive operation than the modulo for calculating the lookup index, like simply truncating to the correct number of bits. -Paul From guido at python.org Sun Jan 1 16:09:54 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 1 Jan 2012 08:09:54 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On Sat, Dec 31, 2011 at 10:57 PM, Paul McMillan wrote: > > Still, it would represent an effort for the attacker of a much greater > > magnitude than the current attack. It's all a trade-off -- at some point > > it'll just be easier for the attacker to use some other vulnerability. > Also > > the class of vulnerable servers would be greatly reduced. > > I agree that doing anything is better than doing nothing. If we use > the earlier suggestion and prepend everything with a fixed-length > seed, we need quite a bit of entropy (and so a fairly long string) to > make a lasting difference. > Ah, but the effect of that long string is summarized in a single (32- or 64-bit) integer. > > I'm not sure I understand this. What's the worry about "a different > class of > > hash functions"? (It may be clear that I do not have a deep mathematical > > understanding of hash functions.) > > This was mostly in reference to earlier suggestions of switching to > cityhash, or using btrees, or other more invasive changes. Python 2.X > is pretty stable and making large changes like that to the codebase > can have unpredictable effects. Agreed. > We know that the current hash function > works well (except for this specific problem), so it seems like the > best fix will be as minimal a modification as possible, to avoid > introducing bugs. > Yup. > > I forget -- what do we do on systems without urandom()? (E.g. Windows?) > Windows has CryptGenRandom which is approximately equivalent. > > > Let's worry about timing attacks another time okay? > Agreed. As long as there isn't a gaping hole, I'm fine with that. > > > Hm. I'm not sure I like the idea of extra arithmetic for every character > > being hashed. > > From a performance standpoint, this may still be better than adding 8 > or 10 characters to every single hash operation, since most hashes are > over short strings. But how about precomputing the intermediate value (x)? The hash is (mostly) doing x = f(x, c) for each c in the input. It is important that this function touches every > character - if it only interacts with a subset of them, an attacker > can fix that subset and vary the rest. > I sort of see your point, but I still think that if we could add as little per-character overhead as possible it would be best -- sometimes people *do* hash very long strings. > > But I like the idea of a bigger random seed from which we > > deterministically pick some part. > > Yeah. This makes it much harder to attack, since it very solidly > places the attacker outside the realm of "just brute force the key". > > > How about just initializing x to some > > subsequence of the seed determined by e.g. the length of the hashed > string > > plus a few characters from it? > > We did consider this, and if performance is absolutely the prime > directive, this (or a variant) may be the best option. Unfortunately, > the collision generator doesn't necessarily vary the length of the > string. Additionally, if we don't vary based on all the letters in the > string, an attacker can fix the characters that we do use and generate > colliding strings around them. > Still, much more work for the attacker. > Another option to consider would be to apply this change to some but > not all of the rounds. If we added the seed lookup xor operation for > only the first and last 5 values of x, we would still retain much of > the benefit without adding much computational overhead for very long > strings. > I like that. > We could also consider a less computationally expensive operation than > the modulo for calculating the lookup index, like simply truncating to > the correct number of bits. > Sure. Thanks for thinking about all the details here!! -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jan 1 16:13:11 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 1 Jan 2012 08:13:11 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: Different concern. What if someone were to have code implementing an external, persistent hash table, using Python's hash() function? They might have a way to rehash everything when a new version of Python comes along, but they would not be happy if hash() is different in each process. I somehow vaguely remember possibly having seen such code, or something else where a bit of random data was needed and hash() was used since it's so easily available. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jan 1 16:13:44 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 1 Jan 2012 08:13:44 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: PS. Is the collision-generator used in the attack code open source? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at cheimes.de Sun Jan 1 16:27:51 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 16:27:51 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F007B77.8090901@cheimes.de> Am 01.01.2012 16:13, schrieb Guido van Rossum: > Different concern. What if someone were to have code implementing an > external, persistent hash table, using Python's hash() function? They > might have a way to rehash everything when a new version of Python comes > along, but they would not be happy if hash() is different in each > process. I somehow vaguely remember possibly having seen such code, or > something else where a bit of random data was needed and hash() was used > since it's so easily available. I had the same concern as you and was worried that projects like ZODB might require a stable hash function. Fred already stated that ZODB doesn't use the hash in its btree structures. Possible solutions: * make it possible to provide the seed as an env var * disable randomizing as default setting or at least add an option to disable randomization IMHO the issue needs a PEP that explains the issue, shows all possible solutions and describes how we have solved the issue. I'm willing to start a PEP. Who likes to be the co-author? Christian From lists at cheimes.de Sun Jan 1 16:30:26 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 16:30:26 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F007C12.4090401@cheimes.de> Am 31.12.2011 23:38, schrieb Terry Reedy: > On 12/31/2011 4:43 PM, PJ Eby wrote: > >> Here's an idea. Suppose we add a sys.hash_seed or some such, that's >> settable to an int, and defaults to whatever we're using now. Then >> programs that want a fix can just set it to a random number, > > I do not think we can allow that to change once there are hashed > dictionaries existing. Me, too. Armin suggested to use an env var as random. From lists at cheimes.de Sun Jan 1 16:48:32 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 16:48:32 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F008050.1030807@cheimes.de> Am 01.01.2012 00:56, schrieb Guido van Rossum: > ISTM the only reasonable thing is to have a random seed picked very > early in the process, to be used to change the hash() function of > str/bytes/unicode (in a way that they are still compatible with each other). > > The seed should be unique per process except it should survive fork() > (but not exec()). I'm not worried about unrelated processes needing to > have the same hash(), but I'm not against offering an env variable or > command line flag to force the seed. I've created a clone at http://hg.python.org/features/randomhash/ as a testbed. The code creates the seed very early in PyInitializeEx(). The method isn't called on fork() but on exec(). > I'm not too concerned about a 3rd party being able to guess the random > seed -- this would require much more effort on their part, since they > would have to generate a new set of colliding keys each time they think > they have guessed the hash (as long as they can't force the seed -- this > actually argues slightly *against* offering a way to force the seed, > except that we have strong backwards compatibility requirements). The talkers claim and have shown that it's too easy to pre-calculate collisions with hashing algorithms similar to DJBX33X / DJBX33A. It might be a good idea to change the hashing algorithm, too. Paul as listed some new algorithms. Ruby 1.9 is using FNV http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a good dispersion pattern. A hashing algorithm without a meet-in-the-middle vulnerability would reduce the pressure on a good and secure seed, too. > We need to fix this as far back as Python 2.6, and it would be nice if a > source patch was available that works on Python 2.5 -- personally I do > have a need for a 2.5 fix and if nobody creates one I will probably end > up backporting the fix from 2.6 to 2.5. +1 Should the randomization be disabled on 2.5 to 3.2 by default to reduce backward compatibility issues? Christian From lists at cheimes.de Sun Jan 1 16:56:19 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 16:56:19 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120101051103.67448343@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <20120101051103.67448343@pitrou.net> Message-ID: <4F008223.8020806@cheimes.de> Am 01.01.2012 05:11, schrieb Antoine Pitrou: > On Sat, 31 Dec 2011 16:56:00 -0700 > Guido van Rossum wrote: >> ISTM the only reasonable thing is to have a random seed picked very early >> in the process, to be used to change the hash() function of >> str/bytes/unicode (in a way that they are still compatible with each other). > > Do str and bytes still have to be compatible with each other in 3.x? py3k has tests for hash("ascii") == hash(b"ascii"). Are you talking about this invariant? Christian From solipsis at pitrou.net Sun Jan 1 17:09:23 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 1 Jan 2012 17:09:23 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F008050.1030807@cheimes.de> Message-ID: <20120101170923.5323628a@pitrou.net> On Sun, 01 Jan 2012 16:48:32 +0100 Christian Heimes wrote: > The talkers claim and have shown that it's too easy to pre-calculate > collisions with hashing algorithms similar to DJBX33X / DJBX33A. It > might be a good idea to change the hashing algorithm, too. Paul as > listed some new algorithms. Ruby 1.9 is using FNV > http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a > good dispersion pattern. We already seem to be using a FNV-alike, is it just a matter of changing the parameters? > A hashing algorithm without a > meet-in-the-middle vulnerability would reduce the pressure on a good and > secure seed, too. > > > We need to fix this as far back as Python 2.6, and it would be nice if a > > source patch was available that works on Python 2.5 -- personally I do > > have a need for a 2.5 fix and if nobody creates one I will probably end > > up backporting the fix from 2.6 to 2.5. > > +1 > > Should the randomization be disabled on 2.5 to 3.2 by default to reduce > backward compatibility issues? Isn't 2.5 already EOL'ed? As for 3.2, I'd say no. I don't know about 2.6 and 2.7. Regards Antoine. From solipsis at pitrou.net Sun Jan 1 17:10:03 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 1 Jan 2012 17:10:03 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F008223.8020806@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <20120101051103.67448343@pitrou.net> <4F008223.8020806@cheimes.de> Message-ID: <20120101171003.5f657a00@pitrou.net> On Sun, 01 Jan 2012 16:56:19 +0100 Christian Heimes wrote: > Am 01.01.2012 05:11, schrieb Antoine Pitrou: > > On Sat, 31 Dec 2011 16:56:00 -0700 > > Guido van Rossum wrote: > >> ISTM the only reasonable thing is to have a random seed picked very early > >> in the process, to be used to change the hash() function of > >> str/bytes/unicode (in a way that they are still compatible with each other). > > > > Do str and bytes still have to be compatible with each other in 3.x? > > py3k has tests for hash("ascii") == hash(b"ascii"). Are you talking > about this invariant? Yes. It doesn't seem to have any point anymore. Regards Antoine. From lists at cheimes.de Sun Jan 1 17:20:34 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 17:20:34 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F0087D2.9090405@cheimes.de> Am 01.01.2012 06:57, schrieb Paul McMillan: > I agree that doing anything is better than doing nothing. If we use > the earlier suggestion and prepend everything with a fixed-length > seed, we need quite a bit of entropy (and so a fairly long string) to > make a lasting difference. Your code at https://gist.github.com/0a91e52efa74f61858b5 reads about 2 MB (2**21 - 1) data from urandom. I'm worried that this is going to exhaust the OS's random pool and suck it dry. We shouldn't forget that Python is used for long running processes as well as short scripts. Your suggestion also increases the process size by 2 MB which is going to be an issue for mobile and embedded platforms. How about this: r = [ord(i) for i in os.urandom(256)] rs = os.urandom(4) # or 8 ? seed = rs[-1] + (rs[-2] << 8) + (rs[-3] << 16) + (rs[-4] << 24) def _hash_string(s): """The algorithm behind compute_hash() for a string or a unicode.""" from pypy.rlib.rarithmetic import intmask length = len(s) if length == 0: return -1 x = intmask(seed + (ord(s[0]) << 7)) i = 0 while i < length: o = ord(s[i]) x = intmask((1000003*x) ^ o ^ r[o % 0xff] i += 1 x ^= length return intmask(x) This combines a random seed for the hash with your suggestion. We also need to special case short strings. The above routine hands over the seed to attackers, if he is able to retrieve lots of single character hashes. The randomization shouldn't be used if we can prove that it's not possible to create hash collisions for strings shorter than X. For example 64bit FNV-1 has no collisions for 8 chars or less, 32bit FNV has no collisions for 4 or less cars. Christian Christian From lists at cheimes.de Sun Jan 1 17:34:31 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 17:34:31 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120101170923.5323628a@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F008050.1030807@cheimes.de> <20120101170923.5323628a@pitrou.net> Message-ID: <4F008B17.8000606@cheimes.de> Am 01.01.2012 17:09, schrieb Antoine Pitrou: > On Sun, 01 Jan 2012 16:48:32 +0100 > Christian Heimes wrote: >> The talkers claim and have shown that it's too easy to pre-calculate >> collisions with hashing algorithms similar to DJBX33X / DJBX33A. It >> might be a good idea to change the hashing algorithm, too. Paul as >> listed some new algorithms. Ruby 1.9 is using FNV >> http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a >> good dispersion pattern. > > We already seem to be using a FNV-alike, is it just a matter of > changing the parameters? No, we are using something similar to DJBX33X. FNV is a completely different type of hash algorithm. From solipsis at pitrou.net Sun Jan 1 17:54:04 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 01 Jan 2012 17:54:04 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F008B17.8000606@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F008050.1030807@cheimes.de> <20120101170923.5323628a@pitrou.net> <4F008B17.8000606@cheimes.de> Message-ID: <1325436844.3472.6.camel@localhost.localdomain> Le dimanche 01 janvier 2012 ? 17:34 +0100, Christian Heimes a ?crit : > Am 01.01.2012 17:09, schrieb Antoine Pitrou: > > On Sun, 01 Jan 2012 16:48:32 +0100 > > Christian Heimes wrote: > >> The talkers claim and have shown that it's too easy to pre-calculate > >> collisions with hashing algorithms similar to DJBX33X / DJBX33A. It > >> might be a good idea to change the hashing algorithm, too. Paul as > >> listed some new algorithms. Ruby 1.9 is using FNV > >> http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a > >> good dispersion pattern. > > > > We already seem to be using a FNV-alike, is it just a matter of > > changing the parameters? > > No, we are using something similar to DJBX33X. FNV is a completely > different type of hash algorithm. I don't understand. FNV-1 multiplies the current running result with a prime and then xors it with the following byte. This is also what we do. (I'm assuming 1000003 is prime) I see two differences: - FNV starts with a non-zero constant offset basis - FNV uses a different prime than ours (as a side note, FNV operates on bytes, but for unicode we must operate on code points in [0, 1114111]: although arguably the common case is hashing ASCII substrings (protocol tokens etc.)) Regards Antoine. From lists at cheimes.de Sun Jan 1 18:28:19 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 18:28:19 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <1325436844.3472.6.camel@localhost.localdomain> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F008050.1030807@cheimes.de> <20120101170923.5323628a@pitrou.net> <4F008B17.8000606@cheimes.de> <1325436844.3472.6.camel@localhost.localdomain> Message-ID: <4F0097B3.1040905@cheimes.de> Am 01.01.2012 17:54, schrieb Antoine Pitrou: > I don't understand. FNV-1 multiplies the current running result with a > prime and then xors it with the following byte. This is also what we do. > (I'm assuming 1000003 is prime) There must be a major difference somewhere inside the algorithm. The talk at the CCC conference in Berlin mentions that Ruby 1.9 is not vulnerable to meet-in-the-middle attacks and Ruby 1.9 uses FNV. The C code of FNV is more complex than our code, too. Christian From lists at cheimes.de Sun Jan 1 18:32:12 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 18:32:12 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F00989C.4010708@cheimes.de> Am 01.01.2012 01:11, schrieb Guido van Rossum: > FWIW I managed to build Python 2.6, and a trivial mutation of the > string/unicode hash function (add 1 initially) made only three tests > fail; test_symtable and test_json both have a dependency on dictionary > order, test_ctypes I can't quite figure out what's going on. In my fork, these tests are failing: test_dbm test_dis test_gdb test_inspect test_packaging test_set test_symtable test_urllib test_userdict test_collections test_json From victor.stinner at haypocalc.com Sun Jan 1 18:32:51 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sun, 01 Jan 2012 18:32:51 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F0098C3.9040609@haypocalc.com> Le 01/01/2012 04:29, Paul McMillan a ?crit : > This is incorrect. Once an attacker has guessed the random seed, any > operation which reveals the ordering of hashed objects can be used to > verify the answer. JSON responses would be ideal. In fact, an attacker > can do a brute-force attack of the random seed offline. Once they have > the seed, generating collisions is a fast process. If we want to protect a website against this attack for example, we must suppose that the attacker can inject arbitrary data and can get (indirectly) the result of hash(str) (e.g. with the representation of a dict in a traceback, with a JSON output, etc.). > The goal isn't perfection, but we need to do better than a simple > salt. I disagree. I don't want to break backward compatibility and have a hash() function different for each process, if the change is not an effective protection against the "hash vulnerability". It's really hard to write a good (secure) hash function: see for example the recent NIST competition (started in 2008, will end this year). Even good security researcher are unable to write a strong and fast hash function. It's easy to add a weakness in the function if you don't have a good background in cryptography. The NIST competition gives 4 years to analyze new hash functions. We should not rush to add a quick "hack" if it doesn't solve correctly the problem (protect against a collision attack and preimage attack). http://en.wikipedia.org/wiki/NIST_hash_function_competition http://en.wikipedia.org/wiki/Collision_attack Runtime performance does matter, I'm not completly sure that changing Python is the best place to add a countermeasure against a vulnerability. I don't want to slow down numpy for a web vulnerability. Because there are different use cases, a better compromise is maybe to add a runtime option to use a secure hash function, and keep the unsafe but fast hash function by default. > I propose we modify the string hash function like this: > > https://gist.github.com/0a91e52efa74f61858b5 Always allocate 2**21 bytes just to workaround one specific kind of attack is not acceptable. I suppose that the maximum acceptable is 4096 bytes (or better 256 bytes). Crytographic hash functions don't need random data, why would Python need 2 MB (!) for its hash function? Victor From tjreedy at udel.edu Sun Jan 1 19:45:01 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 01 Jan 2012 13:45:01 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On 1/1/2012 10:13 AM, Guido van Rossum wrote: > PS. Is the collision-generator used in the attack code open source? As I posted before, Alexander Klink and Julian W?lde gave their project email as hashDoS at alech.de. Since they indicated disappointment in not hearing from Python, I presume they would welcome engagement. -- Terry Jan Reedy From tjreedy at udel.edu Sun Jan 1 19:46:51 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 01 Jan 2012 13:46:51 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F0097B3.1040905@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F008050.1030807@cheimes.de> <20120101170923.5323628a@pitrou.net> <4F008B17.8000606@cheimes.de> <1325436844.3472.6.camel@localhost.localdomain> <4F0097B3.1040905@cheimes.de> Message-ID: On 1/1/2012 12:28 PM, Christian Heimes wrote: > Am 01.01.2012 17:54, schrieb Antoine Pitrou: >> I don't understand. FNV-1 multiplies the current running result with a >> prime and then xors it with the following byte. This is also what we do. >> (I'm assuming 1000003 is prime) > > There must be a major difference somewhere inside the algorithm. The > talk at the CCC conference in Berlin mentions that Ruby 1.9 is not > vulnerable to meet-in-the-middle attacks and Ruby 1.9 uses FNV. The C > code of FNV is more complex than our code, too. I understood Alexander Klink and Julian W?lde, hashDoS at alech.de, as saying that they consider that using a random non-zero start value is sufficient to make the hash non-vulnerable. -- Terry Jan Reedy From jimjjewett at gmail.com Mon Jan 2 00:28:02 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 18:28:02 -0500 Subject: [Python-Dev] Hash collision security issue (now public) Message-ID: Steven D'Aprano (in ) wrote: > By compile-time, do you mean when the byte-code is compilated, i.e. just > before runtime, rather than a switch when compiling the Python executable from > source? No. I really mean when the C code is initially compiled to produce an python executable. The only reason we're worrying about this is that an adversary may force worst-case performance. If the python instance isn't a server, or at least isn't exposed to untrusted clients, then even a single extra "if" test is unjustified overhead. Adding overhead to every string hash or every dict lookup is bad. That said, adding some overhead (only) to dict lookups *that already hit half a dozen consecutive collisions* probably is reasonable, because that won't happen very often with normal data. (6 collisions can't happen at all unless there are already at least 6 entries, so small dicts are safe; with at least 1/3 of the slots empty, it should happen only 1/729 for worst-size larger dicts.) -jJ From paul at mcmillan.ws Mon Jan 2 00:43:52 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sun, 1 Jan 2012 15:43:52 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: > But how about precomputing the intermediate value (x)? The hash is (mostly) > doing x = f(x, c) for each c in the input. That's a fair point. If we go down that avenue, I think simply choosing a random fixed starting value for x is the correct choice, rather than computing an intermediate value. > I sort of see your point, but I still think that if we could add as little > per-character overhead as possible it would be best -- sometimes people *do* > hash very long strings. Yep, agreed. My original proposal did not adequately address this. >> Another option to consider would be to apply this change to some but >> not all of the rounds. If we added the seed lookup xor operation for >> only the first and last 5 values of x, we would still retain much of >> the benefit without adding much computational overhead for very long >> strings. > > I like that. I believe this is a reasonable solution. An attacker could still manipulate the internal state of long strings, but the additional information at both ends should make that difficult to exploit. I'll talk it over with the reviewers. >> We could also consider a less computationally expensive operation than >> the modulo for calculating the lookup index, like simply truncating to >> the correct number of bits. > > Sure. Thanks for thinking about all the details here!! Again, I'll talk to the reviewers (and run the randomness test battery) to be double-check that this doesn't affect the distribution in some unexpected way, but I think it should be fine. > PS. Is the collision-generator used in the attack code open source? Not in working form, and they've turned down requests for it from other projects that want to check their work. If it's imperative that we have one, I can write one, but I'd rather not spend the effort if we don't need it. -Paul From paul at mcmillan.ws Mon Jan 2 00:49:14 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sun, 1 Jan 2012 15:49:14 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: > Different concern. What if someone were to have code implementing an > external, persistent hash table, using Python's hash() function? They might > have a way to rehash everything when a new version of Python comes along, > but they would not be happy if hash() is different in each process. I > somehow vaguely remember possibly having seen such code, or something else > where a bit of random data was needed and hash() was used since it's so > easily available. I agree that there are use cases for allowing users to choose the random seed, in much the same way it's helpful to be able to set it for the random number generator. This should probably be something that can be passed in at runtime. This feature would also be useful for users who want to synchronize the hashes of multiple independent processes, for whatever reason. For the general case though, randomization should be on by default. -Paul From jimjjewett at gmail.com Mon Jan 2 01:02:44 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 19:02:44 -0500 Subject: [Python-Dev] Hash collision security issue (now public) Message-ID: Paul McMillan in wrote: > Guido van Rossum wrote: >> Hm. I'm not sure I like the idea of extra arithmetic for every character >> being hashed. > the collision generator doesn't necessarily vary the length of the > string. Additionally, if we don't vary based on all the letters in the > string, an attacker can fix the characters that we do use and generate > colliding strings around them. If the new hash algorithm doesn't kick in before, say, 32 characters, then most currently hashed strings will not be affected. And if the attacker has to add 32 characters to every key, it reduces the "this can be done with only N bytes uploaded" risk. (The same logic would apply to even longer prefixes, except that an attacker might more easily find short-enough strings that collide.) > We could also consider a less computationally expensive operation > than the modulo for calculating the lookup index, like simply truncating > to the correct number of bits. Given that the modulo is always 2^N, how is that different? -jJ From jimjjewett at gmail.com Mon Jan 2 01:21:11 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 19:21:11 -0500 Subject: [Python-Dev] Hash collision security issue (now public) Message-ID: Victor Stinner wrote in > If we want to protect a website against this attack for example, we must > suppose that the attacker can inject arbitrary data and can get > (indirectly) the result of hash(str) (e.g. with the representation of a > dict in a traceback, with a JSON output, etc.). (1) Is it common to hash non-string input? Because generating integers that collide for certain dict sizes is pretty easy... (2) Would it make sense for traceback printing to sort dict keys? (Any site worried about this issue should already be hiding tracebacks from untrusted clients, but the cost of this extra protection may be pretty small, given that tracebacks shouldn't be printed all that often in the first place.) (3) Should the docs for json.encoder.JSONEncoder suggest sort_keys=True? -jJ From jimjjewett at gmail.com Mon Jan 2 01:37:26 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 19:37:26 -0500 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html Message-ID: In http://mail.python.org/pipermail/python-dev/2011-December/115172.html, P. J. Eby wrote: > On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull wrote: >> While the dictionary probe has to start with a hash for backward >> compatibility reasons, is there a reason the overflow strategy for >> insertion has to be buckets containing lists? How about >> double-hashing, etc? > This won't help, because the keys still have the same hash value. ANYTHING > you do to them after they're generated will result in them still colliding. > The *only* thing that works is to change the hash function in such a way > that the strings end up with different hashes in the first place. > Otherwise, you'll still end up with (deliberate) collisions. Well, there is nothing wrong with switching to a different hash function after N collisions, rather than "in the first place". The perturbation effectively does by shoving the high-order bits through the part of the hash that survives the mask. > (Well, technically, you could use trees or some other O log n data > structure as a fallback once you have too many collisions, for some value > of "too many". Seems a bit wasteful for the purpose, though.) Your WSGI specification < http://www.python.org/dev/peps/pep-0333/ > requires using a real dictionary for compatibility; storing some of the values outside the values array would violate that. Do you consider that obsolete? -jJ From lists at cheimes.de Mon Jan 2 02:04:38 2012 From: lists at cheimes.de (Christian Heimes) Date: Mon, 02 Jan 2012 02:04:38 +0100 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html In-Reply-To: References: Message-ID: <4F0102A6.1020108@cheimes.de> Am 02.01.2012 01:37, schrieb Jim Jewett: > Well, there is nothing wrong with switching to a different hash function after N > collisions, rather than "in the first place". The perturbation > effectively does by > shoving the high-order bits through the part of the hash that survives the mask. Except that it won't work or slow down every lookup of missing keys? It's absolutely crucial that the lookup time is kept as fast as possible. You can't just change the hash algorithm in the middle of the work without a speed impact on lookups. The size of the dict can shrink or grow over time. This results into a different number of collisions for the same string. Cuckoo hashing (http://en.wikipedia.org/wiki/Hash_table#Collision_resolution) doesn't sound feasible for us because it slows down lookup and requires an ABI incompatible change for more hash slots on str/bytes/unicode instances. Christian PS: Something is wrong with your email client. Every of your replies starts a new thread for me. From solipsis at pitrou.net Mon Jan 2 02:19:37 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Jan 2012 02:19:37 +0100 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html References: <4F0102A6.1020108@cheimes.de> Message-ID: <20120102021937.029dcb91@pitrou.net> On Mon, 02 Jan 2012 02:04:38 +0100 Christian Heimes wrote: > > PS: Something is wrong with your email client. Every of your replies > starts a new thread for me. Same here. Regards Antoine. From jimjjewett at gmail.com Mon Jan 2 02:23:16 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 20:23:16 -0500 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html In-Reply-To: <4F0102A6.1020108@cheimes.de> References: <4F0102A6.1020108@cheimes.de> Message-ID: On Sun, Jan 1, 2012 at 8:04 PM, Christian Heimes wrote: > Am 02.01.2012 01:37, schrieb Jim Jewett: >> Well, there is nothing wrong with switching to a different hash function after N >> collisions, rather than "in the first place". ?The perturbation >> effectively does by >> shoving the high-order bits through the part of the hash that survives the mask. > Except that it won't work or slow down every lookup of missing keys? > It's absolutely crucial that the lookup time is kept as fast as possible. It will only slow down missing keys that themselves hit more than N collisions. Or were you assuming that I meant to switch the whole table, rather than just that one key? I agree that wouldn't work. > You can't just change the hash algorithm in the middle of the work > without a speed impact on lookups. Right -- but there is nothing wrong with modifying the lookdict (and insert_clean) functions to do something different after the Nth collision than they did after the N-1th. -jJ From pje at telecommunity.com Mon Jan 2 04:00:33 2012 From: pje at telecommunity.com (PJ Eby) Date: Sun, 1 Jan 2012 22:00:33 -0500 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html In-Reply-To: References: Message-ID: On Sun, Jan 1, 2012 at 7:37 PM, Jim Jewett wrote: > Well, there is nothing wrong with switching to a different hash function > after N > collisions, rather than "in the first place". The perturbation > effectively does by > shoving the high-order bits through the part of the hash that survives the > mask. > Since these are true hash collisions, they will all have the same high order bits. So, the usefulness of the perturbation is limited mainly to the common case where true collisions are rare. > (Well, technically, you could use trees or some other O log n data > > structure as a fallback once you have too many collisions, for some value > > of "too many". Seems a bit wasteful for the purpose, though.) > > Your WSGI specification < http://www.python.org/dev/peps/pep-0333/ > > requires > using a real dictionary for compatibility; storing some of the values > outside the > values array would violate that. When I said "use some other data structure", I was referring to the internal implementation of the dict type, not to user code. The only user-visible difference (even at C API level) would be the order of keys() et al. (In any case, I still assume this is too costly an implementation change compared to changing the hash function or seeding it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjjewett at gmail.com Mon Jan 2 04:28:13 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 22:28:13 -0500 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html In-Reply-To: References: Message-ID: On Sun, Jan 1, 2012 at 10:00 PM, PJ Eby wrote: > On Sun, Jan 1, 2012 at 7:37 PM, Jim Jewett wrote: >> Well, there is nothing wrong with switching to a different hash function >> after N >> collisions, rather than "in the first place". ?The perturbation >> effectively does by >> shoving the high-order bits through the part of the hash that survives the >> mask. > Since these are true hash collisions, they will all have the same high order > bits. ?So, the usefulness of the perturbation is limited mainly to the > common case where true collisions are rare. That is only because the perturb is based solely on the hash. Switching to an entirely new hash after the 5th collision (for a given lookup) would resolve that (after the 5th collision); the question is whether or not the cost is worthwhile. >> > (Well, technically, you could use trees or some other O log n data >> > structure as a fallback once you have too many collisions, for some >> > value >> > of "too many". ?Seems a bit wasteful for the purpose, though.) >> >> Your WSGI specification < http://www.python.org/dev/peps/pep-0333/ > >> requires >> using a real dictionary for compatibility; storing some of the values >> outside the >> values array would violate that. > When I said "use some other data structure", I was referring to the internal > implementation of the dict type, not to user code. ?The only user-visible > difference (even at C API level) would be the order of keys() et al. Given the wording requiring a real dictionary, I would have assumed that it was OK (if perhaps not sensible) to do pointer arithmetic and access the keys/values/hashes directly. (Though if the breakage was between python versions, I would feel guilty about griping too loudly.) -jJ From ncoghlan at gmail.com Mon Jan 2 05:44:49 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Jan 2012 14:44:49 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces Message-ID: I've been having an occasional argument with Benjamin regarding braces in 4-line if statements: if (cond) statement; else statement; vs. if (cond) { statement; } else { statement; } He keeps leaving them out, I occasionally tell him they should always be included (most recently this came up when we gave conflicting advice to a patch contributor). He says what he's doing is OK, because he doesn't consider the example in PEP 7 as explicitly disallowing it, I think it's a recipe for future maintenance hassles when someone adds a second statement to one of the clauses but doesn't add the braces. (The only time I consider it reasonable to leave out the braces is for one liner if statements, where there's no else clause at all) Since Benjamin doesn't accept the current brace example in PEP 7 as normative for the case above, I'm bringing it up here to seek clarification. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ben+python at benfinney.id.au Mon Jan 2 06:25:51 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 02 Jan 2012 16:25:51 +1100 Subject: [Python-Dev] PEP 7 clarification request: braces References: Message-ID: <87pqf21xr4.fsf@benfinney.id.au> Nick Coghlan writes: > He keeps leaving [braces] out [when the block is a single statement], > I occasionally tell him they should always be included (most recently > this came up when we gave conflicting advice to a patch contributor). As someone who has maintained his fair share of C code, I am firmly on the side of unconditionally (!) enclosing C statement blocks in braces regardless of how many statements they contain. > He says what he's doing is OK, because he doesn't consider the example > in PEP 7 as explicitly disallowing it I wonder if he has a stronger argument in favour of his position, because ?it's not forbidden? doesn't imply ?it's okay?. > I think it's a recipe for future maintenance hassles when someone adds > a second statement to one of the clauses but doesn't add the braces. Agreed, it's an issue of code maintainability. Which is enough of a problem in C code that a low-cost improvement like this should always be done. But, as someone who carries no water in the Python developer community, my opinion has no more force than the arguments, and I can't impose it on anyone. Take it for what it's worth. -- \ ?God was invented to explain mystery. God is always invented to | `\ explain those things that you do not understand.? ?Richard P. | _o__) Feynman, 1988 | Ben Finney From scott+python-dev at scottdial.com Mon Jan 2 06:04:15 2012 From: scott+python-dev at scottdial.com (Scott Dial) Date: Mon, 02 Jan 2012 00:04:15 -0500 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <4F013ACF.7090708@scottdial.com> On 1/1/2012 11:44 PM, Nick Coghlan wrote: > I think it's a recipe for future maintenance hassles when someone adds > a second statement to one of the clauses but doesn't add the braces. > (The only time I consider it reasonable to leave out the braces is for > one liner if statements, where there's no else clause at all) Could you explain how these two cases differ with regard to maintenance? In either case, there are superfluous edits required if the original author had used braces *always*. Putting a brace on one-liners adds only a single line to the code -- just like in the if/else case. So, your argument seems conflicted. Surely, you would think this is a simpler edit to make and diff to see in a patch file: if(cond) { stmt1; + stmt2; } vs. -if(cond) +if(cond) { stmt1; + stmt2; +} Also, the superfluous edits will wrongly attribute the blame for the conditional to the wrong author. -- Scott Dial scott at scottdial.com From paul at mcmillan.ws Mon Jan 2 06:55:52 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sun, 1 Jan 2012 21:55:52 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F0087D2.9090405@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> Message-ID: I fixed a couple things in my proposed algorithm: https://gist.github.com/0a91e52efa74f61858b5 I had a typo, and used 21 instead of 12 for the size multiplier. We definitely don't need 2MB random data. The initialization of r was broken. Now it is an array of ints, so there's no conversion when it's used. I've adjusted it so there's 8k of random data, broken into 2048 ints. I added a length-based seed to the initial value of x. This prevents single-characters from being used to enumerate raw values from r. This is similar to the change proposed by Christian Heimes. Most importantly, I moved the xor with r[x % len_r] down a line. Before, it wasn't being applied to the last character. > Christian Heimes said: > We also need to special case short strings. The above routine hands over > the seed to attackers, if he is able to retrieve lots of single > character hashes. The updated code always includes at least 2 lookups from r, which I believe solves the single-character enumeration problem. If we special-case part of our hash function for short strings, we may get suboptimal collisions between the two types of hashes. I think Ruby uses FNV-1 with a salt, making it less vulnerable to this. FNV is otherwise similar to our existing hash function. For the record, cryptographically strong hash functions are in the neighborhood of 400% slower than our existing hash function. > Terry Reedy said: > I understood Alexander Klink and Julian W?lde, hashDoS at alech.de, as saying > that they consider that using a random non-zero start value is sufficient to > make the hash non-vulnerable. I've been talking to them. They're happy to look at our proposed changes. They indicate that a non-zero start value is sufficient to prevent the attack, but D. J. Bernstein disagrees with them. He also has indicated a willingness to look at our solution. -Paul From ncoghlan at gmail.com Mon Jan 2 07:02:59 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Jan 2012 16:02:59 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <4F013ACF.7090708@scottdial.com> References: <4F013ACF.7090708@scottdial.com> Message-ID: On Mon, Jan 2, 2012 at 3:04 PM, Scott Dial wrote: > On 1/1/2012 11:44 PM, Nick Coghlan wrote: >> I think it's a recipe for future maintenance hassles when someone adds >> a second statement to one of the clauses but doesn't add the braces. >> (The only time I consider it reasonable to leave out the braces is for >> one liner if statements, where there's no else clause at all) > > Could you explain how these two cases differ with regard to maintenance? Sure: always including K&R style braces for compound statements (even when they aren't technically necessary) means that indentation == control flow, just like Python. Indent your additions correctly, and the reader and compiler will agree on what they mean: if (cond) { statement; } else { statement; addition; /* Reader and compiler agree this is part of the else clause */ } if (cond) statement; else statement; addition; /* Uh-oh, should have added braces */ I've been trying to convince Benjamin that there's a reason "always include the braces" is accepted wisdom amongst many veteran C programmers (with some allowing an exception for one-liners), but he isn't believing me, and I'm not going to go through and edit every single one of his commits to add them. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From pje at telecommunity.com Mon Jan 2 07:16:42 2012 From: pje at telecommunity.com (PJ Eby) Date: Mon, 2 Jan 2012 01:16:42 -0500 Subject: [Python-Dev] That depends on what the meaning of "is" is (was Re: http://mail.python.org/pipermail/python-dev/2011-December/115172.html) Message-ID: On Sun, Jan 1, 2012 at 10:28 PM, Jim Jewett wrote: > Given the wording requiring a real dictionary, I would have assumed > that it was OK (if perhaps not sensible) to do pointer arithmetic and > access the keys/values/hashes directly. (Though if the breakage was > between python versions, I would feel guilty about griping too > loudly.) > If you're going to be a language lawyer about it, I would simply point out that all the spec requires is that "type(env) is dict" -- it says nothing about how Python defines "type" or "is" or "dict". So, you're on your own with that one. ;-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Mon Jan 2 07:22:05 2012 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 02 Jan 2012 00:22:05 -0600 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <1325485325.20247.37.camel@Gutsy> On Mon, 2012-01-02 at 14:44 +1000, Nick Coghlan wrote: > I've been having an occasional argument with Benjamin regarding braces > in 4-line if statements: > > if (cond) > statement; > else > statement; > > vs. > > if (cond) { > statement; > } else { > statement; > } > > He keeps leaving them out, I occasionally tell him they should always > be included (most recently this came up when we gave conflicting > advice to a patch contributor). He says what he's doing is OK, because > he doesn't consider the example in PEP 7 as explicitly disallowing it, > I think it's a recipe for future maintenance hassles when someone adds > a second statement to one of the clauses but doesn't add the braces. I've had to correct my self on this one a few times so I will have to agree it's a good practice. > (The only time I consider it reasonable to leave out the braces is for > one liner if statements, where there's no else clause at all) The problem is only when an additional statement is added to the last block, not the preceding ones, as the compiler will complain about those. So I don't know how the 4 line example without braces is any worse than a 2 line if without braces. I think my preference is, if any block in a multi-block expression needs braces, then the other blocks should have it. (Including the last block even if it's a single line.) The next level up would be to require them on all blocks, even two line if expressions, but I'm not sure that is really needed. At some point the extra noise of the braces makes things harder to read rather than easier, and what you gain in preventing one type of error may increase chances of another type of error not being noticed. Cheers, Ron > Since Benjamin doesn't accept the current brace example in PEP 7 as > normative for the case above, I'm bringing it up here to seek > clarification. From ncoghlan at gmail.com Mon Jan 2 09:31:00 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Jan 2012 18:31:00 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <1325485325.20247.37.camel@Gutsy> References: <1325485325.20247.37.camel@Gutsy> Message-ID: On Mon, Jan 2, 2012 at 4:22 PM, Ron Adam wrote: > The problem is only when an additional statement is added to the last > block, not the preceding ones, as the compiler will complain about > those. ?So I don't know how the 4 line example without braces is any > worse than a 2 line if without braces. Even when the compiler picks it up, it's still a wasted edit-compile cycle. More importantly though, this approach makes the rules too complicated. "Always use braces" is simple and easy, and the only cost is the extra line of vertical whitespace for the closing brace. (I personally don't like even the exception made for single clause if statements, but that's already too prevalent in the code base to do anything about. Hence the 4-line example in my original post.) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From raymond.hettinger at gmail.com Mon Jan 2 09:47:14 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 2 Jan 2012 00:47:14 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> On Jan 1, 2012, at 8:44 PM, Nick Coghlan wrote: > I've been having an occasional argument with Benjamin regarding braces > in 4-line if statements: > > if (cond) > statement; > else > statement; > > vs. > > if (cond) { > statement; > } else { > statement; > } Really? Do we need to have a brace war? People have different preferences. The standard library includes some of both styles depending on what the maintainer thought was cleanest to their eyes in a given context. Raymond From ncoghlan at gmail.com Mon Jan 2 11:15:25 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Jan 2012 20:15:25 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> Message-ID: On Mon, Jan 2, 2012 at 6:47 PM, Raymond Hettinger wrote: > Really? ?Do we need to have a brace war? > People have different preferences. > The standard library includes some of both styles > depending on what the maintainer thought was cleanest to their eyes in a given context. If the answer is "either form is OK", I'm actually fine with that (and will update PEP 7 accordingly). However, I have long read PEP 7 as *requiring* the braces, and until noticing their absence in some of Benjamin's checkins and the recent conflicting advice we gave when reviewing the same patch, I had never encountered their absence in the CPython code base outside the one-liner/two-liner case*. Since I *do* feel strongly that leaving them out is a mistake that encourages future defects, and read PEP 7 as agreeing with that (aside from the general "follow conventions in surrounding code" escape clause), I figured it was better to bring it up explicitly and clarify PEP 7 accordingly (since what is currently there is clearly ambiguous enough for two current committers to have diametrically opposed views on what it says). Cheers, Nick. * That is, constructs like: if (error_condition) return -1; if (error_condition) return -1; -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From tjreedy at udel.edu Mon Jan 2 11:25:16 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 02 Jan 2012 05:25:16 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> Message-ID: On 1/2/2012 12:55 AM, Paul McMillan wrote: >> Terry Reedy said: >> I understood Alexander Klink and Julian W?lde, hashDoS at alech.de, as saying >> that they consider that using a random non-zero start value is sufficient to >> make the hash non-vulnerable. > > I've been talking to them. They're happy to look at our proposed > changes. They indicate that a non-zero start value is sufficient to > prevent the attack, but D. J. Bernstein disagrees with them. He also > has indicated a willingness to look at our solution. Great. My main concern currently is that there should be no noticeable slowdown for 64 bit builds which are apparently not vulnerable and which therefore would get no benefit. Terry Jan Reedy From solipsis at pitrou.net Mon Jan 2 13:01:05 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Jan 2012 13:01:05 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> Message-ID: <20120102130105.2ce82e10@pitrou.net> On Sun, 1 Jan 2012 21:55:52 -0800 Paul McMillan wrote: > > This is similar to the change proposed by Christian Heimes. > > Most importantly, I moved the xor with r[x % len_r] down a line. > Before, it wasn't being applied to the last character. Shouldn't it be r[i % len(r)] instead? (refer to yesterday's #python-dev discussion) > I think Ruby uses FNV-1 with a salt, making it less vulnerable to > this. FNV is otherwise similar to our existing hash function. Again, we could re-use FNV-1's primes, since they claim they have better dispersion properties than the average prime. Regards Antoine. From solipsis at pitrou.net Mon Jan 2 13:05:28 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Jan 2012 13:05:28 +0100 Subject: [Python-Dev] PEP 7 clarification request: braces References: Message-ID: <20120102130528.754cec85@pitrou.net> On Mon, 2 Jan 2012 14:44:49 +1000 Nick Coghlan wrote: > I've been having an occasional argument with Benjamin regarding braces > in 4-line if statements: > > if (cond) > statement; > else > statement; > > vs. > > if (cond) { > statement; > } else { > statement; > } Good, I was afraid python-dev was getting a bit futile with all these security concerns about hash functions. I don't like having the else on the same line as the closing brace, and prefer: if (cond) { statement; } else { statement; } That said, I agree with Benjamin: the shorter form is visually lighter and should not be frown upon. Regards Not-frowning Antoine. From petri at digip.org Mon Jan 2 14:24:28 2012 From: petri at digip.org (Petri Lehtinen) Date: Mon, 2 Jan 2012 15:24:28 +0200 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <20120102130528.754cec85@pitrou.net> References: <20120102130528.754cec85@pitrou.net> Message-ID: <20120102132428.GR24315@p16> Antoine Pitrou wrote: > I don't like having the else on the same line as the closing brace, > and prefer: > > if (cond) { > statement; > } > else { > statement; > } And this is how it's written in PEP-7. It seems to me that PEP-7 doesn't require braces. But it explicitly forbids if (cond) { statement; } else { statement; } by saying "braces as shown", and then showing them like this: if (mro != NULL) { ... } else { ... } > That said, I agree with Benjamin: the shorter form is visually lighter > and should not be frown upon. Me too. From ned at nedbatchelder.com Mon Jan 2 15:10:40 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 02 Jan 2012 09:10:40 -0500 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <4F01BAE0.5000009@nedbatchelder.com> On 1/1/2012 11:44 PM, Nick Coghlan wrote: > I've been having an occasional argument with Benjamin regarding braces > in 4-line if statements: > > if (cond) > statement; > else > statement; > > vs. > > if (cond) { > statement; > } else { > statement; > } > > He keeps leaving them out, I occasionally tell him they should always > be included (most recently this came up when we gave conflicting > advice to a patch contributor). He says what he's doing is OK, because > he doesn't consider the example in PEP 7 as explicitly disallowing it, > I think it's a recipe for future maintenance hassles when someone adds > a second statement to one of the clauses but doesn't add the braces. > (The only time I consider it reasonable to leave out the braces is for > one liner if statements, where there's no else clause at all) > > Since Benjamin doesn't accept the current brace example in PEP 7 as > normative for the case above, I'm bringing it up here to seek > clarification. I've always valued readability and consistency above brevity, and Python does too. *Sometimes* using braces in C is a recipe for confusion, and only adds to the cognitive load in reading the code. The examples elsewhere in this thread of mistakes and noisy diffs due to leaving out the braces are plenty of reason for me to always include braces. The current code uses a mixture of styles, but that doesn't mean we need to allow any style in the future. I'm in favor of PEP 7 being amended to either require or strongly favor the braces-always style. Note: while we're reading the tea-leaves in PEP 7, it has an example of a single-line if clause with no braces. Some people favor the braces-sometimes style because it leads to "lighter" code. I think that's a misguided optimization. Consistency is better than reducing the line count. --Ned. > Cheers, > Nick. > From stephen at xemacs.org Mon Jan 2 15:32:19 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 02 Jan 2012 23:32:19 +0900 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFF1938.6080809@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFE71E0.2000505@haypocalc.com> <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp> <4EFF1938.6080809@cheimes.de> Message-ID: <87lipq18gc.fsf@uwakimon.sk.tsukuba.ac.jp> Christian Heimes writes: > Am 31.12.2011 13:03, schrieb Stephen J. Turnbull: > > I don't know the implementation issues well enough to claim it is a > > solution, but this hasn't been mentioned before AFAICS: > > > > While the dictionary probe has to start with a hash for backward > > compatibility reasons, is there a reason the overflow strategy for > > insertion has to be buckets containing lists? How about > > double-hashing, etc? > > Python's dict implementation doesn't use bucket but open addressing (aka > closed hashed table). The algorithm for conflict resolution doesn't use > double hashing. Instead it takes the original and (in most cases) cached > hash and perturbs the hash with a series of add, multiply and bit shift ops. In an attack, this is still O(collisions) per probe (as any scheme where the address of the nth collision is a function of only the hash), where double hashing should be "roughly" O(1) (with double the coefficient). But that evidently imposes too large a performance burden on non-evil users, so it's not worth thinking about whether "roughly O(1)" is close enough to O(1) to deter or exhaust attackers. I withdraw the suggestion. From solipsis at pitrou.net Mon Jan 2 15:41:44 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Jan 2012 15:41:44 +0100 Subject: [Python-Dev] Code reviews References: Message-ID: <20120102154144.2241e880@pitrou.net> On Mon, 2 Jan 2012 14:44:49 +1000 Nick Coghlan wrote: > > He keeps leaving them out, I occasionally tell him they should always > be included (most recently this came up when we gave conflicting > advice to a patch contributor). Oh, by the way, this is also why I avoid arguing too much about style in code reviews. There are two bad things which can happen: - your advice conflicts with advice given by another reviewer (perhaps on another issue) - the contributor feels drowned under tiresome requests for style fixes ("please indent continuation lines this way") Both are potentially demotivating. A contributor can have his/her own style if it doesn't adversely affect code quality. Regards Antoine. From benjamin at python.org Mon Jan 2 15:54:02 2012 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 2 Jan 2012 08:54:02 -0600 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: 2012/1/1 Nick Coghlan : > I've been having an occasional argument with Benjamin regarding braces > in 4-line if statements: Python's C code has been dropping braces long before I ever arrived. See this beautiful example in dictobject.c, for example: if (numfree < PyDict_MAXFREELIST && Py_TYPE(mp) == &PyDict_Type) free_list[numfree++] = mp; else Py_TYPE(mp)->tp_free((PyObject *)mp); There's even things like this: if (ep->me_key == dummy) freeslot = ep; else { if (ep->me_hash == hash && unicode_eq(ep->me_key, key)) return ep; freeslot = NULL; } where I would normally put braces on both statements. I think claims of its maintainability are exaggerated. (If someone could cite an example of a bug caused by braces, I'd be interested to see it.) If I start editing one of the bodies, emacs will dedent, so that I know I'm back to the containing block. By virtue of being 5 lines long, it's a very easy case to see and fix as you edit it. I think it's fine Nick raised this. PEP 7 is not very explicit about braces at all. -- Regards, Benjamin From solipsis at pitrou.net Mon Jan 2 16:04:58 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Jan 2012 16:04:58 +0100 Subject: [Python-Dev] cpython: fix some possible refleaks from PyUnicode_READY error conditions References: Message-ID: <20120102160458.58d4c207@pitrou.net> On Mon, 02 Jan 2012 16:00:50 +0100 benjamin.peterson wrote: > http://hg.python.org/cpython/rev/d5cda62d0f8c > changeset: 74236:d5cda62d0f8c > user: Benjamin Peterson > date: Mon Jan 02 09:00:30 2012 -0600 > summary: > fix some possible refleaks from PyUnicode_READY error conditions > > files: > Objects/unicodeobject.c | 80 ++++++++++++++++++++-------- > 1 files changed, 56 insertions(+), 24 deletions(-) > > > diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c > --- a/Objects/unicodeobject.c > +++ b/Objects/unicodeobject.c > @@ -9132,10 +9132,15 @@ > Py_ssize_t len1, len2; > > str_obj = PyUnicode_FromObject(str); > - if (!str_obj || PyUnicode_READY(str_obj) == -1) > + if (!str_obj) > return -1; > sub_obj = PyUnicode_FromObject(substr); > - if (!sub_obj || PyUnicode_READY(sub_obj) == -1) { > + if (!sub_obj) { > + Py_DECREF(str_obj); > + return -1; > + } > + if (PyUnicode_READY(substr) == -1 || PyUnicode_READY(str_obj) == -1) { Shouldn't the first one be PyUnicode_READY(sub_obj) ? From benjamin at python.org Mon Jan 2 16:07:54 2012 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 2 Jan 2012 09:07:54 -0600 Subject: [Python-Dev] cpython: fix some possible refleaks from PyUnicode_READY error conditions In-Reply-To: <20120102160458.58d4c207@pitrou.net> References: <20120102160458.58d4c207@pitrou.net> Message-ID: 2012/1/2 Antoine Pitrou : > On Mon, 02 Jan 2012 16:00:50 +0100 > benjamin.peterson wrote: >> http://hg.python.org/cpython/rev/d5cda62d0f8c >> changeset: ? 74236:d5cda62d0f8c >> user: ? ? ? ?Benjamin Peterson >> date: ? ? ? ?Mon Jan 02 09:00:30 2012 -0600 >> summary: >> ? fix some possible refleaks from PyUnicode_READY error conditions >> >> files: >> ? Objects/unicodeobject.c | ?80 ++++++++++++++++++++-------- >> ? 1 files changed, 56 insertions(+), 24 deletions(-) >> >> >> diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c >> --- a/Objects/unicodeobject.c >> +++ b/Objects/unicodeobject.c >> @@ -9132,10 +9132,15 @@ >> ? ? ?Py_ssize_t len1, len2; >> >> ? ? ?str_obj = PyUnicode_FromObject(str); >> - ? ?if (!str_obj || PyUnicode_READY(str_obj) == -1) >> + ? ?if (!str_obj) >> ? ? ? ? ?return -1; >> ? ? ?sub_obj = PyUnicode_FromObject(substr); >> - ? ?if (!sub_obj || PyUnicode_READY(sub_obj) == -1) { >> + ? ?if (!sub_obj) { >> + ? ? ? ?Py_DECREF(str_obj); >> + ? ? ? ?return -1; >> + ? ?} >> + ? ?if (PyUnicode_READY(substr) == -1 || PyUnicode_READY(str_obj) == -1) { > > Shouldn't the first one be PyUnicode_READY(sub_obj) ? Yes. -- Regards, Benjamin From lists at cheimes.de Mon Jan 2 16:18:41 2012 From: lists at cheimes.de (Christian Heimes) Date: Mon, 02 Jan 2012 16:18:41 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> Message-ID: <4F01CAD1.6030206@cheimes.de> Am 02.01.2012 06:55, schrieb Paul McMillan: > I think Ruby uses FNV-1 with a salt, making it less vulnerable to > this. FNV is otherwise similar to our existing hash function. > > For the record, cryptographically strong hash functions are in the > neighborhood of 400% slower than our existing hash function. I've pushed a new patch http://hg.python.org/features/randomhash/rev/0a65d2462e0c The changeset adds the murmur3 hash algorithm with some minor changes, for example more random seeds. At first I was worried that murmur might be slower than our old hash algorithm. But in fact it seems to be faster! Pybench 10 rounds on my Core2 Duo 2.60: py3k: 3.230 sec randomahash: 3.182 sec Christian From ajm at flonidan.dk Mon Jan 2 16:34:00 2012 From: ajm at flonidan.dk (Anders J. Munch) Date: Mon, 2 Jan 2012 16:34:00 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk><0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F008050.1030807@cheimes.de> <20120101170923.5323628a@pitrou.net><4F008B17.8000606@cheimes.de><1325436844.3472.6.camel@localhost.localdomain><4F0097B3.1040905@cheimes.de> Message-ID: <9B1795C95533CA46A83BA1EAD4B01030DFDE77@flonidanmail.flonidan.net> > On 1/1/2012 12:28 PM, Christian Heimes wrote: > I understood Alexander Klink and Julian W?lde, hashDoS at alech.de, as > saying that they consider that using a random non-zero start value is > sufficient to make the hash non-vulnerable. Sufficient against their current attack. But will it last? For a long-running server, there must be plenty of ways information can leak that will help guessing that start value. The alternative, to provide a dict-like datastructure for use with untrusted input, deserves consideration. Perhaps something simpler than a balanced tree would do? How about a dict-like class that is built on a lazily sorted list? Insertions basically just do list.append and set a dirty-flag, and lookups use bisect - sorting first if the dirty-flag is set. It wouldn't be complete dict replacement by any means, mixing insertions and lookups would have terrible performance, but for something like POST parameters it should be good enough. I half expected to find something like that on activestate recipes already, but couldn't find any. regards, Anders From lists at cheimes.de Mon Jan 2 16:47:43 2012 From: lists at cheimes.de (Christian Heimes) Date: Mon, 02 Jan 2012 16:47:43 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F01D19F.3090009@cheimes.de> Am 01.01.2012 19:45, schrieb Terry Reedy: > On 1/1/2012 10:13 AM, Guido van Rossum wrote: >> PS. Is the collision-generator used in the attack code open source? > > As I posted before, Alexander Klink and Julian W?lde gave their project > email as hashDoS at alech.de. Since they indicated disappointment in not > hearing from Python, I presume they would welcome engagement. Somebody should contact Alexander and Julian to let them know, that we are working on the matter. It should be somebody "official" for the initial contact, too. I've included Guido (BDFL), Barry (their initial security contact) and MvL (most prominent German core dev) in CC, as they are the logical choice for me. I'm willing to have a phone call with them once the contact has been established. IMHO it's slightly easier to talk in native tongue -- Alexander and Julian are German, too. Christian From g.brandl at gmx.net Mon Jan 2 18:35:48 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 02 Jan 2012 18:35:48 +0100 Subject: [Python-Dev] Code reviews In-Reply-To: <20120102154144.2241e880@pitrou.net> References: <20120102154144.2241e880@pitrou.net> Message-ID: On 01/02/2012 03:41 PM, Antoine Pitrou wrote: > On Mon, 2 Jan 2012 14:44:49 +1000 > Nick Coghlan wrote: >> >> He keeps leaving them out, I occasionally tell him they should always >> be included (most recently this came up when we gave conflicting >> advice to a patch contributor). > > Oh, by the way, this is also why I avoid arguing too much about style > in code reviews. There are two bad things which can happen: > > - your advice conflicts with advice given by another reviewer (perhaps > on another issue) > - the contributor feels drowned under tiresome requests for style > fixes ("please indent continuation lines this way") > > Both are potentially demotivating. A contributor can have his/her own > style if it doesn't adversely affect code quality. Exactly. Especially for reviews of patches from non-core people, we should exercise a lot of restraint: as the committers, I think we can be expected to bite the sour bullet and apply our uniform style (such as it is). It is tiresome, if not downright disappointing, to get reviews that are basically "nothing wrong, but please submit again with one more empty line between the classes", and definitely not the way to attract more contributors. Georg From g.brandl at gmx.net Mon Jan 2 18:38:41 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 02 Jan 2012 18:38:41 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F01D19F.3090009@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F01D19F.3090009@cheimes.de> Message-ID: On 01/02/2012 04:47 PM, Christian Heimes wrote: > Am 01.01.2012 19:45, schrieb Terry Reedy: >> On 1/1/2012 10:13 AM, Guido van Rossum wrote: >>> PS. Is the collision-generator used in the attack code open source? >> >> As I posted before, Alexander Klink and Julian W?lde gave their project >> email as hashDoS at alech.de. Since they indicated disappointment in not >> hearing from Python, I presume they would welcome engagement. > > Somebody should contact Alexander and Julian to let them know, that we > are working on the matter. It should be somebody "official" for the > initial contact, too. I've included Guido (BDFL), Barry (their initial > security contact) and MvL (most prominent German core dev) in CC, as > they are the logical choice for me. > > I'm willing to have a phone call with them once the contact has been > established. IMHO it's slightly easier to talk in native tongue -- > Alexander and Julian are German, too. I wouldn't expect too much -- they seem rather keen on cheap laughs: http://twitter.com/#!/bk3n/status/152068096448921600/photo/1/large Georg From guido at python.org Mon Jan 2 19:29:29 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jan 2012 10:29:29 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F01D19F.3090009@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F01D19F.3090009@cheimes.de> Message-ID: On Mon, Jan 2, 2012 at 7:47 AM, Christian Heimes wrote: > Am 01.01.2012 19:45, schrieb Terry Reedy: > > On 1/1/2012 10:13 AM, Guido van Rossum wrote: > >> PS. Is the collision-generator used in the attack code open source? > > > > As I posted before, Alexander Klink and Julian W?lde gave their project > > email as hashDoS at alech.de. Since they indicated disappointment in not > > hearing from Python, I presume they would welcome engagement. > > Somebody should contact Alexander and Julian to let them know, that we > are working on the matter. It should be somebody "official" for the > initial contact, too. I've included Guido (BDFL), Barry (their initial > security contact) and MvL (most prominent German core dev) in CC, as > they are the logical choice for me. > > I'm willing to have a phone call with them once the contact has been > established. IMHO it's slightly easier to talk in native tongue -- > Alexander and Julian are German, too. > I'm not sure I see the point -- just give them a link to the python-dev archives. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From francismb at email.de Mon Jan 2 19:26:13 2012 From: francismb at email.de (francis) Date: Mon, 02 Jan 2012 19:26:13 +0100 Subject: [Python-Dev] Code reviews In-Reply-To: References: <20120102154144.2241e880@pitrou.net> Message-ID: <4F01F6C5.2020205@email.de> On 01/02/2012 06:35 PM, Georg Brandl wrote: > On 01/02/2012 03:41 PM, Antoine Pitrou wrote: >> On Mon, 2 Jan 2012 14:44:49 +1000 >> Nick Coghlan wrote: >>> He keeps leaving them out, I occasionally tell him they should always >>> be included (most recently this came up when we gave conflicting >>> advice to a patch contributor). >> Oh, by the way, this is also why I avoid arguing too much about style >> in code reviews. There are two bad things which can happen: >> >> - your advice conflicts with advice given by another reviewer (perhaps >> on another issue) >> - the contributor feels drowned under tiresome requests for style >> fixes ("please indent continuation lines this way") >> >> Both are potentially demotivating. A contributor can have his/her own >> style if it doesn't adversely affect code quality. > Exactly. Especially for reviews of patches from non-core people, we > should exercise a lot of restraint: as the committers, I think we can be > expected to bite the sour bullet and apply our uniform style (such as > it is). > > It is tiresome, if not downright disappointing, to get reviews that > are basically "nothing wrong, but please submit again with one more > empty line between the classes", and definitely not the way to > attract more contributors. > Hi to all member of this list, I'm not a Python-Dev (only some very small patches over core-mentorship list. Just my 2cents here). I would try to relax this conflicts with a script that does the reformatting itself. If that reformatting where part of the process itself do you thing that that would be an issue anymore? PS: I know that there?s a pep8 checker so it could be transformed into a reformatter but I don't know if theres a pep7 checker (reformater) Best regards! francis From brian at python.org Mon Jan 2 19:41:07 2012 From: brian at python.org (Brian Curtin) Date: Mon, 2 Jan 2012 12:41:07 -0600 Subject: [Python-Dev] Code reviews In-Reply-To: <4F01F6C5.2020205@email.de> References: <20120102154144.2241e880@pitrou.net> <4F01F6C5.2020205@email.de> Message-ID: On Mon, Jan 2, 2012 at 12:26, francis wrote: > On 01/02/2012 06:35 PM, Georg Brandl wrote: >> >> On 01/02/2012 03:41 PM, Antoine Pitrou wrote: >>> >>> On Mon, 2 Jan 2012 14:44:49 +1000 >>> Nick Coghlan ?wrote: >>>> >>>> He keeps leaving them out, I occasionally tell him they should always >>>> be included (most recently this came up when we gave conflicting >>>> advice to a patch contributor). >>> >>> Oh, by the way, this is also why I avoid arguing too much about style >>> in code reviews. There are two bad things which can happen: >>> >>> - your advice conflicts with advice given by another reviewer (perhaps >>> ? on another issue) >>> - the contributor feels drowned under tiresome requests for style >>> ? fixes ("please indent continuation lines this way") >>> >>> Both are potentially demotivating. A contributor can have his/her own >>> style if it doesn't adversely affect code quality. >> >> Exactly. Especially for reviews of patches from non-core people, we >> should exercise a lot of restraint: as the committers, I think we can be >> expected to bite the sour bullet and apply our uniform style (such as >> it is). >> >> It is tiresome, if not downright disappointing, to get reviews that >> are basically "nothing wrong, but please submit again with one more >> empty line between the classes", and definitely not the way to >> attract more contributors. >> > Hi to all member of this list, > I'm not a Python-Dev (only some very small patches over core-mentorship > list. > Just my 2cents here). > > I would try to relax this conflicts with a script that does the reformatting > itself. If > that reformatting where part of the process itself do you thing that that > would > be an issue anymore? I don't think this is a problem to the point that it needs to be fixed via automation. The code I write is the code I build and test, so I'd rather not have some script that goes in and modifies it to some accepted format, then have to go through the build/test dance again. From snaury at gmail.com Mon Jan 2 19:53:09 2012 From: snaury at gmail.com (Alexey Borzenkov) Date: Mon, 2 Jan 2012 22:53:09 +0400 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F01CAD1.6030206@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> <4F01CAD1.6030206@cheimes.de> Message-ID: On Mon, Jan 2, 2012 at 7:18 PM, Christian Heimes wrote: > Am 02.01.2012 06:55, schrieb Paul McMillan: >> I think Ruby uses FNV-1 with a salt, making it less vulnerable to >> this. FNV is otherwise similar to our existing hash function. >> >> For the record, cryptographically strong hash functions are in the >> neighborhood of 400% slower than our existing hash function. > > I've pushed a new patch > http://hg.python.org/features/randomhash/rev/0a65d2462e0c It seems for 32-bit version you are using pid for the two constants. Also, it's unclear why you even need to use a random constant for the final pass, you already use random constant as an initial h1, and it should be enough, no need to use for k1. Same for 128-bit: k1, k2, k3, k4 should be initialized to zero, these are key data, they don't need to be mixed with anything. Also, I'm not sure how portable is the always_inline attribute, is it supported on all compilers and all platforms? From snaury at gmail.com Mon Jan 2 19:57:27 2012 From: snaury at gmail.com (Alexey Borzenkov) Date: Mon, 2 Jan 2012 22:57:27 +0400 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> <4F01CAD1.6030206@cheimes.de> Message-ID: On Mon, Jan 2, 2012 at 10:53 PM, Alexey Borzenkov wrote: > On Mon, Jan 2, 2012 at 7:18 PM, Christian Heimes wrote: >> Am 02.01.2012 06:55, schrieb Paul McMillan: >>> I think Ruby uses FNV-1 with a salt, making it less vulnerable to >>> this. FNV is otherwise similar to our existing hash function. >>> >>> For the record, cryptographically strong hash functions are in the >>> neighborhood of 400% slower than our existing hash function. >> >> I've pushed a new patch >> http://hg.python.org/features/randomhash/rev/0a65d2462e0c > > It seems for 32-bit version you are using pid for the two constants. > Also, it's unclear why you even need to use a random constant for the > final pass, you already use random constant as an initial h1, and it > should be enough, no need to use for k1. Same for 128-bit: k1, k2, k3, > k4 should be initialized to zero, these are key data, they don't need > to be mixed with anything. Sorry, sent too soon. What I mean is that you're initializing a pretty big array of values when you only need a 32-bit value. Pid, in my opinion might be too predictable, it would be a lot better to simply hash pid and gettimeofday bytes to produce this single 32-bit value and use it for h1, h2, h3 and h4 in both 32-bit and 128-bit versions. > Also, I'm not sure how portable is the always_inline attribute, is it > supported on all compilers and all platforms? From tseaver at palladion.com Mon Jan 2 21:25:00 2012 From: tseaver at palladion.com (Tres Seaver) Date: Mon, 02 Jan 2012 15:25:00 -0500 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <4F013ACF.7090708@scottdial.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/02/2012 01:02 AM, Nick Coghlan wrote: > On Mon, Jan 2, 2012 at 3:04 PM, Scott Dial > wrote: >> On 1/1/2012 11:44 PM, Nick Coghlan wrote: >>> I think it's a recipe for future maintenance hassles when someone >>> adds a second statement to one of the clauses but doesn't add the >>> braces. (The only time I consider it reasonable to leave out the >>> braces is for one liner if statements, where there's no else >>> clause at all) >> >> Could you explain how these two cases differ with regard to >> maintenance? > > Sure: always including K&R style braces for compound statements (even > when they aren't technically necessary) means that indentation == > control flow, just like Python. Indent your additions correctly, and > the reader and compiler will agree on what they mean: > > if (cond) { statement; } else { statement; addition; /* Reader and > compiler agree this is part of the else clause */ } > > if (cond) statement; else statement; addition; /* Uh-oh, should have > added braces */ > > I've been trying to convince Benjamin that there's a reason "always > include the braces" is accepted wisdom amongst many veteran C > programmers (with some allowing an exception for one-liners), but he > isn't believing me, and I'm not going to go through and edit every > single one of his commits to add them. FWIW, +1 to mandating braces-always (even for one liners): the future maintenance burden isn't worth the trouble of the exception. In the days when I did C / C++ / Java coding as my main gig, braceless code was routinely a bug magnet *for the team*. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8CEpwACgkQ+gerLs4ltQ5vTwCbBjlToJ2yZh4Ra+tNkqMVIaLj NfUAnjAfkDE0BPus1g33hd84tkGonUzd =K1p9 -----END PGP SIGNATURE----- From benjamin at python.org Mon Jan 2 21:31:57 2012 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 2 Jan 2012 14:31:57 -0600 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: 2012/1/1 Nick Coghlan : > > ?if (cond) { > ? ?statement; > ?} else { > ? ?statement; > ?} I might add that assuming you have braces, PEP 7 would want you to format it as if (cond) { statement; } else { more_stuff; } -- Regards, Benjamin From julien at tayon.net Mon Jan 2 22:02:28 2012 From: julien at tayon.net (julien tayon) Date: Mon, 2 Jan 2012 22:02:28 +0100 Subject: [Python-Dev] Code reviews In-Reply-To: References: <20120102154144.2241e880@pitrou.net> <4F01F6C5.2020205@email.de> Message-ID: @francis Like indent ? http://www.linuxmanpages.com/man1/indent.1.php @brian > I don't think this is a problem to the point that it needs to be fixed > via automation. The code I write is the code I build and test, so I'd > rather not have some script that goes in and modifies it to some > accepted format, then have to go through the build/test dance again. Well, it breaks committing since it adds non significative symbols, therefore bloats the diffs. But as far as I am concerned for using it a long time ago, it did not break anything, it was pretty reliable. my 2c * 0.1 -- jul From jimjjewett at gmail.com Mon Jan 2 22:07:59 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 2 Jan 2012 16:07:59 -0500 Subject: [Python-Dev] That depends on what the meaning of "is" is (was Re: http://mail.python.org/pipermail/python-dev/2011-December/115172.html) In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 1:16 AM, PJ Eby wrote: > On Sun, Jan 1, 2012 at 10:28 PM, Jim Jewett wrote: >> >> Given the wording requiring a real dictionary, I would have assumed >> that it was OK (if perhaps not sensible) to do pointer arithmetic and >> access the keys/values/hashes directly. ?(Though if the breakage was >> between python versions, I would feel guilty about griping too >> loudly.) > If you're going to be a language lawyer about it, I would simply point out > that all the spec requires is that "type(env) is dict" -- it says nothing > about how Python defines "type" or "is" or "dict". ?So, you're on your own > with that one. ;-) But the public header file < http://hg.python.org/cpython/file/3ed5a6030c9b/Include/dictobject.h > defines the typedef structs for PyDictEntry and _dictobject. What is the purpose of the requiring a "real dict" without also promising what the header file promises? -jJ From larry at hastings.org Mon Jan 2 22:50:32 2012 From: larry at hastings.org (Larry Hastings) Date: Mon, 02 Jan 2012 13:50:32 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> Message-ID: <4F0226A8.7020006@hastings.org> On 01/02/2012 12:47 AM, Raymond Hettinger wrote: > Really? Do we need to have a brace war? > People have different preferences. > The standard library includes some of both styles > depending on what the maintainer thought was cleanest to their eyes in a given context. I'm with Raymond. Code should be readable, and code reviews are the best way to achieve that--not endlessly specific formatting rules. Have there been bugs in CPython that the proposed new PEP 7 rule would have prevented? /arry From guido at python.org Mon Jan 2 23:08:17 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jan 2012 14:08:17 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <4F0226A8.7020006@hastings.org> References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> <4F0226A8.7020006@hastings.org> Message-ID: On Mon, Jan 2, 2012 at 1:50 PM, Larry Hastings wrote: > On 01/02/2012 12:47 AM, Raymond Hettinger wrote: > >> Really? Do we need to have a brace war? >> People have different preferences. >> The standard library includes some of both styles >> depending on what the maintainer thought was cleanest to their eyes in a >> given context. >> > > I'm with Raymond. Code should be readable, and code reviews are the best > way to achieve that--not endlessly specific formatting rules. > > Have there been bugs in CPython that the proposed new PEP 7 rule would > have prevented? The irony is that style guides exist to *avoid* debates like this. Yes, the choices are arbitrary. Yes, tastes differ. Yes, there are exceptions to the rules. But still, once a style rule has been set, the idea is to stop debating and just code. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothy.c.delaney at gmail.com Mon Jan 2 23:09:49 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Tue, 3 Jan 2012 09:09:49 +1100 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <4F0226A8.7020006@hastings.org> References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> <4F0226A8.7020006@hastings.org> Message-ID: On 3 January 2012 08:50, Larry Hastings wrote: > On 01/02/2012 12:47 AM, Raymond Hettinger wrote: > >> Really? Do we need to have a brace war? >> People have different preferences. >> The standard library includes some of both styles >> depending on what the maintainer thought was cleanest to their eyes in a >> given context. >> > > I'm with Raymond. Code should be readable, and code reviews are the best > way to achieve that--not endlessly specific formatting rules. > > Have there been bugs in CPython that the proposed new PEP 7 rule would > have prevented? > I've found that until someone has experienced multiple nasty bugs caused by not always using braces, it's nearly impossible to convince them of why you should. Afterwards it'simpossible to convince them (me) that you shouldn't always use braces. I'd also point out that if you're expecting braces, not having them can make the code less readable. A consistent format tends to make for more readable code. Cheers, Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From francisco.martin at web.de Mon Jan 2 23:13:27 2012 From: francisco.martin at web.de (Francisco Martin Brugue) Date: Mon, 02 Jan 2012 23:13:27 +0100 Subject: [Python-Dev] Code reviews In-Reply-To: References: <20120102154144.2241e880@pitrou.net> <4F01F6C5.2020205@email.de> Message-ID: <4F022C07.9060800@web.de> On 01/02/2012 10:02 PM, julien tayon wrote: > @francis > Like indent ? > http://www.linuxmanpages.com/man1/indent.1.php Thank you, I wasn't aware of this one ! From raymond.hettinger at gmail.com Mon Jan 2 23:32:14 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 2 Jan 2012 14:32:14 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: On Jan 2, 2012, at 12:31 PM, Benjamin Peterson wrote: > I might add that assuming you have braces, PEP 7 would want you to format it as > > if (cond) { > statement; > } > else { > more_stuff; > } > Running ``grep -B1 else Objects/*c`` shows that we've happily lived with a mixture of styles for a very long time. ISTM, our committers have had good instincts about when braces add clarity and when they add clutter. If Nick pushes through an always-use-braces mandate, A LOT of code will need to be changed. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Mon Jan 2 23:55:59 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 2 Jan 2012 14:55:59 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> <4F0226A8.7020006@hastings.org> Message-ID: On Jan 2, 2012, at 2:09 PM, Tim Delaney wrote: > I'd also point out that if you're expecting braces, not having them can make the code less readable. If a programmer's mind explodes when they look at the simple and beautiful examples in K&R's The C Programming Language, then they've got problems that can't be solved by braces ;-) Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Tue Jan 3 00:08:15 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 02 Jan 2012 18:08:15 -0500 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <4F0238DF.2010701@nedbatchelder.com> On 1/2/2012 5:32 PM, Raymond Hettinger wrote: > > Running ``grep -B1 else Objects/*c`` shows that we've happily lived > with a mixture of styles for a very long time. > ISTM, our committers have had good instincts about when braces add > clarity and when they add clutter. > If Nick pushes through an always-use-braces mandate, A LOT of code > will need to be changed. > I'm sure we can agree that 1) Nick isn't "pushing through" anything, this is a discussion about what to do, and 2) even if we agree to change PEP 7, no one would advocate having to go through all the C code to change it to a newly-agreed style. --Ned. > > Raymond > > > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ned%40nedbatchelder.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From pje at telecommunity.com Tue Jan 3 01:16:15 2012 From: pje at telecommunity.com (PJ Eby) Date: Mon, 2 Jan 2012 19:16:15 -0500 Subject: [Python-Dev] That depends on what the meaning of "is" is (was Re: http://mail.python.org/pipermail/python-dev/2011-December/115172.html) In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 4:07 PM, Jim Jewett wrote: > On Mon, Jan 2, 2012 at 1:16 AM, PJ Eby wrote: > > On Sun, Jan 1, 2012 at 10:28 PM, Jim Jewett > wrote: > >> > >> Given the wording requiring a real dictionary, I would have assumed > >> that it was OK (if perhaps not sensible) to do pointer arithmetic and > >> access the keys/values/hashes directly. (Though if the breakage was > >> between python versions, I would feel guilty about griping too > >> loudly.) > > > If you're going to be a language lawyer about it, I would simply point > out > > that all the spec requires is that "type(env) is dict" -- it says nothing > > about how Python defines "type" or "is" or "dict". So, you're on your > own > > with that one. ;-) > > But the public header file < > http://hg.python.org/cpython/file/3ed5a6030c9b/Include/dictobject.h > > defines the typedef structs for PyDictEntry and _dictobject. > > What is the purpose of the requiring a "real dict" without also > promising what the header file promises? > > Er, just because it's in the .h doesn't mean it's in the public API. But in any event, if you're actually serious about this, I'd just point out that: 1. The struct layout doesn't guarantee anything about insertion or lookup algorithms, 2. If the data structure were changed, the header file would obviously change as well, and 3. ISTM that Python does not even promise inter-version ABI compatibility for internals like the dict object layout. Are you seriously writing code that relies on the C structure layout of dicts? Because really, that was SO not the point of the dict type requirement. It was so that you could use Python's low-level *API* calls, not muck about with the data structure directly. I'm occasionally considered notorious for abusing Python internals, but even I have to draw the line somewhere. ;-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Jan 3 01:22:28 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 3 Jan 2012 10:22:28 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: On Tue, Jan 3, 2012 at 12:54 AM, Benjamin Peterson wrote: > I think it's fine Nick raised this. PEP 7 is not very explicit about > braces at all. I actually discovered in this thread that I've been misreading PEP 7 for going on 7 years now - I thought the brace usage example *did* use "} else {" (mainly because I write my if statements that way, and nobody had ever pointed out to me that the C style guide actually says otherwise). So I'm happy enough with leaving PEP 7 alone and letting the stylistic inconsistencies stand (even going forward). I agree in these days of auto-indenting editors and automated test suites, the maintenance benefits of always requiring the braces are significantly less than they used to be. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Tue Jan 3 01:27:11 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 3 Jan 2012 10:27:11 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: On Tue, Jan 3, 2012 at 8:32 AM, Raymond Hettinger wrote: > Running ?``grep -B1 else Objects/*c`` shows that we've happily lived with a > mixture of styles for a very long time. > ISTM, our committers have had good instincts about when braces add clarity > and when they add clutter. > If Nick pushes through an always-use-braces mandate, A LOT of code will need > to be changed. Nah, I was asking a genuine question, not pushing anything in particular. I *thought* the code base was more consistent than it is, but it turns out that was an error of perception on my part, rather than an objective fact. With my perception of the status quo corrected, I can stop worrying about preserving a non-existent consistency. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From raymond.hettinger at gmail.com Tue Jan 3 01:47:48 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 2 Jan 2012 16:47:48 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <3FD8D3FB-ABD9-490C-ACC5-C8927EC39843@gmail.com> On Jan 2, 2012, at 4:27 PM, Nick Coghlan wrote: > With my perception of the status quo corrected, I can stop worrying > about preserving a non-existent consistency. +1 QOTD Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothy.c.delaney at gmail.com Tue Jan 3 01:53:06 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Tue, 3 Jan 2012 11:53:06 +1100 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> <4F0226A8.7020006@hastings.org> Message-ID: On 3 January 2012 09:55, Raymond Hettinger wrote: > > On Jan 2, 2012, at 2:09 PM, Tim Delaney wrote: > > I'd also point out that if you're expecting braces, not having them can > make the code less readable. > > > If a programmer's mind explodes when they look at the simple and beautiful > examples in K&R's The C Programming Language, then they've got problems > that can't be solved by braces ;-) > Now that's just hyperbole ;) If you've got a mix of braces and non-braces in a chunk of code, it's very easy for the mind to skip over the non-brace blocks as not being blocks. I know it's not something I'm likely to mess up when reading the code in-depth, but if I'm skimming over trying to understand the gist of the code or looking for what should be an obvious bug, a block that's not brace-delimited is more likely to be missed than one that is (when amongst other blocks that are). If we had the option of "just use indentation" in C I'd advocate that. Failing that, I find that consistent usage of braces is preferable. Cheers, Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jan 3 01:54:42 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jan 2012 16:54:42 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 4:27 PM, Nick Coghlan wrote: > On Tue, Jan 3, 2012 at 8:32 AM, Raymond Hettinger > wrote: > > Running ``grep -B1 else Objects/*c`` shows that we've happily lived > with a > > mixture of styles for a very long time. > > ISTM, our committers have had good instincts about when braces add > clarity > > and when they add clutter. > > If Nick pushes through an always-use-braces mandate, A LOT of code will > need > > to be changed. > > Nah, I was asking a genuine question, not pushing anything in > particular. I *thought* the code base was more consistent than it is, > but it turns out that was an error of perception on my part, rather > than an objective fact. > > With my perception of the status quo corrected, I can stop worrying > about preserving a non-existent consistency. > Amen. And, as the (nominal) author of the PEP, the PEP didn't mean to state an opinion on whether braces are mandatory. It only meant to state how they should be placed when they are there. It's true that there are other readings possible, but that's what I meant. If someone wants to change the wording to clarify this, go right ahead. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Tue Jan 3 04:20:09 2012 From: barry at python.org (Barry Warsaw) Date: Mon, 2 Jan 2012 22:20:09 -0500 Subject: [Python-Dev] Code reviews In-Reply-To: References: <20120102154144.2241e880@pitrou.net> Message-ID: <20120102222009.0e87431d@limelight.wooz.org> On Jan 02, 2012, at 06:35 PM, Georg Brandl wrote: >Exactly. Especially for reviews of patches from non-core people, we >should exercise a lot of restraint: as the committers, I think we can be >expected to bite the sour bullet and apply our uniform style (such as >it is). > >It is tiresome, if not downright disappointing, to get reviews that >are basically "nothing wrong, but please submit again with one more >empty line between the classes", and definitely not the way to >attract more contributors. I think it's fine in a code review to point out where the submission misses the important consistency points, but not to hold up merging the changes because of that. You want to educate and motivate so that the next submission comes closer to our standards. The core dev who commits the change can clean up style issues. -Barry P.S. +1 for the change to PEP 7. From barry at python.org Tue Jan 3 04:25:55 2012 From: barry at python.org (Barry Warsaw) Date: Mon, 2 Jan 2012 22:25:55 -0500 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> <4F0226A8.7020006@hastings.org> Message-ID: <20120102222555.15d0a42c@limelight.wooz.org> On Jan 02, 2012, at 02:08 PM, Guido van Rossum wrote: >The irony is that style guides exist to *avoid* debates like this. Yes, the >choices are arbitrary. Yes, tastes differ. Yes, there are exceptions to the >rules. But still, once a style rule has been set, the idea is to stop >debating and just code. +1 The other reason why style guides exist is to give contributors some sense of what they should shoot for. I've worked on existing code bases where there's so little consistency I can't tell what the author's preferences are even if I wanted to adhere to them. -Barry From barry at python.org Tue Jan 3 04:36:01 2012 From: barry at python.org (Barry Warsaw) Date: Mon, 2 Jan 2012 22:36:01 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F01D19F.3090009@cheimes.de> Message-ID: <20120102223601.5cc6e1dc@limelight.wooz.org> On Jan 02, 2012, at 06:38 PM, Georg Brandl wrote: >I wouldn't expect too much -- they seem rather keen on cheap laughs: > >http://twitter.com/#!/bk3n/status/152068096448921600/photo/1/large Heh, so yeah, it won't be me contacting them. -Barry From martin at v.loewis.de Tue Jan 3 09:44:03 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 03 Jan 2012 09:44:03 +0100 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <4F02BFD3.7080204@v.loewis.de> > He keeps leaving them out, I occasionally tell him they should always > be included (most recently this came up when we gave conflicting > advice to a patch contributor). He says what he's doing is OK, because > he doesn't consider the example in PEP 7 as explicitly disallowing it, > I think it's a recipe for future maintenance hassles when someone adds > a second statement to one of the clauses but doesn't add the braces. > (The only time I consider it reasonable to leave out the braces is for > one liner if statements, where there's no else clause at all) While this appears to be settled, I'd like to add that I sided with Benjamin here all along. With Python, I accepted a style of "minimal punctuation". Examples of extra punctuation are: - parens around expression in Python's if (and while): if (x < 10): foo () - parens around return expression (C and Python) return(*p); - braces around single-statement blocks in C In all these cases, punctuation can be left out without changing the meaning of the program. I personally think that a policy requiring braces would be (mildly) harmful, as it decreases readability of the code. When I read code, I read every character: not just the identifiers, but also every punctuation character. If there is extra punctuation, I stop and wonder what the motivation for the punctuation is - is there any hidden meaning that required the author to put the punctuation? There is a single case where I can accept extra punctuation in C: to make the operator precedence explicit. Many people (including myself) don't know how a | b << *c * *d would group, so I readily accept extra parens as a clarification. Wrt. braces, I don't share the concern that there is a risk of somebody being confused when adding a second statement to a braceless block. An actual risk is stuff like if (cond) MACRO(argument); when MACRO expands to multiple statements. However, we should accept that this is a bug in MACRO (which should have used the do-while(0)-idiom), not in the application of the macro. Regards, Martin From lists at cheimes.de Tue Jan 3 14:18:34 2012 From: lists at cheimes.de (Christian Heimes) Date: Tue, 03 Jan 2012 14:18:34 +0100 Subject: [Python-Dev] RNG in the core Message-ID: <4F03002A.5010800@cheimes.de> Hello, all proposed fixes for a randomized hashing function raise and fall with a good random number generator to feed the random seed. The seed must be created very early in the startup phase of the interpreter, preferable before the basic types are initialized. CPython already have multiple sources for random data (win32_urandom in Modules/posixmodule.c, urandom in Lib/os.py, Mersenne twister in Modules/_randommodule.c). However we can't use them because they are wrapped inside Python modules which require infrastructure like initialized base types. I propose an addition to the current Python C API: int PyOS_URandom(char *buf, Py_ssize_t len) Read "len" chars from the OS's RNG into the pre-allocated buffer "buf". The RNG should be suitable for cryptography. In case of an error the function returns -1 and sets an exception, otherwise it returns 0. On Windows I can re-use most of the code of win32_urandom(). For POSIX I have to implement os.urandom() in C in order to read data from /dev/urandom. That's simple and straight forward. Since some platforms may not have /dev/urandom, we need a PRNG in the core, too. I therefore propose to move the Mersenne twister from randommodule.c into the core, too. typedef struct { unsigned long state[N]; int index; } _Py_MT_RandomState; unsigned long _Py_MT_GenRand_Int32(_Py_MT_RandomState *state); // genrand_int32() double _Py_MT_GenRand_Res53(_Py_MT_RandomState *state); // random_random() void _Py_MT_GenRand_Init(_Py_MT_RandomState *state, unsigned long seed); // init_genrand() void _Py_MT_GenRand_InitArray(_Py_MT_RandomState *state, unsigned long init_key[], unsigned long key_length); // init_by_array I suggest Python/random.c as source file and Python/pyrandom.h as header file. Comments? Christian From anacrolix at gmail.com Tue Jan 3 15:46:51 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Wed, 4 Jan 2012 01:46:51 +1100 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <4F02BFD3.7080204@v.loewis.de> References: <4F02BFD3.7080204@v.loewis.de> Message-ID: FWIW I'm against forcing braces to be used. Readability is the highest concern, and this should be at the discretion of the contributor. A code formatting tool, or compiler extension is the only proper handle this, and neither are in use or available. On Tue, Jan 3, 2012 at 7:44 PM, "Martin v. L?wis" wrote: >> He keeps leaving them out, I occasionally tell him they should always >> be included (most recently this came up when we gave conflicting >> advice to a patch contributor). He says what he's doing is OK, because >> he doesn't consider the example in PEP 7 as explicitly disallowing it, >> I think it's a recipe for future maintenance hassles when someone adds >> a second statement to one of the clauses but doesn't add the braces. >> (The only time I consider it reasonable to leave out the braces is for >> one liner if statements, where there's no else clause at all) > > While this appears to be settled, I'd like to add that I sided with > Benjamin here all along. > > With Python, I accepted a style of "minimal punctuation". Examples > of extra punctuation are: > - parens around expression in Python's if (and while): > > ? ?if (x < 10): > ? ? ?foo () > > - parens around return expression (C and Python) > > ? ?return(*p); > > - braces around single-statement blocks in C > > In all these cases, punctuation can be left out without changing > the meaning of the program. > > I personally think that a policy requiring braces would be (mildly) > harmful, as it decreases readability of the code. When I read code, > I read every character: not just the identifiers, but also every > punctuation character. If there is extra punctuation, I stop and wonder > what the motivation for the punctuation is - is there any hidden > meaning that required the author to put the punctuation? > > There is a single case where I can accept extra punctuation in C: > to make the operator precedence explicit. Many people (including > myself) don't know how > > ? a | b << *c * *d > > would group, so I readily accept extra parens as a clarification. > > Wrt. braces, I don't share the concern that there is a risk of > somebody being confused when adding a second statement to a braceless > block. An actual risk is stuff like > > ? if (cond) > ? ? MACRO(argument); > > when MACRO expands to multiple statements. However, we should > accept that this is a bug in MACRO (which should have used the > do-while(0)-idiom), not in the application of the macro. > > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com -- ?_? From stephen at xemacs.org Tue Jan 3 16:46:22 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 04 Jan 2012 00:46:22 +0900 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <4F02BFD3.7080204@v.loewis.de> Message-ID: <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> Matt Joiner writes: > Readability is the highest concern, and this should be at the > discretion of the contributor. That's quite backwards. "Readability" is community property, and has as much, if not more, to do with common convention as with some absolute metric. The "contributor's discretion" must yield. That doesn't mean the contributor has to do all the work; as several people have pointed out, it makes a lot of sense for experienced reviewers to make such trivial changes themselves before committing, especially for new contributors. From matthieu.brucher at gmail.com Tue Jan 3 18:23:08 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 3 Jan 2012 18:23:08 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F03002A.5010800@cheimes.de> References: <4F03002A.5010800@cheimes.de> Message-ID: Hi, I'm not a core Python developer, but it may be intesting to use a real Crush resistant RNG, as one from Random123 (a parallel random generator that is Crush resistant, contrary to the Mersenne Twister, and without a state). Cheers, Matthieu Brucher 2012/1/3 Christian Heimes > Hello, > > all proposed fixes for a randomized hashing function raise and fall with > a good random number generator to feed the random seed. The seed must be > created very early in the startup phase of the interpreter, preferable > before the basic types are initialized. CPython already have multiple > sources for random data (win32_urandom in Modules/posixmodule.c, urandom > in Lib/os.py, Mersenne twister in Modules/_randommodule.c). However we > can't use them because they are wrapped inside Python modules which > require infrastructure like initialized base types. > > I propose an addition to the current Python C API: > > int PyOS_URandom(char *buf, Py_ssize_t len) > > Read "len" chars from the OS's RNG into the pre-allocated buffer "buf". > The RNG should be suitable for cryptography. In case of an error the > function returns -1 and sets an exception, otherwise it returns 0. > On Windows I can re-use most of the code of win32_urandom(). For POSIX I > have to implement os.urandom() in C in order to read data from > /dev/urandom. That's simple and straight forward. > > > Since some platforms may not have /dev/urandom, we need a PRNG in the > core, too. I therefore propose to move the Mersenne twister from > randommodule.c into the core, too. > > typedef struct { > unsigned long state[N]; > int index; > } _Py_MT_RandomState; > > unsigned long _Py_MT_GenRand_Int32(_Py_MT_RandomState *state); // > genrand_int32() > double _Py_MT_GenRand_Res53(_Py_MT_RandomState *state); // random_random() > void _Py_MT_GenRand_Init(_Py_MT_RandomState *state, unsigned long seed); > // init_genrand() > void _Py_MT_GenRand_InitArray(_Py_MT_RandomState *state, unsigned long > init_key[], unsigned long key_length); // init_by_array > > > I suggest Python/random.c as source file and Python/pyrandom.h as header > file. Comments? > > Christian > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/matthieu.brucher%40gmail.com > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Jan 3 18:46:05 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 3 Jan 2012 18:46:05 +0100 Subject: [Python-Dev] RNG in the core References: <4F03002A.5010800@cheimes.de> Message-ID: <20120103184605.1417f035@pitrou.net> On Tue, 03 Jan 2012 14:18:34 +0100 Christian Heimes wrote: > > I suggest Python/random.c as source file and Python/pyrandom.h as header > file. Comments? Looks good on the principle. The API names for MT are a bit ugly. > The RNG should be suitable for cryptography. Sounds like too strong a requirement. For cryptography, we have the ssl module (and third-party libraries). (also, "suitable for cryptography" is somewhat vague; for example, the Linux man pages insist that /dev/urandom is ok for session keys but /dev/random is needed for long-lived private keys) Regards Antoine. From lists at cheimes.de Tue Jan 3 18:50:44 2012 From: lists at cheimes.de (Christian Heimes) Date: Tue, 03 Jan 2012 18:50:44 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: References: <4F03002A.5010800@cheimes.de> Message-ID: <4F033FF4.3010204@cheimes.de> Am 03.01.2012 18:23, schrieb Matthieu Brucher: > Hi, > > I'm not a core Python developer, but it may be intesting to use a real > Crush resistant RNG, as one from Random123 (a parallel random generator > that is Crush resistant, contrary to the Mersenne Twister, and without a > state). Hello Matthieu, thanks for your input! The core RNG is going to be part of the randomized hashing function patch. The patch will be applied to all Python version from 2.6 to 3.3. Some people may want to applied it to 2.4 and 2.5, too. As the patch is going to affect six to eight Python versions, it should introduce as few new code as possible. Any new code might be a source of new bugs. The Mersenne Twister code is mature and works sufficiently as backup. Any new RNG should go through a PEP process, too. You are welcome to write a PEP and implement an additional RNG for the random module. New developers and new ideas are well received. Regards, Christian From ethan at stoneleaf.us Tue Jan 3 19:42:43 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 03 Jan 2012 10:42:43 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F034C23.7070000@stoneleaf.us> Stephen J. Turnbull wrote: > Matt Joiner writes: > > > Readability is the highest concern, and this should be at the > > discretion of the contributor. > > That's quite backwards. "Readability" is community property, and has > as much, if not more, to do with common convention as with some > absolute metric. The "contributor's discretion" must yield. Readability also includes more than just the source code; as has already been stated: if(cond) { stmt1; + stmt2; } vs. -if(cond) +if(cond) { stmt1; + stmt2; +} I find the diff version that already had braces in place much more readable. ~Ethan~ From barry at python.org Tue Jan 3 20:38:38 2012 From: barry at python.org (Barry Warsaw) Date: Tue, 3 Jan 2012 14:38:38 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <20120103143838.139a72f5@resist.wooz.org> On Dec 31, 2011, at 04:56 PM, Guido van Rossum wrote: >Is there a tracker issue yet? The discussion should probably move there. I think the answer to this was "no"... until now. http://bugs.python.org/issue13703 Proposed patches should be linked to this issue now. Please nosy yourself if you want to follow the progress. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From jimjjewett at gmail.com Tue Jan 3 20:55:32 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 3 Jan 2012 14:55:32 -0500 Subject: [Python-Dev] That depends on what the meaning of "is" is (was Re: http://mail.python.org/pipermail/python-dev/2011-December/115172.html) In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 7:16 PM, PJ Eby wrote: > On Mon, Jan 2, 2012 at 4:07 PM, Jim Jewett wrote: >> But the public header file < >> http://hg.python.org/cpython/file/3ed5a6030c9b/Include/dictobject.h > >> defines the typedef structs for PyDictEntry and _dictobject. >> What is the purpose of the requiring a "real dict" without also >> promising what the header file promises? > Er, just because it's in the .h doesn't mean it's in the public API. ?But in > any event, if you're actually serious about this, I'd just point out that: > 1. The struct layout doesn't guarantee anything about insertion or lookup > algorithms, My concern was about your suggestion of changing the data structure to accommodate some other algorithm -- particularly if it meant that the data would no longer be stored entirely in an array of PyDictEntry. That shouldn't be done lightly even between major versions, and certainly should not be done in a bugfix (or security-only) release. > Are you seriously writing code that relies on the C structure layout of > dicts? The first page of search results for PyDictEntry suggested that others are. (The code I found did seem to be for getting data from a python dict into some other language, rather than for wsgi.) > ?Because really, that was SO not the point of the dict type > requirement. ?It was so that you could use Python's low-level *API* calls, > not muck about with the data structure directly. Would it be too late to clarify that in the PEP itself? -jJ From steve at pearwood.info Tue Jan 3 21:29:10 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 04 Jan 2012 07:29:10 +1100 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F03002A.5010800@cheimes.de> References: <4F03002A.5010800@cheimes.de> Message-ID: <4F036516.5080701@pearwood.info> Christian Heimes wrote: [...] > I propose an addition to the current Python C API: > > int PyOS_URandom(char *buf, Py_ssize_t len) > > Read "len" chars from the OS's RNG into the pre-allocated buffer "buf". > The RNG should be suitable for cryptography. > Since some platforms may not have /dev/urandom, we need a PRNG in the > core, too. I therefore propose to move the Mersenne twister from > randommodule.c into the core, too. Mersenne twister is not suitable for cryptography. http://en.wikipedia.org/wiki/Mersenne_twister -- Steven From matthieu.brucher at gmail.com Tue Jan 3 22:00:43 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 3 Jan 2012 22:00:43 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F033FF4.3010204@cheimes.de> References: <4F03002A.5010800@cheimes.de> <4F033FF4.3010204@cheimes.de> Message-ID: > The core RNG is going to be part of the randomized hashing function > patch. The patch will be applied to all Python version from 2.6 to 3.3. > Some people may want to applied it to 2.4 and 2.5, too. As the patch is > going to affect six to eight Python versions, it should introduce as few > new code as possible. Any new code might be a source of new bugs. The > Mersenne Twister code is mature and works sufficiently as backup. > > Any new RNG should go through a PEP process, too. You are welcome to > write a PEP and implement an additional RNG for the random module. New > developers and new ideas are well received. > Good point. In fact, these RNG are 100% based on the hash functions provided for instance by OpenSSL. But I think this library is not a dependency so my proposal still has the same impact. The Random123 library is a reimplementation of some cryptographic functions with two arguments, the key and the counter, and that's it. So if there is somewhere in the Python C code such cryptographic function, it can be reused to create Crush-resistant random numbers with no new code line. Cheers, Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Tue Jan 3 22:17:06 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 3 Jan 2012 22:17:06 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F03002A.5010800@cheimes.de> References: <4F03002A.5010800@cheimes.de> Message-ID: A randomized hash doesn't need cryptographic RNG (which are slow and need a lot of new code), and the new hash function should maybe not be cryptographic. We need to make the DoS more expensive for the attacker, but we don't need to add "too much security" for that. Mersenne Twister is useless here: it is only needed when you need to generate a fast RNG to generate megabytes of random data, whereas we will not need more than 4 KB. The OS RNG is just fine (fast enough and not blocking). So we can use Windows CryptoGen API (which is already implemented in Python, win32_urandom) and /dev/urandom on UNIX/BSD. /dev/urandom does never block. We need also a fallback if /dev/urandom is not available. Because this case should not occur on modern OS, the fallback can be a weak function like something combining getpid(), gettimeofday(), address of the stack, etc. To generate 4 KB from few words, we can use a very simple LCG (x(n+1) = (x(n) * a + c) mod k). From solipsis at pitrou.net Tue Jan 3 22:20:53 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 3 Jan 2012 22:20:53 +0100 Subject: [Python-Dev] RNG in the core References: <4F03002A.5010800@cheimes.de> Message-ID: <20120103222053.2325d352@pitrou.net> On Tue, 3 Jan 2012 22:17:06 +0100 Victor Stinner wrote: > A randomized hash doesn't need cryptographic RNG (which are slow and > need a lot of new code), and the new hash function should maybe not be > cryptographic. We need to make the DoS more expensive for the > attacker, but we don't need to add "too much security" for that. Agreed. > Mersenne Twister is useless here: it is only needed when you need to > generate a fast RNG to generate megabytes of random data, whereas we > will not need more than 4 KB. The OS RNG is just fine (fast enough and > not blocking). Have you read the following sentence: ?Since some platforms may not have /dev/urandom, we need a PRNG in the core, too. I therefore propose to move the Mersenne twister from randommodule.c into the core, too.? Regards Antoine. From janssen at parc.com Tue Jan 3 23:02:19 2012 From: janssen at parc.com (Bill Janssen) Date: Tue, 3 Jan 2012 14:02:19 PST Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC68E0.4000606@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> Message-ID: <63988.1325628139@parc.com> Christian Heimes wrote: > Am 29.12.2011 12:13, schrieb Mark Shannon: > > The attack relies on being able to predict the hash value for a given > > string. Randomising the string hash function is quite straightforward. > > There is no need to change the dictionary code. > > > > A possible (*untested*) patch is attached. I'll leave it for those more > > familiar with unicodeobject.c to do properly. > > I'm worried that hash randomization of str is going to break 3rd party > software that rely on a stable hash across multiple Python instances. > Persistence layers like ZODB and cross interpreter communication > channels used by multiprocessing may (!) rely on the fact that the hash > of a string is fixed. Software that depends on an undefined hash function for synchronization and persistence deserves to break, IMO. There are plenty of well-defined hash functions available for this purpose. Bill From martin at v.loewis.de Tue Jan 3 23:21:30 2012 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 03 Jan 2012 23:21:30 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <20120103222053.2325d352@pitrou.net> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> Message-ID: <4F037F6A.1070806@v.loewis.de> > Have you read the following sentence: > > ?Since some platforms may not have /dev/urandom, we need a PRNG in the > core, too. I therefore propose to move the Mersenne twister from > randommodule.c into the core, too.? I disagree. We don't need a PRNG on platforms without /dev/urandom or any other native RNG. Initializing the string-hash seed to 0 is perfectly fine on those platforms; we can do slightly better by using, say, the current time (in ms or ?s if available) and the current pid (if available). People concerned with the security on those systems either need to switch to a different system, or provide a patch to access the platform's native random number generator. Regards, Martin From ben+python at benfinney.id.au Tue Jan 3 23:30:24 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 04 Jan 2012 09:30:24 +1100 Subject: [Python-Dev] PEP 7 clarification request: braces References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87sjjwzaf3.fsf@benfinney.id.au> "Stephen J. Turnbull" writes: > Matt Joiner writes: > > > Readability is the highest concern, and this should be at the > > discretion of the contributor. > > That's quite backwards. "Readability" is community property, and has > as much, if not more, to do with common convention as with some > absolute metric. The "contributor's discretion" must yield. +1 -- \ ?Those who write software only for pay should go hurt some | `\ other field.? ?Erik Naggum, in _gnu.misc.discuss_ | _o__) | Ben Finney From martin at v.loewis.de Wed Jan 4 00:11:50 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 04 Jan 2012 00:11:50 +0100 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <4F034C23.7070000@stoneleaf.us> References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> Message-ID: <4F038B36.1000401@v.loewis.de> > Readability also includes more than just the source code; as has already > been stated: > > if(cond) { > stmt1; > + stmt2; > } > > vs. > > -if(cond) > +if(cond) { > stmt1; > + stmt2; > +} > > I find the diff version that already had braces in place much more > readable. Is it really *much* more readable? I have no difficulties reading either (although I had preferred a space after the if; this worries me more than the double if line). Regards, Martin From benjamin at python.org Wed Jan 4 00:17:56 2012 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 3 Jan 2012 23:17:56 +0000 (UTC) Subject: [Python-Dev] PEP 7 clarification request: braces References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> Message-ID: Ethan Furman stoneleaf.us> writes: > > Readability also includes more than just the source code; as has already > been stated: > > if(cond) { > stmt1; > + stmt2; > } > > vs. > > -if(cond) > +if(cond) { > stmt1; > + stmt2; > +} > > I find the diff version that already had braces in place much more readable. There are much larger problems facing diff readibility. On your basis, we might as well decree that code should never be arranged or reindented. Regards, Benjamin From mwm at mired.org Wed Jan 4 01:40:36 2012 From: mwm at mired.org (Mike Meyer) Date: Tue, 3 Jan 2012 16:40:36 -0800 Subject: [Python-Dev] Proposed PEP on concurrent programming support Message-ID: <20120103164036.681beeae@mikmeyer-vm-fedora> PEP: XXX Title: Interpreter support for concurrent programming Version: $Revision$ Last-Modified: $Date$ Author: Mike Meyer Status: Draft Type: Informational Content-Type: text/x-rst Created: 11-Nov-2011 Post-History: Abstract ======== The purpose of this PEP is to explore strategies for making concurrent programming in Python easier by allowing the interpreter to detect and notify the user about possible bugs in concurrent access. The reason for doing so is that "Errors should never pass silently". Such bugs are caused by allowing objects to be accessed simultaneously from another thread of execution while they are being modified. Currently, python systems provide no support for such bugs, falling back on the underlying platform facilities and some tools built on top of those. While these tools allow prevention of such modification if the programmer is aware of the need for them, there are no facilities to detect that such a need might exist and warn the programmer of it. The goal is not to prevent such bugs, as that depends on the programmer getting the logic of the interactions correct, which the interpreter can't judge. Nor is the goal to warn the programmer about any such modifications - the goal is to catch standard idioms making unsafe modifications. If the programmer starts tinkering with Python's internals, it's assumed they are aware of these issues. Rationale ========= Concurrency bugs are among the hardest bugs to locate and fix. They result in corrupt data being generated or used in a computation. Like most such bugs, the corruption may not become evident until much later and far away in the program. Minor changes in the code can cause the bugs to fail to manifest. They may even fail to manifest from run to run, depending on external factors beyond the control of the programmer. Therefore any help in locating and dealing with such bugs is valuable. If the interpreter is to provide such help, it must be aware of when things are safe to modify and when they are not. This means it will almost certainly cause incompatible changes in Python, and may impose costs so high for non-concurrent operations as to make it untenable. As such, the final options discussed are destined for Python version 4 or later, and may never be implemented in any mainstream implementation of Python. Terminology =========== The word "thread" is used throughout to mean "concurrent thread of execution". Nominally, this means a platform thread. However, it is intended to include any threading mechanism that allows the interpreter to change threads between or in the middle of a statement without the programmer specifically allowing this to happen. Similarly, the word "interpreter" means any system that processes and executes Python language files. While this normally means cPython, the changes discussed here should be amenable to other implementations. Concept ======= Locking object -------------- The idea is that the interpreter should indicate an error anytime an unlocked object is mutated. For mutable types, this would mean changing the value of the type. For Python class instances, this would mean changing the binding of an attribute. Mutating an object bound to such an attribute isn't a change in the object the attribute belongs to, and so wouldn't indicate an error unless the object bound to the attribute was unlocked. Locking by name --------------- It's also been suggested that locking "names" would be useful. That is, to prevent a specific attribute of an object from being rebound, or a key/index entry in a mapping object. This provides a finer grained locking than just locking the object, as you could lock a specific attribute or set of attributes of an object, without locking all of them. Unfortunately, this isn't sufficient: a set may need to be locked to prevent deletions for some period, or a dictionary to prevent adding a key, or a list to prevent changing a slice, etc. So some other locking mechanism is required. If that needs to specify objects, some way of distinguishing between locking a name and locking the object bound to the name needs to be invented, or there needs to be two different locking mechanisms. It's not clear that the finer grained locking is worth adding yet another language mechanism. Alternatives ============ Explicit locking ---------------- These alternatives requires that the programmer explicitly name anything that is going to be changed to lock it before changing it. This lets the interpreter gets involved, but makes a number of errors possible based on the order that locks are applied. Platform locks '''''''''''''' The current tool set uses platform locks via a C extension. The problem with these is that the interpreter has no knowledge of them, and so can't do anything about detecting the mutation of unlocked objects. A ``locking`` keyword ''''''''''''''''''''' Adding a statement to tell the interpreter to lock objects for the attached suite would let the interpreter know which objects are locked. To help prevent deadlocks, such a keyword needs to imply an order for locking objects, such that two objects locked by the a locking statement will lock the two objects in the same order during a single execution of the program. This can be achieved by sorting objects by the ``id`` of the object, since the requirements for ``id`` are sufficient for this. While the locking order requirement is sufficient to prevent deadlocks from non-nested locking statements, it's not sufficient if locking statements are allowed to nest. So either nested locking statements need to be disallowed, or the outer statement must lock everything that's going to need to be locked. Either requirement is sufficiently onerous that alternatives need to be considered. Implicit locking ---------------- In this alternative, the interpter uses one or more heuristics to decide when things should need locking. Software Transactional Memory (STM) ''''''''''''''''''''''''''''''''''' STM is a relatively new technology being experimented with in newer languages, and in a number of 3rd party libraries (both Peak [#Peak]_ and Kamaelia [#Kamaelia]_ provide STM facilities). A suite is marked as a `transaction`, and then when an unlocked object is modified, instead of indicating an error, a locked copy of it is created to be used through the rest of the transaction. If any of the originals are modified during the execution of the suite, the suite is rerun from the beginning. If it completes, the locked copies are copied back to the originals in an atomic manner. This causes the changes seen by any threads not running the transaction to be atomic. If two threads are updating the same object in transactions, the one that finishes second will be restarted with values set by the one that finished first. The advantage of an STM is that the programmer doesn't have to worry about what is locked, and there's no overhead for using locked objects (after locking them, of course). The disadvantage is that any code in a transaction must be safe to run multiple times. This forbids any kind of I/O. Compiler support '''''''''''''''' Since the point is to get the interpreter involved, we might as well let it be involved in figuring out which things are safe and don't need to be locked. This could potentially eliminate a lot of locking. Each object - whether a Python class instance or builtin type - is created with no way to access it until it is bound. So it is inherently safe to modify. Being bound to a local (or nonlocal?) variable doesn't change this. Being bound to a global, class or instance variable or stored in a container does change this, as the object may now be accessed from other threads via the module or container. Since this analysis is being done at compile time, being passed to another function - including methods of the object - makes it unsafe. Likewise, yielding an object makes it unsafe for future use. Returning it doesn't change anything, since our execution is over and we lose access to the object. Unfortunately, objects returned from functions must be treated as unsafe. Interpreter support ''''''''''''''''''' If we instead track whether or not objects require locking in the interpreter, then we can improve the analsysis. The only thing that definitely makes an object unsafe is binding to a global variable or a variable known to be unsafe. Passing objects to C routines exposes them to concurrent modification, since there's no way to know what will happen inside the C routine. Adding some way of marking C routines - or possibly the objects passed to them - as not exposing things to concurrent modification would help with this, allowing C modules to be called without requiring locking everything passed to them. Binding to class and instance variables, or adding them to a container, is an interesting issue. If the object in question is safe, then anything added to it is also safe. However, this would mean that when an object is flagged as unsafe, all objects accessible through it would also have to be flagged as unsafe. This type of tracking also means that objects effectively have three states: locked, unlocked, and safe. Both locked and safe objects can safely be modified without a problem. Locking and unlocking safe objects is a nop. Interpreter threading --------------------- One alternative is replacing the current threading tools - which are wrappers around the OS-provided threading - with threading support in the interpreter. This would allow the interpreter to control whether or not objects are shared between threads, which isn't possible today. The full implications of this approach have as yet to be worked out. Mixed solutions --------------- Most likely, any real implementation would use a number of the techniques above, since all of them have serious shortcomings. For instance, combining STM with explicit locking would allow explicit locking when IO was required, but complex multi-object changes could be handled by STM, thus avoiding the nested locking issues. Likewise, interpreter or compiler support could be mixed with most other solutions to relax the requirement of locking for part of the objects used in a program. The implications of mixing these things together also needs to be explored more thoroughly. Change proposal =============== This is 'strawman' proposal to provide a starting point for discussion. The proposal is to add an STM support to the python interpreter. A new suite type - the ``transaction`` will be added to the language. The suite will have the semantics discussed above: modifying an object in the suite will trigger creation of a thread-local shallow copy to be used in the Transaction. Further modifications of the original will cause all existing copies to be discarded and the transaction to be restarted. At the end of the transaction, the originals of all the copies are locked and then updated to the state of the copy. Further work ============ Requiring further investigation: - The interpreter providing it's own threading. - How various solutions interact when mixed. There are also a couple tools that might be useful to build, or at least investigate building: - A static concurrency safety analyzer, that handled the AST of a function to determine which variables are safe. - A dynamic concurrency safety analyzer, similar to coverage [#coverage]_. Implementation Notes ==================== Not significantly impacting the performance of single-threaded code must be of paramount importance to any implementation. One implementation technique arose that could help with this. Instead of keeping track of the objects state and having methods check that state and modify their behavior based on it, change the methods as the object changes state. So in safe or locked mode, the objects methods could freely modify the object without having to check it's mode. In unlocked mode, an attempt to do so would raise an error or warning. Unfortunately, this doesn't work if some global or thread state must be checked instead of just object-local state. References ========== .. [#Peak] "Peak, the Python Enterprise Application Kit", http://peak.telecommunity.com/ .. [#Kamaelia] "Kamaelia - Concurrency made useful, fun", http://www.kamaelia.org/ .. [#coverage] "Code coverage measurement for Python", http://pypi.python.org/pypi/coverage Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From tjreedy at udel.edu Wed Jan 4 01:41:53 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 03 Jan 2012 19:41:53 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <63988.1325628139@parc.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> Message-ID: On 1/3/2012 5:02 PM, Bill Janssen wrote: > Software that depends on an undefined hash function for synchronization > and persistence deserves to break, IMO. There are plenty of > well-defined hash functions available for this purpose. The doc for id() now says "This is an integer which is guaranteed to be unique and constant for this object during its lifetime." Since the default 3.2.2 hash for my win7 64bit CPython is id-address // 16, it can have no longer guarantee. I suggest that hash() doc say something similar: http://bugs.python.org/issue13707 -- Terry Jan Reedy From solipsis at pitrou.net Wed Jan 4 02:34:03 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 4 Jan 2012 02:34:03 +0100 Subject: [Python-Dev] cpython: Add a new PyUnicode_Fill() function References: Message-ID: <20120104023403.1c86c12e@pitrou.net> > +.. c:function:: int PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \ > + Py_ssize_t length, Py_UCS4 fill_char) > + > + Fill a string with a character: write *fill_char* into > + ``unicode[start:start+length]``. > + > + Fail if *fill_char* is bigger than the string maximum character, or if the > + string has more than 1 reference. > + > + Return the number of written character, or return ``-1`` and raise an > + exception on error. The return type should then be Py_ssize_t, not int. Regards Antoine. From ncoghlan at gmail.com Wed Jan 4 02:42:20 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 4 Jan 2012 11:42:20 +1000 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F037F6A.1070806@v.loewis.de> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> <4F037F6A.1070806@v.loewis.de> Message-ID: On Wed, Jan 4, 2012 at 8:21 AM, "Martin v. L?wis" wrote: >> Have you read the following sentence: >> >> ?Since some platforms may not have /dev/urandom, we need a PRNG in the >> core, too. I therefore propose to move the Mersenne twister from >> randommodule.c into the core, too.? > > I disagree. We don't need a PRNG on platforms without /dev/urandom or > any other native RNG. > Initializing the string-hash seed to 0 is perfectly fine on those > platforms; we can do slightly better by using, say, the current > time (in ms or ?s if available) and the current pid (if available). > > People concerned with the security on those systems either need to > switch to a different system, or provide a patch to access the > platform's native random number generator. +1 (especially given how far back this is going to be ported) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From solipsis at pitrou.net Wed Jan 4 02:59:51 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 4 Jan 2012 02:59:51 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F037F6A.1070806@v.loewis.de> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> <4F037F6A.1070806@v.loewis.de> Message-ID: <20120104025951.0f57cca8@pitrou.net> On Tue, 03 Jan 2012 23:21:30 +0100 "Martin v. L?wis" wrote: > > Have you read the following sentence: > > > > ?Since some platforms may not have /dev/urandom, we need a PRNG in the > > core, too. I therefore propose to move the Mersenne twister from > > randommodule.c into the core, too.? > > I disagree. We don't need a PRNG on platforms without /dev/urandom or > any other native RNG. Well what if /dev/urandom is unavailable because the program is run e.g. in a chroot? (or is /dev/urandom still available in a chroot?) Regards Antoine. From stephen at xemacs.org Wed Jan 4 05:10:37 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 04 Jan 2012 13:10:37 +0900 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> Message-ID: <87fwfw3y6a.fsf@uwakimon.sk.tsukuba.ac.jp> Benjamin Peterson writes: > Ethan Furman stoneleaf.us> writes: > > > > Readability also includes more than just the source code; as has already > > been stated: [diffs elided] > > I find the diff version that already had braces in place much more readable. > > There are much larger problems facing diff readibility. On your basis, we might > as well decree that code should never be arranged or reindented. That's a reasonable approach sometimes used, but it would be hard in Python. Specifically, I often produce two patches when substantial rearrangement is involved. The first isolates the actual changes, the second does the reformatting. In Python, the first patch might be syntactically erroneous, which would be both annoying for automatic testing and less readable. A Python-friendly alternative is to provide both a machine-appliable diff and a diff ignoring whitespace changes. This could be a toggle in web interfaces to the VCS. I've also sometimes found doing word diffs to be useful. Most developers resist such procedures passionately, though. *shrug* From benjamin at python.org Wed Jan 4 05:32:23 2012 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 3 Jan 2012 22:32:23 -0600 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <87fwfw3y6a.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> <87fwfw3y6a.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 2012/1/3 Stephen J. Turnbull : > Benjamin Peterson writes: > ?> Ethan Furman stoneleaf.us> writes: > ?> > > ?> > Readability also includes more than just the source code; as has already > ?> > been stated: > > [diffs elided] > > ?> > I find the diff version that already had braces in place much more readable. > ?> > ?> There are much larger problems facing diff readibility. On your basis, we might > ?> as well decree that code should never be arranged or reindented. > > That's a reasonable approach sometimes used My goodness, I was trying to make a ridiculous-sounding proposition. -- Regards, Benjamin From pje at telecommunity.com Wed Jan 4 06:07:27 2012 From: pje at telecommunity.com (PJ Eby) Date: Wed, 4 Jan 2012 00:07:27 -0500 Subject: [Python-Dev] Proposed PEP on concurrent programming support In-Reply-To: <20120103164036.681beeae@mikmeyer-vm-fedora> References: <20120103164036.681beeae@mikmeyer-vm-fedora> Message-ID: On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer wrote: > STM is a relatively new technology being experimented with in newer > languages, and in a number of 3rd party libraries (both Peak [#Peak]_ > and Kamaelia [#Kamaelia]_ provide STM facilities). I don't know about Kamaelia, but PEAK's STM (part of the Trellis event-driven library) is *not* an inter-thread concurrency solution: it's actually used to sort out the order of events in a co-operative multitasking scenario. So, it should not be considered evidence for the practicality of doing inter-thread co-ordination that way in pure Python. A suite is marked > as a `transaction`, and then when an unlocked object is modified, > instead of indicating an error, a locked copy of it is created to be > used through the rest of the transaction. If any of the originals are > modified during the execution of the suite, the suite is rerun from > the beginning. If it completes, the locked copies are copied back to > the originals in an atomic manner. > I'm not sure if "locked" is really the right word here. A private copy isn't "locked" because it's not shared. The disadvantage is that any code in a transaction must be safe to run > multiple times. This forbids any kind of I/O. > More precisely, code in a transaction must be *reversible*, so it doesn't forbid any I/O that can be undone. If you can seek backward in an input file, for example, or delete queued output data, then it can still be done. Even I/O like re-drawing a screen can be made STM safe by making the redraw occur after a transaction that reads and empties a buffer written by other transactions. For > instance, combining STM with explicit locking would allow explicit > locking when IO was required, I don't think this idea makes any sense, since STM's don't really "lock", and to control I/O in an STM system you just STM-ize the queues. (Generally speaking.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Jan 4 07:30:16 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 04 Jan 2012 15:30:16 +0900 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> <87fwfw3y6a.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87d3b03rpj.fsf@uwakimon.sk.tsukuba.ac.jp> Benjamin Peterson writes: > My goodness, I was trying to make a ridiculous-sounding proposition. In this kind of discussion, that's in the same class as "be careful what you wish for -- because you might just get it." From fijall at gmail.com Wed Jan 4 08:59:15 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 4 Jan 2012 09:59:15 +0200 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <63988.1325628139@parc.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> Message-ID: On Wed, Jan 4, 2012 at 12:02 AM, Bill Janssen wrote: > Christian Heimes wrote: > >> Am 29.12.2011 12:13, schrieb Mark Shannon: >> > The attack relies on being able to predict the hash value for a given >> > string. Randomising the string hash function is quite straightforward. >> > There is no need to change the dictionary code. >> > >> > A possible (*untested*) patch is attached. I'll leave it for those more >> > familiar with unicodeobject.c to do properly. >> >> I'm worried that hash randomization of str is going to break 3rd party >> software that rely on a stable hash across multiple Python instances. >> Persistence layers like ZODB and cross interpreter communication >> channels used by multiprocessing may (!) rely on the fact that the hash >> of a string is fixed. > > Software that depends on an undefined hash function for synchronization > and persistence deserves to break, IMO. ?There are plenty of > well-defined hash functions available for this purpose. > > Bill > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com A lot of software will break their tests, because dict ordering would depend on the particular run. I know, because some of them break on pypy which has a different dict ordering. This is probably a good thing in general, but is it really worth it? People will install python 2.6.newest and stuff *will* break. Is it *really* a security issue? We knew all along that dicts are O(n^2) in worst case scenario, how is this suddenly a security problem? Cheers, fijal From martin at v.loewis.de Wed Jan 4 09:02:14 2012 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 04 Jan 2012 09:02:14 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <20120104025951.0f57cca8@pitrou.net> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> <4F037F6A.1070806@v.loewis.de> <20120104025951.0f57cca8@pitrou.net> Message-ID: <4F040786.3040307@v.loewis.de> > Well what if /dev/urandom is unavailable because the program is run > e.g. in a chroot? If the system ought to have /dev/urandom (as e.g. determined during configure), I propose that Python fails fast, unless the command line option is given that disables random hash seeds. For the security fixes, we therefore might want to toggle the meaning of the command line switch, i.e. only use random seeds if explicitly requested. > (or is /dev/urandom still available in a chroot?) You can make it available if you want to: just create a /dev directory, and do mknod in it. It's common to run /dev/MAKEDEV (or similar), or to mount devfs into a chroot environment; else many programs run in the chroot are likely going to fail (e.g. if /dev/tty is missing). See, for example, http://tldp.org/HOWTO/Chroot-BIND-HOWTO-2.html bind apparently requires /dev/null and /dev/random. Regards, Martin From solipsis at pitrou.net Wed Jan 4 11:55:13 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 4 Jan 2012 11:55:13 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> Message-ID: <20120104115513.39db6b8b@pitrou.net> On Wed, 4 Jan 2012 09:59:15 +0200 Maciej Fijalkowski wrote: > > Is it *really* a security issue? We knew all along that dicts are > O(n^2) in worst case scenario, how is this suddenly a security > problem? Because it has been shown to be exploitable for malicious purposes? Regards Antoine. From lists at cheimes.de Wed Jan 4 12:18:54 2012 From: lists at cheimes.de (Christian Heimes) Date: Wed, 04 Jan 2012 12:18:54 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> Message-ID: <4F04359E.3070804@cheimes.de> Am 04.01.2012 08:59, schrieb Maciej Fijalkowski: > Is it *really* a security issue? We knew all along that dicts are > O(n^2) in worst case scenario, how is this suddenly a security > problem? For example Microsoft has released an extraordinary and unscheduled security patch for the issue between Christmas and New Year. I don't normally use MS as reference but this should give you a hint about the severity. Have you watched the talk yet? http://www.youtube.com/watch?v=R2Cq3CLI6H8 Christian From victor.stinner at gmail.com Wed Jan 4 04:30:06 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 4 Jan 2012 04:30:06 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <20120104025951.0f57cca8@pitrou.net> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> <4F037F6A.1070806@v.loewis.de> <20120104025951.0f57cca8@pitrou.net> Message-ID: > (or is /dev/urandom still available in a chroot?) Last time that I played with chroot, I "binded" /dev and /proc. Many programs rely on specific devices like /dev/null. Python should not refuse to start if /dev/urandom (or CryptoGen) is missing or cannot be used, but should use a weak fallback. Victor From victor.stinner at gmail.com Wed Jan 4 04:30:16 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 4 Jan 2012 04:30:16 +0100 Subject: [Python-Dev] cpython: Add a new PyUnicode_Fill() function In-Reply-To: <20120104023403.1c86c12e@pitrou.net> References: <20120104023403.1c86c12e@pitrou.net> Message-ID: Oops, it's a typo in the doc (copy/paste failure). It's now fixed, thanks. Victor 2012/1/4 Antoine Pitrou : > >> +.. c:function:: int PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \ >> + ? ? ? ? ? ? ? ? ? ? ? ?Py_ssize_t length, Py_UCS4 fill_char) >> + >> + ? Fill a string with a character: write *fill_char* into >> + ? ``unicode[start:start+length]``. >> + >> + ? Fail if *fill_char* is bigger than the string maximum character, or if the >> + ? string has more than 1 reference. >> + >> + ? Return the number of written character, or return ``-1`` and raise an >> + ? exception on error. > > The return type should then be Py_ssize_t, not int. > > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/victor.stinner%40haypocalc.com From brian at python.org Wed Jan 4 15:05:28 2012 From: brian at python.org (Brian Curtin) Date: Wed, 4 Jan 2012 08:05:28 -0600 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <87d3b03rpj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> <87fwfw3y6a.fsf@uwakimon.sk.tsukuba.ac.jp> <87d3b03rpj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Jan 4, 2012 at 00:30, Stephen J. Turnbull wrote: > Benjamin Peterson writes: > > ?> My goodness, I was trying to make a ridiculous-sounding proposition. > > In this kind of discussion, that's in the same class as "be careful > what you wish for -- because you might just get it." I wish we could move onto better discussions than brace placement/existence at this point. *crosses fingers* From jimjjewett at gmail.com Wed Jan 4 15:41:19 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 4 Jan 2012 09:41:19 -0500 Subject: [Python-Dev] Proposed PEP on concurrent programming support Message-ID: (I've added back python-ideas, because I think that is still the appropriate forum.) >.... A new > suite type - the ``transaction`` will be added to the language. The > suite will have the semantics discussed above: modifying an object in > the suite will trigger creation of a thread-local shallow copy to be > used in the Transaction. Further modifications of the original will > cause all existing copies to be discarded and the transaction to be > restarted. ... How will you know that an object has been modified? The only ways I can think of are (1) Timestamp every object -- or at least every mutable object -- and hope that everybody agrees on which modifications should count. (2) Make two copies of every object you're using in the suite; at the end, compare one of them to both the original and the one you were operating on. With this solution, you can decide for youself what counts as a modification, but it still isn't straightforward; I would consider changing a value to be changing a dict, even though nothing in the item (header) itself changed. -jJ From barry at python.org Wed Jan 4 16:20:28 2012 From: barry at python.org (Barry Warsaw) Date: Wed, 4 Jan 2012 10:20:28 -0500 Subject: [Python-Dev] RNG in the core In-Reply-To: <20120104025951.0f57cca8@pitrou.net> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> <4F037F6A.1070806@v.loewis.de> <20120104025951.0f57cca8@pitrou.net> Message-ID: <20120104102028.4c722b77@limelight.wooz.org> On Jan 04, 2012, at 02:59 AM, Antoine Pitrou wrote: >Well what if /dev/urandom is unavailable because the program is run >e.g. in a chroot? >(or is /dev/urandom still available in a chroot?) It is (apparently) in an schroot in Ubuntu, so I'd guess it's also available in Debian (untested). -Barry From ericsnowcurrently at gmail.com Wed Jan 4 20:15:46 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 4 Jan 2012 12:15:46 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> Message-ID: On Wed, Jan 4, 2012 at 12:59 AM, Maciej Fijalkowski wrote: > On Wed, Jan 4, 2012 at 12:02 AM, Bill Janssen wrote: >> Christian Heimes wrote: >> >>> Am 29.12.2011 12:13, schrieb Mark Shannon: >>> > The attack relies on being able to predict the hash value for a given >>> > string. Randomising the string hash function is quite straightforward. >>> > There is no need to change the dictionary code. >>> > >>> > A possible (*untested*) patch is attached. I'll leave it for those more >>> > familiar with unicodeobject.c to do properly. >>> >>> I'm worried that hash randomization of str is going to break 3rd party >>> software that rely on a stable hash across multiple Python instances. >>> Persistence layers like ZODB and cross interpreter communication >>> channels used by multiprocessing may (!) rely on the fact that the hash >>> of a string is fixed. >> >> Software that depends on an undefined hash function for synchronization >> and persistence deserves to break, IMO. ?There are plenty of >> well-defined hash functions available for this purpose. >> >> Bill >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com > > A lot of software will break their tests, because dict ordering would > depend on the particular run. I know, because some of them break on > pypy which has a different dict ordering. This is probably a good > thing in general, but is it really worth it? People will install > python 2.6.newest and stuff *will* break. So if we're making the new hashing the default and giving an option to use the old, we should make it _really_ clear in the release notes/announcement about how to revert the behavior. -eric > > Is it *really* a security issue? We knew all along that dicts are > O(n^2) in worst case scenario, how is this suddenly a security > problem? > > Cheers, > fijal > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ericsnowcurrently%40gmail.com From andrew at bemusement.org Thu Jan 5 05:26:27 2012 From: andrew at bemusement.org (Andrew Bennetts) Date: Thu, 5 Jan 2012 15:26:27 +1100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120104115513.39db6b8b@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> Message-ID: <20120105042627.GA10082@flay.puzzling.org> On Wed, Jan 04, 2012 at 11:55:13AM +0100, Antoine Pitrou wrote: > On Wed, 4 Jan 2012 09:59:15 +0200 > Maciej Fijalkowski wrote: > > > > Is it *really* a security issue? We knew all along that dicts are > > O(n^2) in worst case scenario, how is this suddenly a security > > problem? > > Because it has been shown to be exploitable for malicious purposes? I don't think that's news either. http://mail.python.org/pipermail/python-dev/2003-May/035907.html and http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for instance show that in 2003 it was clearly known to at least be likely to be an exploitable DoS in common code (a dict of HTTP headers or HTTP form keys). There was debate about whether it's the language's responsibility to mitigate the problem or if apps should use safer designs for handling untrusted input (e.g. limit the number of keys input is allowed to create, or use something other than dicts), and debate about just how practical an effective exploit would be. But I think it was understood to be a real concern 8 years ago, so not exactly sudden. Just because it's old news doesn't make it not a security problem, of course. -Andrew. From paul at smedley.id.au Thu Jan 5 09:58:29 2012 From: paul at smedley.id.au (Paul Smedley) Date: Thu, 05 Jan 2012 19:28:29 +1030 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 Message-ID: Hi All, I'm working on updating my port of Python 2.6.5 to v2.7.2 for the OS/2 platform. I have python.exe and python27.dll compiling find, but when starting to build sharedmods I'm getting the following error: running build running build_ext Traceback (most recent call last): File "./setup.py", line 2092, in main() File "./setup.py", line 2087, in main 'Lib/smtpd.py'] File "U:/DEV/python-2.7.2/Lib/distutils/core.py", line 152, in setup dist.run_commands() File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 953, in run_commands self.run_command(cmd) File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 972, in run_command cmd_obj.run() File "U:/DEV/python-2.7.2/Lib/distutils/command/build.py", line 127, in run self.run_command(cmd_name) File "U:/DEV/python-2.7.2/Lib/distutils/cmd.py", line 326, in run_command self.distribution.run_command(command) File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 972, in run_command cmd_obj.run() File "U:/DEV/python-2.7.2/Lib/distutils/command/build_ext.py", line 340, in run self.build_extensions() File "./setup.py", line 152, in build_extensions missing = self.detect_modules() File "./setup.py", line 1154, in detect_modules for arg in sysconfig.get_config_var("CONFIG_ARGS").split()] AttributeError: 'NoneType' object has no attribute 'split' make: *** [sharedmods] Error 1 Any suggestions? A google showed a similar error on AIX with no clear resolution. Thanks in advance, Paul From solipsis at pitrou.net Thu Jan 5 14:39:57 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 5 Jan 2012 14:39:57 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> Message-ID: <20120105143957.1b5ba7fe@pitrou.net> On Thu, 5 Jan 2012 15:26:27 +1100 Andrew Bennetts wrote: > > I don't think that's news either. > http://mail.python.org/pipermail/python-dev/2003-May/035907.html and > http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for > instance show that in 2003 it was clearly known to at least be likely to be an > exploitable DoS in common code (a dict of HTTP headers or HTTP form keys). > > There was debate about whether it's the language's responsibility to mitigate > the problem or if apps should use safer designs for handling untrusted input > (e.g. limit the number of keys input is allowed to create, or use something > other than dicts), and debate about just how practical an effective exploit > would be. But I think it was understood to be a real concern 8 years ago, so > not exactly sudden. That's not news indeed, but that doesn't make it less of a problem, especially now that the issue has been widely publicized through a conference and announcements on several widely-read Web sites. That said, only doing the security fix in 3.3 would have the nice side effect of pushing people towards Python 3, so perhaps I'm for it after all. Half-jokingly, Antoine. From mark at hotpy.org Thu Jan 5 14:46:52 2012 From: mark at hotpy.org (Mark Shannon) Date: Thu, 05 Jan 2012 13:46:52 +0000 Subject: [Python-Dev] Testing the tests by modifying the ordering of dict items. Message-ID: <4F05A9CC.3000806@hotpy.org> Hi, Python code should not depend upon the ordering of items in a dict. Unfortunately it seems that a number of tests in the standard library do just that. Changing PyDict_MINSIZE from 8 to either 4 or 16 causes the following tests to fail: test_dis test_email test_inspect test_nntplib test_packaging test_plistlib test_pprint test_symtable test_trace test_sys also fails, but this is a legitimate failure in sys.getsizeof() Changing the collision resolution function from f(n) = 5n + 1 to f(n) = n + 1 results in the same failures, except for test_packaging and test_symtable which pass. Finally, changing the seed in unicode_hash() from (implicit) 0 to an arbitrary value (12345678) causes the above tests to fail plus: test_json test_set test_ttk_textonly test_urllib test_urlparse I think this is a real issue as the unicode_hash() function is likely to change soon due to http://bugs.python.org/issue13703. Should I: 1. Submit one big bug report? 2. Submit a bug report for each "failing" test separately? 3. Ignore it, since the tests only fail when I start messing about? Cheers, Mark. From solipsis at pitrou.net Thu Jan 5 14:58:13 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 5 Jan 2012 14:58:13 +0100 Subject: [Python-Dev] Testing the tests by modifying the ordering of dict items. References: <4F05A9CC.3000806@hotpy.org> Message-ID: <20120105145813.35c9b8c5@pitrou.net> On Thu, 05 Jan 2012 13:46:52 +0000 Mark Shannon wrote: > > Should I: > > 1. Submit one big bug report? > > 2. Submit a bug report for each "failing" test separately? I would say a separate bug report for each failing test file, i.e. one report for test_dis, one for test_email etc. Hope this doesn't eat too much of your time :) Regards Antoine. From amauryfa at gmail.com Thu Jan 5 15:02:44 2012 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Thu, 5 Jan 2012 15:02:44 +0100 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 In-Reply-To: References: Message-ID: 2012/1/5 Paul Smedley > Hi All, > > I'm working on updating my port of Python 2.6.5 to v2.7.2 for the OS/2 > platform. > > I have python.exe and python27.dll compiling find, but when starting to > build sharedmods I'm getting the following error: > running build > running build_ext > Traceback (most recent call last): > File "./setup.py", line 2092, in > main() > File "./setup.py", line 2087, in main > 'Lib/smtpd.py'] > File "U:/DEV/python-2.7.2/Lib/**distutils/core.py", line 152, in setup > dist.run_commands() > File "U:/DEV/python-2.7.2/Lib/**distutils/dist.py", line 953, in > run_commands > self.run_command(cmd) > File "U:/DEV/python-2.7.2/Lib/**distutils/dist.py", line 972, in > run_command > cmd_obj.run() > File "U:/DEV/python-2.7.2/Lib/**distutils/command/build.py", line 127, > in run > self.run_command(cmd_name) > File "U:/DEV/python-2.7.2/Lib/**distutils/cmd.py", line 326, in > run_command > self.distribution.run_command(**command) > File "U:/DEV/python-2.7.2/Lib/**distutils/dist.py", line 972, in > run_command > cmd_obj.run() > File "U:/DEV/python-2.7.2/Lib/**distutils/command/build_ext.**py", line > 340, in run > self.build_extensions() > File "./setup.py", line 152, in build_extensions > missing = self.detect_modules() > File "./setup.py", line 1154, in detect_modules > for arg in sysconfig.get_config_var("**CONFIG_ARGS").split()] > AttributeError: 'NoneType' object has no attribute 'split' > make: *** [sharedmods] Error 1 > > > Any suggestions? A google showed a similar error on AIX with no clear > resolution. > Is it in the part that configures the "dbm" module? This paragraph is already protected by a "if platform not in ['cygwin']:", I suggest to exclude 'os2emx' as well. -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Thu Jan 5 16:15:33 2012 From: barry at python.org (Barry Warsaw) Date: Thu, 5 Jan 2012 10:15:33 -0500 Subject: [Python-Dev] Testing the tests by modifying the ordering of dict items. In-Reply-To: <4F05A9CC.3000806@hotpy.org> References: <4F05A9CC.3000806@hotpy.org> Message-ID: <20120105101533.6265853b@limelight.wooz.org> On Jan 05, 2012, at 01:46 PM, Mark Shannon wrote: >2. Submit a bug report for each "failing" test separately? I'm sure it will be a pain, but this is really the best thing to do. -Barry From fijall at gmail.com Thu Jan 5 18:34:13 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 5 Jan 2012 19:34:13 +0200 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120105143957.1b5ba7fe@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> Message-ID: On Thu, Jan 5, 2012 at 3:39 PM, Antoine Pitrou wrote: > On Thu, 5 Jan 2012 15:26:27 +1100 > Andrew Bennetts wrote: >> >> I don't think that's news either. >> http://mail.python.org/pipermail/python-dev/2003-May/035907.html and >> http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for >> instance show that in 2003 it was clearly known to at least be likely to be an >> exploitable DoS in common code (a dict of HTTP headers or HTTP form keys). >> >> There was debate about whether it's the language's responsibility to mitigate >> the problem or if apps should use safer designs for handling untrusted input >> (e.g. limit the number of keys input is allowed to create, or use something >> other than dicts), and debate about just how practical an effective exploit >> would be. ?But I think it was understood to be a real concern 8 years ago, so >> not exactly sudden. > > That's not news indeed, but that doesn't make it less of a problem, > especially now that the issue has been widely publicized through a > conference and announcements on several widely-read Web sites. > > That said, only doing the security fix in 3.3 would have the nice side > effect of pushing people towards Python 3, so perhaps I'm for it after > all. > > Half-jokingly, > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com Just to make things clear - stdlib itself has 1/64 of tests relying on dict order. Changing dict order in *older* pythons will break everyone's tests and some peoples code. Making this new 2.6.x release would mean that people using new python 2.6 would have to upgrade an unspecified amount of their python packages, that does not sound very cool. Also consider that new 2.6.x would go as a security fix to old ubuntu, but all other packages won't, because they'll not contain security fixes. Just so you know Cheers, fijal From v+python at g.nevcal.com Thu Jan 5 20:14:51 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 05 Jan 2012 11:14:51 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> Message-ID: <4F05F6AB.3060704@g.nevcal.com> On 1/5/2012 9:34 AM, Maciej Fijalkowski wrote: > Also consider that new 2.6.x would go as a security fix to old > ubuntu, but all other packages won't, because they'll not contain > security fixes. Just so you know Why should CPython by constrained by broken policies of Ubuntu? If the other packages must be fixed so they work correctly with a security fix in Python, then they should be considered as containing a security fix. If they aren't, then that is a broken policy. On the other hand, it is very true that the seductive convenience of dict (readily available, good performance) in normal cases have created the vulnerability because its characteristics are a function of the data inserted, and when used for data that is from unknown, possibly malicious sources, that is a bug in the program that uses dict, not in dict itself. So it seems to me that: 1) the security problem is not in CPython, but rather in web servers that use dict inappropriately. 2) changing CPython in a way that breaks code is not a security fix to CPython, but rather a gratuitous breakage of compatibility promises, wrapped in a security-fix lie. The problem for CPython here can be summarized as follows: a) it is being blamed for problems in web servers that are not problems in CPython b) perhaps dict documentation is a bit too seductive, in not declaring that data from malicious sources could cause its performance to degrade significantly (but then, any programmer who has actually taken a decent set of programming classes should understand that, but on the other hand, there are programmers who have not taken such classes). c) CPython provides no other mapping data structures that rival the performance and capabilities of dict as an alternative, nor can such data structures be written in CPython, as the performance of dict comes not only from hashing, but also from being written in C. The solutions could be: A) push back on the blame: it is not a CPython problem B) perhaps add a warning to the documentation for the na?ve, untrained programmers C) consider adding an additional data structure to the language, and mention it in the B warning for versions 3.3+. On the other hand, the web server vulnerability could be blamed on CPython in another way: identify vulnerable packages in the stdlib that are likely the be used during the parsing of user-supplied data. Ones that come to mind (Python 3.2) are: urllib.parse (various parse* functions) (package names different in Python 2.x) cgi (parse_multipart, FieldStorage) So, fixing the vulnerable packages could be a sufficient response, rather than changing the hash function. How to fix? Each of those above allocates and returns a dict. Simply have each of those allocate and return and wrapped dict, which has the following behaviors: i) during __init__, create a local, random, string. ii) for all key values, prepend the string, before passing it to the internal dict. Changing these vulnerable packages rather than the hash function is a much more constrained change, and wouldn't create bugs in programs that erroneously depend on the current hash function directly or indirectly. This would not fix web servers that use their own parsing and storage mechanism for

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4