References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <1f7befae05020812377c72de26@mail.gmail.com> Message-ID: <16908.40790.23812.274563@gargle.gargle.HOWL> Jeremy Hylton writes: > Maybe some ambitious PSF activitst could contact Roskind and Steve > Kirsch and see if they know who at Disney to talk to... Or maybe the > Disney guys who were at PyCon last year could help. please could somebody give me a contact address? Matthias From jhylton at gmail.com Fri Feb 11 13:35:18 2005 From: jhylton at gmail.com (Jeremy Hylton) Date: Fri Feb 11 13:35:21 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <16908.40214.287358.160325@gargle.gargle.HOWL> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <16908.40214.287358.160325@gargle.gargle.HOWL> Message-ID: On Fri, 11 Feb 2005 12:55:02 +0100, Matthias Klose wrote: > > Currently md5c.c is included in the python sources. The libmd > > implementation has a drop in replacement for md5c.c. The openssl > > implementation is a complicated tangle of Makefile expanded template > > code that would be harder to include in the Python sources. > > I would prefer that one as a short term solution. Patch at #1118602. Unfortunately a license that says it is in the public domain is unacceptable (and should be for Debian, too). That is to say, it's not possible for someone to claim that something they produce is in the public domain. See http://www.linuxjournal.com/article/6225 Jeremy From skip at pobox.com Fri Feb 11 13:54:32 2005 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 11 13:54:44 2005 Subject: Bug#293932: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <16908.40790.23812.274563@gargle.gargle.HOWL> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <1f7befae05020812377c72de26@mail.gmail.com> <16908.40790.23812.274563@gargle.gargle.HOWL> Message-ID: <16908.43784.902706.197167@montanaro.dyndns.org> >> Maybe some ambitious PSF activitst could contact Roskind and Steve >> Kirsch and see if they know who at Disney to talk to... Or maybe the >> Disney guys who were at PyCon last year could help. Matthias> please could somebody give me a contact address? Steve's easy enough to get ahold of: http://www.skirsch.com/ (He even still has a UltraSeek-powered search of his site. ;-) Search Kirsch's site for Jim Roskind returned jar@netscape.com but that was dated 31 Oct 2000. An abstract for a talk at University of Arizona in late 2003 sort of implied he was still at Netscape then ... maybe... Skip From greg at electricrain.com Fri Feb 11 18:51:18 2005 From: greg at electricrain.com (Gregory P. Smith) Date: Fri Feb 11 18:51:26 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108102539.3753.87.camel@schizo> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> Message-ID: <20050211175118.GC25441@zot.electricrain.com> > I think it would be cleaner and simpler to modify the existing > md5module.c to use the openssl md5 layer API (this is just a > search/replace to change the function names). 
The bigger problem is > deciding what/how/whether to include the openssl md5 implementation > sources so that win32 can use them. yes, that is all i was suggesting. win32 python is already linked against openssl for the socket module ssl support, having the md5 and sha1 modules depend on openssl should not cause a problem. -greg From trentm at ActiveState.com Fri Feb 11 19:37:15 2005 From: trentm at ActiveState.com (Trent Mick) Date: Fri Feb 11 19:39:35 2005 Subject: [Python-Dev] ViewCVS on SourceForge is broken Message-ID: <420CFB5B.7030007@activestate.com> Has anyone else noticed that viewcvs is broken on SF? > [trentm@booboo ~] > $ curl -D tmp/headers http://cvs.sourceforge.net/viewcvs.py/python > > > 502 Bad Gateway > > Bad Gateway >
The proxy server received an invalid > response from an upstream server.
>
> > [trentm@booboo ~] > $ cat tmp/headers > HTTP/1.1 502 Bad Gateway > Date: Fri, 11 Feb 2005 18:38:25 GMT > Server: Apache/2.0.40 (Red Hat Linux) > Content-Length: 232 > Connection: close > Content-Type: text/html; charset=iso-8859-1 Or is this just me? It is also broken for other projects for me -- e.g. 'pywin32'. Cheers, Trent -- Trent Mick trentm@activestate.com From tim.peters at gmail.com Fri Feb 11 20:14:30 2005 From: tim.peters at gmail.com (Tim Peters) Date: Fri Feb 11 20:14:33 2005 Subject: [Python-Dev] ViewCVS on SourceForge is broken In-Reply-To: <420CFB5B.7030007@activestate.com> References: <420CFB5B.7030007@activestate.com> Message-ID: <1f7befae05021111143c346e3@mail.gmail.com> [Trent Mick] > Has anyone else noticed that viewcvs is broken on SF? It failed the same way from Virginia just now. I suppose that's your reward for kindly updating the Python copyright . The good news is that you can use this lull in your Python work to contribute to ZODB development! ViewCVS at zope.org is always happy to see you: http://svn.zope.org/ZODB/trunk/ From theller at python.net Fri Feb 11 20:20:57 2005 From: theller at python.net (Thomas Heller) Date: Fri Feb 11 20:19:24 2005 Subject: [Python-Dev] ViewCVS on SourceForge is broken In-Reply-To: <1f7befae05021111143c346e3@mail.gmail.com> (Tim Peters's message of "Fri, 11 Feb 2005 14:14:30 -0500") References: <420CFB5B.7030007@activestate.com> <1f7befae05021111143c346e3@mail.gmail.com> Message-ID: <7jleewdi.fsf@python.net> Tim Peters writes: > [Trent Mick] >> Has anyone else noticed that viewcvs is broken on SF? > > It failed the same way from Virginia just now. I suppose that's your > reward for kindly updating the Python copyright . > The failure lasts already for several days: http://sourceforge.net/docman/display_doc.php?docid=2352&group_id=1#1107968334 Thomas From tim.peters at gmail.com Fri Feb 11 20:24:51 2005 From: tim.peters at gmail.com (Tim Peters) Date: Fri Feb 11 20:24:54 2005 Subject: [Python-Dev] ViewCVS on SourceForge is broken In-Reply-To: <7jleewdi.fsf@python.net> References: <420CFB5B.7030007@activestate.com> <1f7befae05021111143c346e3@mail.gmail.com> <7jleewdi.fsf@python.net> Message-ID: <1f7befae05021111246ca3c616@mail.gmail.com> [Thomas Heller] Jeez Louise! As of 2005-02-09 there is an outage of anonymous CVS (tarballs, pserver-based CVS and ViewCVS) for projects whose UNIX names start with the letters m, n, p, q, t, y and z. We are currently working on resolving this issue. So that means it wouldn't even do us any good to rename the project to Thomas, Trent, Mick, Tim, Peters, or ZPython either! All right. Heller 2.5, here we come. From theller at python.net Fri Feb 11 20:27:11 2005 From: theller at python.net (Thomas Heller) Date: Fri Feb 11 20:25:39 2005 Subject: [Python-Dev] ViewCVS on SourceForge is broken In-Reply-To: <1f7befae05021111143c346e3@mail.gmail.com> (Tim Peters's message of "Fri, 11 Feb 2005 14:14:30 -0500") References: <420CFB5B.7030007@activestate.com> <1f7befae05021111143c346e3@mail.gmail.com> Message-ID: <1xbmew34.fsf@python.net> Tim Peters writes: > [Trent Mick] >> Has anyone else noticed that viewcvs is broken on SF? > > It failed the same way from Virginia just now. I suppose that's your > reward for kindly updating the Python copyright . > > The good news is that you can use this lull in your Python work to > contribute to ZODB development! 
ViewCVS at zope.org is always happy > to see you: > > http://svn.zope.org/ZODB/trunk/ Thomas Heller writes: > The failure lasts already for several days: > > http://sourceforge.net/docman/display_doc.php?docid=2352&group_id=1#1107968334 "As of 2005-02-09 there is an outage of anonymous CVS (tarballs, pserver-based CVS and ViewCVS) for projects whose UNIX names start with the letters m, n, p, q, t, y and z." As you can see, both projects with names starting with 'p' and 'z' are affected, so may I suggest to contribute to *ctypes* instead of zope ;-) Thomas From mcherm at mcherm.com Fri Feb 11 21:03:29 2005 From: mcherm at mcherm.com (Michael Chermside) Date: Fri Feb 11 21:03:39 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c Message-ID: <1108152209.420d0f91e312c@mcherm.com> Jeremy writes: > Unfortunately a license that says it is in the public domain is > unacceptable (and should be for Debian, too). That is to say, it's > not possible for someone to claim that something they produce is in > the public domain. See http://www.linuxjournal.com/article/6225 Not quite true. It would be a bit off-topic to discuss on this list so I will simply point you to: http://creativecommons.org/license/publicdomain-2 ...which is specifically designed for the US legal system. It _IS_ possible for someone to produce something in the public domain, it just isn't as easy as some people think (just saying it doesn't necessarily make it so (at least under US law)) and it may not be a good idea. I would expect that if something truly WERE in the public domain, then it would be acceptable for Python (and for Debian too, for that matter). I can't comment on whether this applies to libmd. -- Michael Chermside From tim.peters at gmail.com Fri Feb 11 21:46:00 2005 From: tim.peters at gmail.com (Tim Peters) Date: Fri Feb 11 21:46:03 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108152209.420d0f91e312c@mcherm.com> References: <1108152209.420d0f91e312c@mcherm.com> Message-ID: <1f7befae0502111246244647c9@mail.gmail.com> [Jeremy Hylton] >> Unfortunately a license that says it is in the public domain is >> unacceptable (and should be for Debian, too). That is to say, it's >> not possible for someone to claim that something they produce is in >> the public domain. See http://www.linuxjournal.com/article/6225 [Michael Chermside] > Not quite true. It would be a bit off-topic to discuss on this list > so I will simply point you to: > > http://creativecommons.org/license/publicdomain-2 > > ...which is specifically designed for the US legal system. It _IS_ > possible for someone to produce something in the public domain, it > just isn't as easy as some people think (just saying it doesn't > necessarily make it so (at least under US law)) and it may not be > a good idea. The article Jeremy pointed at was written by the Python Software Foundation's occasional legal counsel, and he disagrees. While I would love to believe that copyright law isn't this bizarre, I can't recommend going against the best legal advice the PSF was willing to pay for . Note that Creative Commons doesn't recommend that you do either; from their FAQ: Can I use a Creative Commons license for software? In theory, yes, but it is not in your best interest. We strongly encourage you to use one of the very good software licenses available today. (The Free Software Foundation and the Open Source Initiative stand out as resources for such licenses.) 
> I would expect that if something truly WERE in the public domain, > then it would be acceptable for Python (and for Debian too, for > that matter). So would I, but according to Larry there isn't such a thing (excepting software written by the US Government; and for other software you might be thinking about today, maybe in about a century if the author lets their copyright lapse). If Larry is correct, it isn't legally possible for an individual in the US to disclaim copyright, regardless what they may say or sign. The danger then is that accepting software that purports to be free of copyright can come back to bite you, if the author later changes their mind (from your POV; the claim is that from US law's POV, nothing has actually changed, since the author never actually gave up copyright to begin with). The very fact that this argument exists underscores the desirability of only accepting software with an explicit license, spelling out the copyright holder's intents wrt distribution, modification, etc. Then you're just in legal mud, instead of legal quicksand. From pje at telecommunity.com Fri Feb 11 23:59:33 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Feb 11 23:57:10 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1f7befae0502111246244647c9@mail.gmail.com> References: <1108152209.420d0f91e312c@mcherm.com> <1108152209.420d0f91e312c@mcherm.com> Message-ID: <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> At 03:46 PM 2/11/05 -0500, Tim Peters wrote: >If Larry is correct, it isn't legally possible for an individual in >the US to disclaim copyright, regardless what they may say or sign. >The danger then is that accepting software that purports to be free of >copyright can come back to bite you, if the author later changes their >mind (from your POV; the claim is that from US law's POV, nothing has >actually changed, since the author never actually gave up copyright to >begin with). > >The very fact that this argument exists underscores the desirability >of only accepting software with an explicit license, spelling out the >copyright holder's intents wrt distribution, modification, etc. Then >you're just in legal mud, instead of legal quicksand. And as long as we're flailing about in a substance which may include, but is not limited to, mud and/or quicksand or other flailing-suitable legal substances, it should be pointed out that even though software presented by its owner to be in the public domain is technically still copyright by that individual, the odds of them successfully prosecuting a copyright enforcement action might be significantly narrowed, due to the doctrine of promissory estoppel. Promissory estoppel is basically the idea that one-sided promises *are* enforceable when somebody reasonably relies on them and is injured by the withdrawal. IBM, for example, has pled in its defense against SCO that SCO's distribution of its so-called proprietary code under the GPL constituted a reasonable promise that others were free to use the code under the terms of the GPL, and that IBM further relied on that promise. Ergo, they are claiming, SCO's promise is enforceable by law. Of course, SCO v. IBM hasn't had any judgments yet, certainly not on that subject, and maybe never will. But it's important to know that the law *does* have some principles like this that allow overriding the more egregiously insane aspects of the law. 
:) Oh, also, if somebody decides to back out on their dedication to the public domain, and you can show that they did it on purpose, then that's "unclean hands" and possibly "copyright abuse" as well. Just to muddy up the waters a little bit. :) Obviously, the PSF should follow its own lawyer's advice, but it seemed to me that the point of Mr. Rosen's article was more to advise people releasing software to use a license that allows them to disclaim warranties. I personally can't see how taking the reasonable interpretation of a public domain declaration can lead to any difficulties, but then, IANAL. I'm surprised, however, that he didn't even touch on promissory estoppel, if there is some reason he believes that the doctrine wouldn't apply to a software license. Heck, I was under the impression that free copyright licenses in general got their effect by way of promissory estoppel, since such licenses are always one-sided promises. The GPL in particular makes an explicit point of this, even though it doesn't use the words "promissory estoppel". The point is that the law doesn't allow you to copy, so the license is your defense against a charge of copyright infringement. Therefore, even Rosen's so-called "Give it away" license is enforceable, in the sense that the licensor should be barred from taking action against someone taking the license at face value. Rosen also says, "Under basic contract law, a gift cannot be enforced. The donor can retract his gift at any time, for any reason". If this were true, I could give you a watch for Christmas and then sue you to make you give it back, so I'm not sure what he's getting at here. But again, IANAL, certainly not a famous one like Mr. Rosen. I *am* most curious to know why his article seems to imply that a promise not to sue someone for copyright infringement isn't a valid defense against such a suit, because that would seem to imply that *no* free software license is valid, including the GPL or the PSF license! (Surely those "gifts" can be retracted too, no?) From abo at minkirri.apana.org.au Sat Feb 12 00:11:01 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Sat Feb 12 00:11:16 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> Message-ID: <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> G'day again, From: "Gregory P. Smith" > > I think it would be cleaner and simpler to modify the existing > > md5module.c to use the openssl md5 layer API (this is just a > > search/replace to change the function names). The bigger problem is > > deciding what/how/whether to include the openssl md5 implementation > > sources so that win32 can use them. > > yes, that is all i was suggesting. > > win32 python is already linked against openssl for the socket module > ssl support, having the md5 and sha1 modules depend on openssl should > not cause a problem. IANAL... I have too much common sense, so I won't argue licences :-) So is openssl already included in the Python sources, or is it just a dependency? I had a quick look and couldn't find it so it must be a dependency. Given that Python is already dependant on openssl, it makes sense to change md5sum to use it. 
I have a feeling that openssl internally uses md5, so this way we wont link against two different md5sum implementations. ---------------------------------------------------------------- Donovan Baarda http://minkirri.apana.org.au/~abo/ ---------------------------------------------------------------- From martin at v.loewis.de Sat Feb 12 00:57:40 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Feb 12 00:57:44 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> References: <1108152209.420d0f91e312c@mcherm.com> <1108152209.420d0f91e312c@mcherm.com> <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> Message-ID: <420D4674.4040804@v.loewis.de> Phillip J. Eby wrote: > I personally can't see how taking the reasonable interpretation of a > public domain declaration can lead to any difficulties, but then, > IANAL. The ultimate question is whether we could legally relicense such code under the Python license, ie. remove the PD declaration, and attach the Python license to it. I'm sure somebody would come along and claim "you cannot do that, and because you did, I cannot use your code, because it is not legally trustworthy"; people would say the same if the PD declaration would stay around. It is important for us that our users (including our commercial users) trust that Python has a clear legal track record. For such users, it is irrelevant whether you think that a litigation of the actual copyright holder would have any chance to stand in court, or whether such action is even likely. So for some users, replacing RSA-copyrighted-and-licensed code with PD-declared-and-unlicensed code makes Python less trustworthy. Clearly, for Debian, it is exactly the other way 'round. So I have rejected the patch, preserving the status quo, until a properly licensed open source implementation of md5 arrives. Until then, Debian will have to patch Python. > But again, IANAL, certainly not a famous one like Mr. Rosen. I *am* > most curious to know why his article seems to imply that a promise not > to sue someone for copyright infringement isn't a valid defense against > such a suit It might be, but that is irrelevant for open source projects that include contributions. Either they don't care too much about such things, in which case anything remotely "free" would be acceptable, or they are very nit-picking, in which case you need a good record for any contribution you ever received. Regards, Martin From pje at telecommunity.com Sat Feb 12 01:25:35 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Feb 12 01:23:11 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <420D4674.4040804@v.loewis.de> References: <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> <1108152209.420d0f91e312c@mcherm.com> <1108152209.420d0f91e312c@mcherm.com> <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> At 12:57 AM 2/12/05 +0100, Martin v. L?wis wrote: >Phillip J. Eby wrote: >>I personally can't see how taking the reasonable interpretation of a >>public domain declaration can lead to any difficulties, but then, IANAL. > >The ultimate question is whether we could legally relicense such >code under the Python license, ie. remove the PD declaration, and >attach the Python license to it. 
I'm sure somebody would come along >and claim "you cannot do that, and because you did, I cannot use >your code, because it is not legally trustworthy"; people would >say the same if the PD declaration would stay around. Right, but now we've moved off the legality and into marketing, which is an even less sane subject in some ways. The law at least has certain checks and balances built into it, but in marketing, people's irrationality knows no bounds. ;) >It might be, but that is irrelevant for open source projects that >include contributions. Either they don't care too much about such >things, in which case anything remotely "free" would be acceptable, >or they are very nit-picking, in which case you need a good record >for any contribution you ever received. Isn't the PSF somewhere in between? I mean, in theory we are supposed to be tracking stuff, but in practice there's no contributor agreement for CVS committers ala Zope Corp.'s approach. So in some sense right now, Python depends largely on the implied promise of its contributors to license their contributions under the same terms as Python. ISTM that if somebody's lawyer is worried about whether Python contains pseudo-public domain code, they should be downright horrified by the absence of a paper trail on the rest. But IANAM (I Am Not A Marketer), either. :) From martin at v.loewis.de Sat Feb 12 02:09:05 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Feb 12 02:09:08 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> References: <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> <1108152209.420d0f91e312c@mcherm.com> <1108152209.420d0f91e312c@mcherm.com> <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> <5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> Message-ID: <420D5731.8020702@v.loewis.de> Phillip J. Eby wrote: > Isn't the PSF somewhere in between? I mean, in theory we are supposed > to be tracking stuff, but in practice there's no contributor agreement > for CVS committers ala Zope Corp.'s approach. That is not true, see http://www.python.org/psf/contrib.html We certainly don't have forms from all contributors, yet, but we are working on it. > So in some sense right > now, Python depends largely on the implied promise of its contributors > to license their contributions under the same terms as Python. ISTM > that if somebody's lawyer is worried about whether Python contains > pseudo-public domain code, they should be downright horrified by the > absence of a paper trail on the rest. But IANAM (I Am Not A Marketer), > either. :) And indeed, they are horrified. Right now, we can tell them we are working on it - so I would like to see that any change that we make to improve the PSF's legal standing. Adding code which was put into the "public domain" makes it worse (atleast in the specific case - we are clearly allowed to do what we do with the current md5 code; for the newly-proposed code, it is not so clear, even if you think it is likely we would win in court). 
Regards, Martin From bob at redivi.com Sat Feb 12 02:38:18 2005 From: bob at redivi.com (Bob Ippolito) Date: Sat Feb 12 02:38:33 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> Message-ID: <5d300838ef9716aeaae53579ab1f7733@redivi.com> On Feb 11, 2005, at 6:11 PM, Donovan Baarda wrote: > G'day again, > > From: "Gregory P. Smith" >>> I think it would be cleaner and simpler to modify the existing >>> md5module.c to use the openssl md5 layer API (this is just a >>> search/replace to change the function names). The bigger problem is >>> deciding what/how/whether to include the openssl md5 implementation >>> sources so that win32 can use them. >> >> yes, that is all i was suggesting. >> >> win32 python is already linked against openssl for the socket module >> ssl support, having the md5 and sha1 modules depend on openssl should >> not cause a problem. > > IANAL... I have too much common sense, so I won't argue licences :-) > > So is openssl already included in the Python sources, or is it just a > dependency? I had a quick look and couldn't find it so it must be a > dependency. > > Given that Python is already dependant on openssl, it makes sense to > change > md5sum to use it. I have a feeling that openssl internally uses md5, > so this > way we wont link against two different md5sum implementations. It is an optional dependency that is used when present (read: not just win32). The sources are not included with Python. OpenSSL does internally have an implementation of md5 (and sha1, among other things). -bob From pje at telecommunity.com Sat Feb 12 03:28:43 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Feb 12 03:26:19 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <420D5731.8020702@v.loewis.de> References: <5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> <1108152209.420d0f91e312c@mcherm.com> <1108152209.420d0f91e312c@mcherm.com> <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> <5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050211212759.03db5b30@mail.telecommunity.com> At 02:09 AM 2/12/05 +0100, Martin v. L?wis wrote: >Phillip J. Eby wrote: >>Isn't the PSF somewhere in between? I mean, in theory we are supposed to >>be tracking stuff, but in practice there's no contributor agreement for >>CVS committers ala Zope Corp.'s approach. > >That is not true, see > >http://www.python.org/psf/contrib.html > >We certainly don't have forms from all contributors, yet, but we >are working on it. > >>So in some sense right now, Python depends largely on the implied promise >>of its contributors to license their contributions under the same terms >>as Python. ISTM that if somebody's lawyer is worried about whether >>Python contains pseudo-public domain code, they should be downright >>horrified by the absence of a paper trail on the rest. But IANAM (I Am >>Not A Marketer), either. :) > >And indeed, they are horrified. 
Right now, we can tell them we are >working on it - so I would like to see that any change that we make >to improve the PSF's legal standing. Adding code which was put into >the "public domain" makes it worse (atleast in the specific case - >we are clearly allowed to do what we do with the current md5 code; >for the newly-proposed code, it is not so clear, even if you think >it is likely we would win in court). Thanks for the clarifications. From abo at minkirri.apana.org.au Sat Feb 12 03:54:27 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Sat Feb 12 03:54:37 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> Message-ID: <013501c510ae$2abd7360$24ed0ccb@apana.org.au> G'day, From: "Bob Ippolito" > On Feb 11, 2005, at 6:11 PM, Donovan Baarda wrote: [...] > > Given that Python is already dependant on openssl, it makes sense to > > change > > md5sum to use it. I have a feeling that openssl internally uses md5, > > so this > > way we wont link against two different md5sum implementations. > > It is an optional dependency that is used when present (read: not just > win32). The sources are not included with Python. Are there any potential problems with making the md5sum module availability "optional" in the same way as this? > OpenSSL does internally have an implementation of md5 (and sha1, among > other things). Yeah, I know, that's why it could be used for the md5sum module :-) What I meant was a Python application using ssl sockets and the md5sum module will effectively have two different md5sum implementations in memory. Using the openssl md5sum for the md5sum module will make it "leaner", as well as faster. ---------------------------------------------------------------- Donovan Baarda http://minkirri.apana.org.au/~abo/ ---------------------------------------------------------------- From tjreedy at udel.edu Sat Feb 12 07:40:36 2005 From: tjreedy at udel.edu (Terry Reedy) Date: Sat Feb 12 07:40:52 2005 Subject: [Python-Dev] Re: license issues with profiler.py and md5.h/md5c.c References: <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com><1108152209.420d0f91e312c@mcherm.com><1108152209.420d0f91e312c@mcherm.com><5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com><5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> <420D5731.8020702@v.loewis.de> Message-ID: ""Martin v. Löwis"" wrote in message news:420D5731.8020702@v.loewis.de... > http://www.python.org/psf/contrib.html After reading this page and pages linked thereto, I get the impression that you are only asking for contributor forms from contributors of original material (such as module or manual section) and not from submitters of suggestions (via news,mail) or patches (via sourceforge). Correct? Seems sensible to me that contributing via a public suggestion box constitutes permission to use the suggestion. Terry J. Reedy From amk at amk.ca Sat Feb 12 14:37:21 2005 From: amk at amk.ca (A.M. 
Kuchling) Date: Sat Feb 12 14:40:04 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <013501c510ae$2abd7360$24ed0ccb@apana.org.au> References: <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> Message-ID: <20050212133721.GA13429@rogue.amk.ca> On Sat, Feb 12, 2005 at 01:54:27PM +1100, Donovan Baarda wrote: > Are there any potential problems with making the md5sum module availability > "optional" in the same way as this? The md5 module has been a standard module for a long time; making it optional in the next version of Python isn't possible. We'd have to require OpenSSL to compile Python. I'm happy to replace the MD5 and/or SHA implementations with other code, provided other code with a suitable license can be found. --amk From barry at python.org Sat Feb 12 15:06:12 2005 From: barry at python.org (Barry Warsaw) Date: Sat Feb 12 15:06:14 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <20050212133721.GA13429@rogue.amk.ca> References: <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> Message-ID: <1108217172.20404.37.camel@presto.wooz.org> On Sat, 2005-02-12 at 08:37, A.M. Kuchling wrote: > The md5 module has been a standard module for a long time; making it > optional in the next version of Python isn't possible. We'd have to > require OpenSSL to compile Python. I totally agree. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20050212/74657c79/attachment.pgp From rkern at ucsd.edu Sat Feb 12 15:11:17 2005 From: rkern at ucsd.edu (Robert Kern) Date: Sat Feb 12 15:11:43 2005 Subject: [Python-Dev] Re: license issues with profiler.py and md5.h/md5c.c In-Reply-To: <20050212133721.GA13429@rogue.amk.ca> References: <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> Message-ID: A.M. Kuchling wrote: > On Sat, Feb 12, 2005 at 01:54:27PM +1100, Donovan Baarda wrote: > >>Are there any potential problems with making the md5sum module availability >>"optional" in the same way as this? > > > The md5 module has been a standard module for a long time; making it > optional in the next version of Python isn't possible. We'd have to > require OpenSSL to compile Python. 
> > I'm happy to replace the MD5 and/or SHA implementations with other > code, provided other code with a suitable license can be found. How about this one: http://sourceforge.net/project/showfiles.php?group_id=42360 From an API standpoint, it's trivially different from the one currently in Python. From md5.c: /* Copyright (C) 1999, 2000, 2002 Aladdin Enterprises. All rights reserved. This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software. Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions: 1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required. 2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software. 3. This notice may not be removed or altered from any source distribution. L. Peter Deutsch ghost@aladdin.com */ /* $Id: md5.c,v 1.6 2002/04/13 19:20:28 lpd Exp $ */ /* Independent implementation of MD5 (RFC 1321). This code implements the MD5 Algorithm defined in RFC 1321, whose text is available at http://www.ietf.org/rfc/rfc1321.txt The code is derived from the text of the RFC, including the test suite (section A.5) but excluding the rest of Appendix A. It does not include any code or documentation that is identified in the RFC as being copyrighted. [etc.] -- Robert Kern rkern@ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From aahz at pythoncraft.com Sat Feb 12 15:53:26 2005 From: aahz at pythoncraft.com (Aahz) Date: Sat Feb 12 15:53:29 2005 Subject: [Python-Dev] Re: license issues with profiler.py and md5.h/md5c.c In-Reply-To: References: <420D5731.8020702@v.loewis.de> Message-ID: <20050212145326.GA7836@panix.com> On Sat, Feb 12, 2005, Terry Reedy wrote: > ""Martin v. Löwis"" wrote in message > news:420D5731.8020702@v.loewis.de... >> >> http://www.python.org/psf/contrib.html > > After reading this page and pages linked thereto, I get the impression that > you are only asking for contributor forms from contributors of original > material (such as module or manual section) and not from submitters of > suggestions (via news,mail) or patches (via sourceforge). Correct? Half-correct: patches constitute "work" and should also require a contrib agreement. But we're probably not going to press the point until we get contrib agreements from all CVS committers. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code -- not in reams of trivial code that bores the reader to death." --GvR From tjreedy at udel.edu Sat Feb 12 21:30:42 2005 From: tjreedy at udel.edu (Terry Reedy) Date: Sat Feb 12 21:30:59 2005 Subject: [Python-Dev] Re: Re: license issues with profiler.py and md5.h/md5c.c References: <420D5731.8020702@v.loewis.de> <20050212145326.GA7836@panix.com> Message-ID: "Aahz" wrote in message news:20050212145326.GA7836@panix.com... 
On Sat, Feb 12, 2005, Terry Reedy wrote: >>> http://www.python.org/psf/contrib.html >> After reading this page and pages linked thereto, I get the impression >> that >> you are only asking for contributor forms from contributors of original >> material (such as module or manual section) and not from submitters of >> suggestions (via news,mail) or patches (via sourceforge). Correct? > Half-correct: patches constitute "work" and should also require a > contrib agreement. As I remember, my impression was based on the suggested procedure of first copywrite one's work and then license it under one of two acceptible "original licenses". This makes sense for a whole module, but hardly for most patches, to the point of being nonsense for a patch of one word, as some of mine have been (in text form, with the actual diff being prepared by the committer). This is not to deny that editing -- finding the exact place to insert or change a word is "work" -- but to say that it is work of a different sort from original authorship. So, if the lawyer thinks patches should also have a contrib agreement, then I strongly recommend a separate blanket agreement that covers all patches one ever contributes as one ongoing work. > But we're probably not going to press the point > until we get contrib agreements from all CVS committers. Even though I am not such, I would happily fill and fax a blanket patch agreement were that deemed to be helpful. Terry J. Reedy From greg at electricrain.com Sat Feb 12 22:04:02 2005 From: greg at electricrain.com (Gregory P. Smith) Date: Sat Feb 12 22:04:08 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <20050212133721.GA13429@rogue.amk.ca> References: <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> Message-ID: <20050212210402.GE25441@zot.electricrain.com> On Sat, Feb 12, 2005 at 08:37:21AM -0500, A.M. Kuchling wrote: > On Sat, Feb 12, 2005 at 01:54:27PM +1100, Donovan Baarda wrote: > > Are there any potential problems with making the md5sum module availability > > "optional" in the same way as this? > > The md5 module has been a standard module for a long time; making it > optional in the next version of Python isn't possible. We'd have to > require OpenSSL to compile Python. > > I'm happy to replace the MD5 and/or SHA implementations with other > code, provided other code with a suitable license can be found. > agreed. it can not be made optional. What I'd prefer (and will do if i find the time) is to have the md5 and sha1 module use OpenSSLs implementations when available. Falling back to their built in ones when openssl isn't present. That way its always there but uses the much faster optimized openssl algorithms when they exist. -g From david.ascher at gmail.com Sat Feb 12 22:42:01 2005 From: david.ascher at gmail.com (David Ascher) Date: Sat Feb 12 22:42:05 2005 Subject: [Python-Dev] Jim Roskind Message-ID: I contacted Jim Roskind re: the profiler code. i said: I'm a strong supporter of Opensource software, but I'm probably not going to be able to help you very much. I could be much more helpful with understanding the code or its use ;-). To summarize what I'll say: I don't own the rights to this stuff. ... 
but I don't believe there are any patents that I was ever involved with that might encumber this work. I would note that my profiler code is really very rarely used in commercial products, and it is much more typically used by developers (I guess a developer toolkit, if sold, would use it). I'm pretty delighted that the code has found so much use by developers over the years. As I noted in the intro to the documentation, I had only been coding in Python for 3 weeks when I wrote it. On the positive side, it exposed many weaknesses in many developer's code (including our own at InfoSeek), as well as in core Python code (subtle bugs in the interpreter) that surely helped everyone. Even though I was a newbie, It was VERY carefully crafted,, and I'd expect that it would take a fair amount of effort to reproduce it (and that is is probably why it has not been changed much... or at least no one told me when they changed/fixed it ;-) ). With regard to why I probably can't help much..... First off, InfoSeek (holder of the copyright) was bought by Disney, and I don't know what if anything has eventually become of the tradename. There is a chance that Disney owns the rights... and I have no idea who to ask there :-/. Second, I took a look at the Copyright, and it sure seems pretty permissive. I'm amazed if folks want something more permissive. This is what I found on the web for it: Copyright ? 1994, by InfoSeek Corporation, all rights reserved. Written by James Roskind.10.1 Permission to use, copy, modify, and distribute this Python software and its associated documentation for any purpose (subject to the restriction in the following sentence) without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of InfoSeek not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. This permission is explicitly restricted to the copying and modification of the software to remain in Python, compiled Python, or other languages (such as C) wherein the modified or derived code is exclusively imported into a Python module. INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. As I recall, I probably personally created the terms of the above license. I used a similar license on my C/C++ grammar, and Infoseek just added a bunch of wording to be sure that they were not at risk, and that their name would not be used in vain (or in advertising material). I think they were also interested in limiting its use to Python.... but I don't think that is a concern that would bother you. I read the link you directed me to, and its primary focus seemed ot be on patents for related or included technology. I don't believe that infoseek applied for or got any patents in this area (and certainly if they did so without my name, it would probably invalidate the patent), and I'm sure I didn't get any patents in this area at Netscape/AOL. In fact I don't think I got any patents back in 1994 or 1995. 
My only prior patent dated back to about 1983 (a hardware patent) that has since expired. I have some patents since (roughly) 1995, and even though I don't think any of them relate to profiling (though some did relate to languages, or more specifically, security in languages), I wouldn't want to mess with assigning rights to any of those patents, as they belong to AOL/Netscape. Here again, to my knowledge, none of my patents relate in any way to this area (profiling). Sadly, if they did, I would not have the right to assign them. I'm sure you're just doing your job, and following through by dotting all the I's and crossing all T's. My suggestion is to (as you said) work around the issue. You could always re-write the code from scratch, as the approaches are not rocket science and are pretty thoroughly explained. I wouldn't suggest it unless you are desperate. If I were you, I'd wait for a license problem to emerge (which I don't believe will ever happen). Hope that helps, Jim David Ascher wrote on 2/11/2005, 8:57 PM: > Dear Jim -- > > David Ascher here, writing to you on behalf of the Python Software > Foundation. Someone recently pointed to your copyright statement in > Python's standard library (profile.py, if you recall, way back from > '94). Apparently there are some issues re: the specific terms of the > license you picked. We can probably find ways of working around those > issues but I was wondering if you'd be willing to relicense the code > under a different license, as per http://www.python.org/psf/contrib.html > > I don't really know if we need to worry about the current owners of > InfoSeek, whoever that may be. You'd know better. From david.ascher at gmail.com Sat Feb 12 22:45:54 2005 From: david.ascher at gmail.com (David Ascher) Date: Sat Feb 12 22:45:57 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <1f7befae05020812377c72de26@mail.gmail.com> Message-ID: On Tue, 8 Feb 2005 15:52:29 -0500, Jeremy Hylton wrote: > Maybe some ambitious PSF activitst could contact Roskind and Steve > Kirsch and see if they know who at Disney to talk to... Or maybe the > Disney guys who were at PyCon last year could help. I contacted Jim. His response follows: --- I'm a strong supporter of Opensource software, but I'm probably not going to be able to help you very much. I could be much more helpful with understanding the code or its use ;-). To summarize what I'll say: I don't own the rights to this stuff. ... but I don't believe there are any patents that I was ever involved with that might encumber this work. I would note that my profiler code is really very rarely used in commercial products, and it is much more typically used by developers (I guess a developer toolkit, if sold, would use it). I'm pretty delighted that the code has found so much use by developers over the years. As I noted in the intro to the documentation, I had only been coding in Python for 3 weeks when I wrote it. On the positive side, it exposed many weaknesses in many developer's code (including our own at InfoSeek), as well as in core Python code (subtle bugs in the interpreter) that surely helped everyone. Even though I was a newbie, It was VERY carefully crafted,, and I'd expect that it would take a fair amount of effort to reproduce it (and that is is probably why it has not been changed much... or at least no one told me when they changed/fixed it ;-) ). 
With regard to why I probably can't help much..... First off, InfoSeek (holder of the copyright) was bought by Disney, and I don't know what if anything has eventually become of the tradename. There is a chance that Disney owns the rights... and I have no idea who to ask there :-/. Second, I took a look at the Copyright, and it sure seems pretty permissive. I'm amazed if folks want something more permissive. This is what I found on the web for it: Copyright ? 1994, by InfoSeek Corporation, all rights reserved. Written by James Roskind.10.1 Permission to use, copy, modify, and distribute this Python software and its associated documentation for any purpose (subject to the restriction in the following sentence) without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of InfoSeek not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. This permission is explicitly restricted to the copying and modification of the software to remain in Python, compiled Python, or other languages (such as C) wherein the modified or derived code is exclusively imported into a Python module. INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. As I recall, I probably personally created the terms of the above license. I used a similar license on my C/C++ grammar, and Infoseek just added a bunch of wording to be sure that they were not at risk, and that their name would not be used in vain (or in advertising material). I think they were also interested in limiting its use to Python.... but I don't think that is a concern that would bother you. I read the link you directed me to, and its primary focus seemed ot be on patents for related or included technology. I don't believe that infoseek applied for or got any patents in this area (and certainly if they did so without my name, it would probably invalidate the patent), and I'm sure I didn't get any patents in this area at Netscape/AOL. In fact I don't think I got any patents back in 1994 or 1995. My only prior patent dated back to about 1983 (a hardware patent) that has since expired. I have some patents since (roughly) 1995, and even though I don't think any of them relate to profiling (though some did relate to languages, or more specifically, security in languages), I wouldn't want to mess with assigning rights to any of those patents, as they belong to AOL/Netscape. Here again, to my knowledge, none of my patents relate in any way to this area (profiling). Sadly, if they did, I would not have the right to assign them. I'm sure you're just doing your job, and following through by dotting all the I's and crossing all T's. My suggestion is to (as you said) work around the issue. You could always re-write the code from scratch, as the approaches are not rocket science and are pretty thoroughly explained. I wouldn't suggest it unless you are desperate. 
If I were you, I'd wait for a license problem to emerge (which I don't believe will ever happen). --- FWIW, I agree. Personnally, I think that if Debian has a problem with the above, it's their problem to deal with, not Python's. --david From rkern at ucsd.edu Sun Feb 13 00:24:27 2005 From: rkern at ucsd.edu (Robert Kern) Date: Sun Feb 13 00:24:50 2005 Subject: [Python-Dev] Re: license issues with profiler.py and md5.h/md5c.c In-Reply-To: References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <1f7befae05020812377c72de26@mail.gmail.com> Message-ID: David Ascher wrote: > FWIW, I agree. Personnally, I think that if Debian has a problem with > the above, it's their problem to deal with, not Python's. The OSI may also have a problem with the license if they were to be made aware of it. See section 8 of the Open Source Definition: """8. License Must Not Be Specific to a Product The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution. """ I'm not entirely sure if this affects the PSF's use of OSI's trademark. IANAL. TINLA. -- Robert Kern rkern@ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From greg at electricrain.com Sun Feb 13 02:35:35 2005 From: greg at electricrain.com (Gregory P. Smith) Date: Sun Feb 13 02:35:39 2005 Subject: [Python-Dev] Re: OpenSSL sha module / license issues with md5.h/md5c.c In-Reply-To: <013501c510ae$2abd7360$24ed0ccb@apana.org.au> References: <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> Message-ID: <20050213013535.GF25441@zot.electricrain.com> I've created an OpenSSL version of the sha module. trivial to modify to be a md5 module. Its a first version with cleanup to be done and such. being managed in the SF patch manager: https://sourceforge.net/tracker/?func=detail&aid=1121611&group_id=5470&atid=305470 enjoy. i'll do more cleanup and work on it soon. From martin at v.loewis.de Sun Feb 13 20:38:47 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Feb 13 20:38:50 2005 Subject: [Python-Dev] Re: Re: license issues with profiler.py and md5.h/md5c.c In-Reply-To: References: <420D5731.8020702@v.loewis.de> <20050212145326.GA7836@panix.com> Message-ID: <420FACC7.9020502@v.loewis.de> Terry Reedy wrote: > As I remember, my impression was based on the suggested procedure of first > copywrite one's work and then license it under one of two acceptible > "original licenses". This makes sense for a whole module, but hardly for > most patches, to the point of being nonsense for a patch of one word, as > some of mine have been (in text form, with the actual diff being prepared > by the committer). To my understanding, there is no way to "copyright one's work" - in the terminology of Larry Rosen (and I guess U.S. copyright law), "copyright subsists". I.e. 
the creator of some work has copyright, whether he wants it or not. Now, the question is, what precisely constitutes "work"? To my understanding, modifying an existing work creates derivative work; he who creates the derivative work first needs a license to do so, and then owns the title of the derivative work. There is, of course, the issue of trivial changes - "nobody could have it done differently". However, I understand that the bar for trivial changes is very, very low; I understand that even putting a comment into the change indicating what the change was already makes this original work. Nobody is obliged to phrase the comment in precisely the same way, so this specific wording of the comment is original work of the contributor, who needs to license the change to us. > So, if the lawyer thinks patches should also have a contrib agreement, then > I strongly recommend a separate blanket agreement that covers all patches > one ever contributes as one ongoing work. Our contributor's form is such a blanket agreement. You fill it out once, and then you indicate, in each patch, that this patch falls under the agreement you sent in earlier. > Even though I am not such, I would happily fill and fax a blanket patch > agreement were that deemed to be helpful. When we have sufficient coverage from committers, I will move on to people in Misc/ACKS. You can just go ahead and send in the form right away. Regards, Martin From abo at minkirri.apana.org.au Mon Feb 14 01:02:23 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Mon Feb 14 01:03:02 2005 Subject: [Python-Dev] Re: OpenSSL sha module / license issues with md5.h/md5c.c In-Reply-To: <20050213013535.GF25441@zot.electricrain.com> References: <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050213013535.GF25441@zot.electricrain.com> Message-ID: <1108339344.3768.24.camel@schizo> On Sat, 2005-02-12 at 17:35 -0800, Gregory P. Smith wrote: > I've created an OpenSSL version of the sha module. trivial to modify > to be a md5 module. Its a first version with cleanup to be done and > such. being managed in the SF patch manager: > > https://sourceforge.net/tracker/?func=detail&aid=1121611&group_id=5470&atid=305470 > > enjoy. i'll do more cleanup and work on it soon. Hmmm. I see the patch entry, but it seems to be missing the actual patch. Did you code this from scratch, or did you base it on the current md5module.c? Is it using the openssl sha interface, or the higher level EVP interface? The reason I ask is it would be pretty trivial to modify md5module.c to use the openssl API for any digest, and would be less risk than fresh-coding one. 
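For what it's worth, the EVP layer is only a handful of calls, and the digest is
selected by name, so one wrapper covers md5, sha1 and anything else OpenSSL knows
about. Something along these lines (an untested sketch from memory against the
0.9.7 headers, not Greg's actual patch; "hash_buffer" is just an illustrative
name):

#include <openssl/evp.h>

/* Hash 'len' bytes of 'buf' with any digest OpenSSL knows by name.
   Returns the digest length written to 'out', or 0 for an unknown name. */
static unsigned int
hash_buffer(const char *digest_name, const void *buf, size_t len,
            unsigned char out[EVP_MAX_MD_SIZE])
{
    EVP_MD_CTX ctx;
    const EVP_MD *md;
    unsigned int outlen = 0;

    OpenSSL_add_all_digests();      /* register "md5", "sha1", ... for lookup */
    md = EVP_get_digestbyname(digest_name);
    if (md == NULL)
        return 0;

    EVP_DigestInit(&ctx, md);       /* md5module.c would keep the ctx in its
                                       object and split these across methods */
    EVP_DigestUpdate(&ctx, buf, len);
    EVP_DigestFinal(&ctx, out, &outlen);
    return outlen;
}

The non-EVP route is much the same amount of work: <openssl/md5.h> declares
MD5_Init/MD5_Update/MD5_Final, which are essentially the RFC 1321
MD5Init/MD5Update/MD5Final names with an underscore added, hence the
"search/replace" comment earlier in the thread.
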
-- Donovan Baarda http://minkirri.apana.org.au/~abo/ From abo at minkirri.apana.org.au Mon Feb 14 01:19:34 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Mon Feb 14 01:20:12 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <20050212210402.GE25441@zot.electricrain.com> References: <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> Message-ID: <1108340374.3768.33.camel@schizo> G'day, On Sat, 2005-02-12 at 13:04 -0800, Gregory P. Smith wrote: > On Sat, Feb 12, 2005 at 08:37:21AM -0500, A.M. Kuchling wrote: > > On Sat, Feb 12, 2005 at 01:54:27PM +1100, Donovan Baarda wrote: > > > Are there any potential problems with making the md5sum module availability > > > "optional" in the same way as this? > > > > The md5 module has been a standard module for a long time; making it > > optional in the next version of Python isn't possible. We'd have to > > require OpenSSL to compile Python. > > > > I'm happy to replace the MD5 and/or SHA implementations with other > > code, provided other code with a suitable license can be found. > > > > agreed. it can not be made optional. What I'd prefer (and will do if > i find the time) is to have the md5 and sha1 module use OpenSSLs > implementations when available. Falling back to their built in ones > when openssl isn't present. That way its always there but uses the > much faster optimized openssl algorithms when they exist. So we need a fallback md5 implementation for when openssl is not available. The RSA implementation is not usable because it has an unsuitable license. Looking at this licence again, I'm not sure what the problem is. It allows you to freely modify, distribute, etc, with the only limit you must retain the RSA licence blurb. The libmd implementation cannot be used because the author tried to give it away unconditionally, and the lawyers say you can't. (dumb! dumb! dumb! someone needs to figure out a way to systematically get around this kind of stupidity, perhaps have someone in a less legally stupid country claim and re-license free code). The libmd5-rfc sourceforge project implementation looks OK. It needs to be modified to have an API identical to openssl (rename structures/functions). Then setup.py needs to be modified to use openssl if available, or fallback to the provided libmd5-rfc implementation. The SHA module is a bit different... it includes a built in SHA implementation. It might pay to strip out the implementation and give it an openssl-like API, then make shamodule.c a use it, or openssl if available. Greg Smith might have already done much of this... -- Donovan Baarda http://minkirri.apana.org.au/~abo/ From greg at electricrain.com Mon Feb 14 01:21:54 2005 From: greg at electricrain.com (Gregory P. 
Smith) Date: Mon Feb 14 01:21:59 2005 Subject: [Python-Dev] Re: OpenSSL sha module / license issues with md5.h/md5c.c In-Reply-To: <1108339344.3768.24.camel@schizo> References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050213013535.GF25441@zot.electricrain.com> <1108339344.3768.24.camel@schizo> Message-ID: <20050214002154.GI25441@zot.electricrain.com> On Mon, Feb 14, 2005 at 11:02:23AM +1100, Donovan Baarda wrote: > On Sat, 2005-02-12 at 17:35 -0800, Gregory P. Smith wrote: > > I've created an OpenSSL version of the sha module. trivial to modify > > to be a md5 module. Its a first version with cleanup to be done and > > such. being managed in the SF patch manager: > > > > https://sourceforge.net/tracker/?func=detail&aid=1121611&group_id=5470&atid=305470 > > > > enjoy. i'll do more cleanup and work on it soon. > > Hmmm. I see the patch entry, but it seems to be missing the actual > patch. > > Did you code this from scratch, or did you base it on the current > md5module.c? Is it using the openssl sha interface, or the higher level > EVP interface? > > The reason I ask is it would be pretty trivial to modify md5module.c to > use the openssl API for any digest, and would be less risk than > fresh-coding one. Ugh. Sourceforge ignored it on the patch submission. i've attached it properly now. This initial version is derived from shamodule.c which does not have any license issues. it is currently only meant as an example of how easy it is to use the openssl hashing interface. I'm taking it an turning it into a generic openssl hash wrapper that'll do md5 sha1 and anything else. -g From ncoghlan at iinet.net.au Mon Feb 14 03:26:44 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Mon Feb 14 03:27:57 2005 Subject: [Python-Dev] A hybrid C & Python implementation for itertools Message-ID: <42100C64.5090001@iinet.net.au> I can't really imagine Raymond liking this idea, and I have a feeling the idea has been shot down before. However, I can't persuade Google to tell me anything about such an occasion, so here goes anyway. . . The utilities in the itertools module can easily be composed to provide additional useful functionality (e.g. the itertools recipes given in the documentation [1]). However, having to recode these every time you need them, or arranging access to a utility module can be a pain for application programming in some corporate environments [2]. The lack of builtin support also leads to many variations on a theme, only some of which actually work properly, or which work, but in subtly different ways [3]. On the other hand, it really isn't worth the effort to code these algorithms in C for the current itertools module. If itertools was a hybrid module, the handy 3-4 liners could go in the Python section, with the heavy lifting done by the underlying C module. The Python equivalents to the current C code could also be placed in the hybrid module (as happens with some of the other hybrid modules in the library). An alternative approach is based on an idea from Alex Martelli [4]. As Alex points out, itertools is currently more about *creating* iterators than it is about consuming them (the only function desription that doesn't start with 'Make an iterator' is itertools.tee and that starts with 'Return n independent iterators'). 
Alex's idea would involve adding a module with a new name that is focused on *consuming* iterators (IOW, extending the available standard accumulators beyond the existing min(), max() and sum() without further populating the builtins). The downside of the latter proposal is that the recipes in the itertools documentation relate both to producing *and* consuming iterators, so a new module would leave the question of where to put the handy iterator producers. Regards, Nick. [1] http://www.python.org/dev/doc/devel/lib/itertools-recipes.html [2] http://mail.python.org/pipermail/python-list/2005-February/266310.html [3] http://mail.python.org/pipermail/python-list/2005-February/266311.html [4] http://groups-beta.google.com/group/comp.lang.python/msg/a76b4c2caf6c435c -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From python at rcn.com Mon Feb 14 05:07:10 2005 From: python at rcn.com (Raymond Hettinger) Date: Mon Feb 14 05:11:03 2005 Subject: [Python-Dev] A hybrid C & Python implementation for itertools References: <42100C64.5090001@iinet.net.au> Message-ID: <006e01c5124a$a81ed540$5e2dc797@oemcomputer> [Nick Coghlan] > If itertools was a hybrid module, the handy 3-4 liners could go in the Python > section, with the heavy lifting done by the underlying C module. The Python > equivalents to the current C code could also be placed in the hybrid module (as > happens with some of the other hybrid modules in the library). Both of those ideas likely reflect the future direction of itertools. FWIW, the historical reasons for keeping the derived tools in the docs were: * Not casting them in stone too early so they could be updated and refined at will. * They had more value as a teaching tool (showing how basic tools could be combined) than as stand-alone tools. * Adding more tools makes the whole toolset harder to use. * When an itertool solution is not immediately obvious, then a generator solution is likely to be easier to write and more understandable. Your two alternate partitioning recipes provide an excellent case in point. * Several of the derived tools do not arise often in practice. For example, I've never used tabulate(), nth(), pairwise(), or repeatfunc(). > Alex's idea would involve adding a module with a new name that is > focused on *consuming* iterators (IOW, extending the available standard > accumulators beyond the existing min(), max() and sum() without further > populating the builtins). That would be nice. From the existing itertool recipes, good candidates would include take(), all(), any(), no(), and quantify(). Raymond From just at letterror.com Mon Feb 14 10:23:03 2005 From: just at letterror.com (Just van Rossum) Date: Mon Feb 14 10:23:06 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/lib libimp.tex, 1.36, 1.36.2.1 libsite.tex, 1.26, 1.26.4.1 libtempfile.tex, 1.22, 1.22.4.1 libos.tex, 1.146.2.1, 1.146.2.2 In-Reply-To: Message-ID: bcannon@users.sourceforge.net wrote: > \begin{datadesc}{PY_RESOURCE} > -The module was found as a Macintosh resource. This value can only be > -returned on a Macintosh. > +The module was found as a Mac OS 9 resource. This value can only be > +returned on a Mac OS 9 or earlier Macintosh. > \end{datadesc} not entirely true: it's limited to the sa called "OS9" version of MacPython, which happily runs natively on OSX as a Carbon app... 
Just From troels at thule.no Mon Feb 14 15:03:22 2005 From: troels at thule.no (Troels Walsted Hansen) Date: Mon Feb 14 15:03:28 2005 Subject: [Python-Dev] builtin_id() returns negative numbers Message-ID: <4210AFAA.9060108@thule.no> Hi all, The Python binding in libxml2 uses the following code for __repr__(): class xmlNode(xmlCore): def __init__(self, _obj=None): self._o = None xmlCore.__init__(self, _obj=_obj) def __repr__(self): return " " % (self.name, id (self)) With Python 2.3.4 I'm seeing warnings like the one below: :2357: FutureWarning: %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and up I believe this is caused by the memory address having the sign bit set, causing builtin_id() to return a negative integer. I grepped around in the Python standard library and found a rather awkward work-around that seems to be slowly propagating to various module using the "'%x' % id(self)" idiom: Lib/asyncore.py: # On some systems (RH10) id() can be a negative number. # work around this. MAX = 2L*sys.maxint+1 return '<%s at %#x>' % (' '.join(status), id(self)&MAX) $ grep -r 'can be a negative number' * Lib/asyncore.py: # On some systems (RH10) id() can be a negative number. Lib/repr.py: # On some systems (RH10) id() can be a negative number. Lib/tarfile.py: # On some systems (RH10) id() can be a negative number. Lib/test/test_repr.py: # On some systems (RH10) id() can be a negative number. Lib/xml/dom/minidom.py: # On some systems (RH10) id() can be a negative number. There are many modules that do not have this work-around in Python 2.3.4. Wouldn't it be more elegant to make builtin_id() return an unsigned long integer? Is the performance impact too great? A long integer is used on platforms where SIZEOF_VOID_P > SIZEOF_LONG (most 64 bit platforms?), so all Python code must be prepared to handle it already... Troels From tim.peters at gmail.com Mon Feb 14 16:41:35 2005 From: tim.peters at gmail.com (Tim Peters) Date: Mon Feb 14 16:41:37 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <4210AFAA.9060108@thule.no> References: <4210AFAA.9060108@thule.no> Message-ID: <1f7befae050214074122b715a@mail.gmail.com> [Troels Walsted Hansen] > The Python binding in libxml2 uses the following code for __repr__(): > > class xmlNode(xmlCore): > def __init__(self, _obj=None): > self._o = None > xmlCore.__init__(self, _obj=_obj) > > def __repr__(self): > return " " % (self.name, id (self)) > > With Python 2.3.4 I'm seeing warnings like the one below: > :2357: FutureWarning: %u/%o/%x/%X of negative int > will return a signed string in Python 2.4 and up > > I believe this is caused by the memory address having the sign bit set, > causing builtin_id() to return a negative integer. Yes, that's right. > I grepped around in the Python standard library and found a rather > awkward work-around that seems to be slowly propagating to various > module using the "'%x' % id(self)" idiom: No, it's not propagating any more: I see that none of these exist in 2.4: > Lib/asyncore.py: > # On some systems (RH10) id() can be a negative number. > # work around this. > MAX = 2L*sys.maxint+1 > return '<%s at %#x>' % (' '.join(status), id(self)&MAX) > > $ grep -r 'can be a negative number' * > Lib/asyncore.py: # On some systems (RH10) id() can be a negative > number. > Lib/repr.py: # On some systems (RH10) id() can be a negative > number. > Lib/tarfile.py: # On some systems (RH10) id() can be a negative > number. 
> Lib/test/test_repr.py: # On some systems (RH10) id() can be a > negative number. > Lib/xml/dom/minidom.py: # On some systems (RH10) id() can be a > negative number. > > There are many modules that do not have this work-around in Python 2.3.4. Not sure, but it looks like this stuff was ripped out in 2.4 simply because 2.4 no longer produces a FutureWarning in these cases. That doesn't address that the output changed, or that the output for a negative id() produced by %x under 2.4 is probably surprising to most. > Wouldn't it be more elegant to make builtin_id() return an unsigned > long integer? I think so. This is the function ZODB 3.3 uses, BTW: # Addresses can "look negative" on some boxes, some of the time. If you # feed a "negative address" to an %x format, Python 2.3 displays it as # unsigned, but produces a FutureWarning, because Python 2.4 will display # it as signed. So when you want to prodce an address, use positive_id() to # obtain it. def positive_id(obj): """Return id(obj) as a non-negative integer.""" result = id(obj) if result < 0: # This is a puzzle: there's no way to know the natural width of # addresses on this box (in particular, there's no necessary # relation to sys.maxint). Try 32 bits first (and on a 32-bit # box, adding 2**32 gives a positive number with the same hex # representation as the original result). result += 1L << 32 if result < 0: # Undo that, and try 64 bits. result -= 1L << 32 result += 1L << 64 assert result >= 0 # else addresses are fatter than 64 bits return result The gives a non-negative result regardless of Python version and (almost) regardless of platform (the `assert` hasn't triggered on any ZODB 3.3 platform yet). > Is the performance impact too great? For some app, somewhere, maybe. It's a tradeoff. The very widespread practice of embedding %x output from id() favors getting rid of the sign issue, IMO. > A long integer is used on platforms where SIZEOF_VOID_P > SIZEOF_LONG > (most 64 bit platforms?), Win64 is probably the only major (meaning likely to be popular among Python users) platform where sizeof(void*) > sizeof(long). > so all Python code must be prepared to handle it already... In theory . From foom at fuhm.net Mon Feb 14 17:33:13 2005 From: foom at fuhm.net (James Y Knight) Date: Mon Feb 14 17:33:25 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1f7befae050214074122b715a@mail.gmail.com> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> Message-ID: <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> On Feb 14, 2005, at 10:41 AM, Tim Peters wrote: >> Wouldn't it be more elegant to make builtin_id() return an unsigned >> long integer? > > I think so. This is the function ZODB 3.3 uses, BTW: > > def positive_id(obj): > """Return id(obj) as a non-negative integer.""" > [...] I think it'd be nice to change it, too. Twisted also uses a similar function. However, last time this topic came up, this Tim Peters guy argued against it. ;) Quoting http://mail.python.org/pipermail/python-dev/2004-November/050049.html: > Python doesn't promise to return a postive integer for id(), although > it may have been nicer if it did. It's dangerous to change that now, > because some code does depend on the "32 bit-ness as a signed integer" > accident of CPython's id() implementation on 32-bit machines. For > example, code using struct.pack(), or code using one of ZODB's > specialized int-key BTree types with id's as keys. 
James From tim.peters at gmail.com Mon Feb 14 18:30:46 2005 From: tim.peters at gmail.com (Tim Peters) Date: Mon Feb 14 18:30:49 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> Message-ID: <1f7befae05021409307ab36a15@mail.gmail.com> [James Y Knight] > I think it'd be nice to change it, too. Twisted also uses a similar > function. > > However, last time this topic came up, this Tim Peters guy argued > against it. ;) > > Quoting > http://mail.python.org/pipermail/python-dev/2004-November/050049.html: > >> Python doesn't promise to return a postive integer for id(), although >> it may have been nicer if it did. It's dangerous to change that now, >> because some code does depend on the "32 bit-ness as a signed integer" >> accident of CPython's id() implementation on 32-bit machines. For >> example, code using struct.pack(), or code using one of ZODB's >> specialized int-key BTree types with id's as keys. Yup, it's still a tradeoff, and it's still dangerous (as any change in visible behavior is). It's especially unfortunate that since "%x" % id(obj) does produce different output in 2.4 than in 2.3 when id(obj) < 0, we would change that output _again_ in 2.5 if id(obj) grew a new non-negative promise. That is, the best time to do this would have been for 2.4. Maybe it's just a wart we have to live with now; OTOH, the docs explicitly warn that id() may return a long, so any code relying on "short int"-ness has always been relying on an implementation quirk. From jcarlson at uci.edu Mon Feb 14 18:29:57 2005 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon Feb 14 18:32:21 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> References: <1f7befae050214074122b715a@mail.gmail.com> <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> Message-ID: <20050214092543.36F0.JCARLSON@uci.edu> James Y Knight wrote: > > > On Feb 14, 2005, at 10:41 AM, Tim Peters wrote: > > >> Wouldn't it be more elegant to make builtin_id() return an unsigned > >> long integer? > > > > I think so. This is the function ZODB 3.3 uses, BTW: > > > > def positive_id(obj): > > """Return id(obj) as a non-negative integer.""" > > [...] > > I think it'd be nice to change it, too. Twisted also uses a similar > function. > > However, last time this topic came up, this Tim Peters guy argued > against it. ;) > > Quoting > http://mail.python.org/pipermail/python-dev/2004-November/050049.html: > > > Python doesn't promise to return a postive integer for id(), although > > it may have been nicer if it did. It's dangerous to change that now, > > because some code does depend on the "32 bit-ness as a signed integer" > > accident of CPython's id() implementation on 32-bit machines. For > > example, code using struct.pack(), or code using one of ZODB's > > specialized int-key BTree types with id's as keys. All Tim was saying is that you can't /change/ builtin_id() because of backwards compatibiliity with Zope and struct.pack(). You are free to create a positive_id() function, and request its inclusion into builtins (low probability; people don't like doing that). Heck, you are even free to drop it in your local site.py implementation. But changing the current function is probably a no-no. 
- Josiah From tim.peters at gmail.com Mon Feb 14 20:13:57 2005 From: tim.peters at gmail.com (Tim Peters) Date: Mon Feb 14 20:14:01 2005 Subject: [Python-Dev] Re: [Zope] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: References: Message-ID: <1f7befae050214111319abbda@mail.gmail.com> [Gfeller Martin] > I'm running a large Zope application on a 1x1GHz CPU 1GB mem > Window XP Prof machine using Zope 2.7.3 and Py 2.3.4 > The application typically builds large lists by appending > and extending them. That's historically been an especially bad case for Windows systems, although the behavior varied across specific Windows flavors. Python has changed lots of things over time to improve it, including yet another twist on list-reallocation strategy new in Python 2.4. > We regularly observed that using a given functionality a > second time using the same process was much slower (50%) > than when it ran the first time after startup. Heh. On Win98SE, the _first_ time you ran pystone after rebooting the machine, it ran twice as fast as the second (or third, fourth, ...) time you tried it. The only way I ever found to get back the original speed without a reboot was to run a different process in-between that allocated almost all physical memory in one giant chunk. Presumably that convinced Win98SE to throw away its fragmented heap and start over again. > This behavior greatly improved with Python 2.3 (thanks > to the improved Python object allocator, I presume). The page you reference later describes a scheme that's (at least superficially) a lot like pymalloc uses for "small objects". In effect, pymalloc takes over buckets 1-32 in the table. > Nevertheless, I tried to convert the heap used by Python > to a Windows Low Fragmentation Heap (available on XP > and 2003 Server). This improved the overall run time > of a typical CPU-intensive report by about 15% > (overall run time is in the 5 minutes range), with the > same memory consumption. > > I consider 15% significant enough to let you know about it. Yes, and thank you. FYI, Python doesn't call any of the Win32 heap functions directly; the behavior it sees is inherited from whatever Microsoft's C implementation uses to support C's malloc()/realloc()/free(). pymalloc requests 256KB at a time from the platform malloc, and carves it up itself, so pymalloc isn't affected by LFH (LFH punts on requests over 16KB, much as pymalloc punts on requests over 256 bytes). But "large objects" (including list guts) don't go thru pymalloc to begin with, so as long as your list guts fit in 16KB, LFH could make a real difference to how they behave. Well, actually, it's probably more the case that LFH gives a boost by keeping small objects _out_ of the general heap. Then growing a giant list doesn't bump into gazillions of small objects. > For information about the Low Fragmentation Heap, see > > > Best regards, > Martin > > PS: Since I don't speak C, I used ctypes to convert all > heaps in the process to LFH (I don't know how to determine > which one is the C heap). It's the one consuming all the time . 
From tismer at stackless.com Tue Feb 15 01:38:43 2005 From: tismer at stackless.com (Christian Tismer) Date: Tue Feb 15 01:38:36 2005 Subject: [Python-Dev] Ann: PyPy Sprint before PYCON 2005 in Washington Message-ID: <42114493.8050006@stackless.com> PyPy Sprint before PYCON 2005 in Washington ------------------------------------------- In the four days from 19th March till 22nd March (inclusive) the PyPy team will host a sprint on their new Python-in-Python implementation. The PyPy project was granted funding by the European Union as part of its Sixth Framework Program, and is now on track to produce a stackless Python-in-Python Just-in-Time Compiler by December 2006. Our Python implementation, released under the MIT/BSD license, already provides new levels of flexibility and extensibility at the core interpreter and object implementation level. Armin Rigo and Holger Krekel will also give talks about PyPy and the separate py.test tool (used to perform various kinds of testing in PyPy) during the conference. Naturally, we are eager to see how the other re-implementation of Python, namely IronPython, is doing and to explore collaboration possibilities. Of course, that will depend on the degree of openness that Microsoft wants to employ. The Pycon2005 sprint is going to focus on reaching compatibility with CPython (currently we target version 2.3.4) for our PyPy version running on top of CPython. One goal of the sprint is to pass 60% or more of the unmodified regression tests of mainline CPython. It will thus be a great way to get to know CPython and PyPy better at the same time! Other possible work areas include: - translation to C to get a first working lower-level representation of the interpreter "specified in Python" - integrating and implementing a full parser/compiler chain written in Python, maybe already targeting the new AST branch of mainline CPython - fixing various remaining issues that will come up while trying to reach the compatibility goal - integrating or coding pure Python implementations of some Python modules currently written in C. - whatever issues you come up with! (please tell us beforehand so we can better plan introductions, etc.) Besides core developers, Bea Düring will be present to help improve and document our sprint and agile development process. We are going to give tutorials about PyPy's basic concepts and provide help to newcomers, usually by pairing them with experienced pypythonistas. However, we kindly ask newcomers to be present on the morning of the sprint's first day (19th of March) so that everyone gets a smooth start into the sprint. So far most newcomers have had few problems getting a good start into our codebase. However, it is good to have the following preparatory points in mind: - some experience with programming in the Python language and an interest in diving deeper - subscription to pypy-dev and pypy-sprint at http://codespeak.net/pypy/index.cgi?lists - have a subversion client, Pygame and graphviz installed on the machine you bring to the sprint. - have a look at our current documentation, especially the architecture and getting-started documents under http://codespeak.net/pypy/index.cgi?doc The pypy-dev and pypy-sprint lists are also the contact points for raising questions and suggesting and discussing sprint topics beforehand. We are on #pypy on irc.freenode.net most of the time. Please don't hesitate to contact us or introduce yourself and your interests!
Logistics --------- Organizational details will be posted to pypy-sprint and are or will be available in the Pycon2005-Sprint wiki here: http://www.python.org/moin/PyConDC2005/Sprints Registration ------------ send mail to pypy-sprint@codespeak.net, stating the days you can be present and any specific interests if applicable. Registered Participants ----------------------- all days: Jacob Hall?n Armin Rigo Holger Krekel Samuele Pedroni Anders Chrigstr?m Bea D?ring Christian Tismer Richard Emslie -- Christian Tismer :^) tismerysoft GmbH : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9A : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 802 86 56 mobile +49 173 24 18 776 fax +49 30 80 90 57 05 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From ncoghlan at iinet.net.au Tue Feb 15 10:43:30 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Tue Feb 15 10:43:33 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <20050214092543.36F0.JCARLSON@uci.edu> References: <1f7befae050214074122b715a@mail.gmail.com> <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> <20050214092543.36F0.JCARLSON@uci.edu> Message-ID: <4211C442.3010001@iinet.net.au> Josiah Carlson wrote: >>Quoting >>http://mail.python.org/pipermail/python-dev/2004-November/050049.html: >> >> >>>Python doesn't promise to return a postive integer for id(), although >>>it may have been nicer if it did. It's dangerous to change that now, >>>because some code does depend on the "32 bit-ness as a signed integer" >>>accident of CPython's id() implementation on 32-bit machines. For >>>example, code using struct.pack(), or code using one of ZODB's >>>specialized int-key BTree types with id's as keys. > > > All Tim was saying is that you can't /change/ builtin_id() because of > backwards compatibiliity with Zope and struct.pack(). You are free to > create a positive_id() function, and request its inclusion into builtins > (low probability; people don't like doing that). Heck, you are even free > to drop it in your local site.py implementation. But changing the > current function is probably a no-no. There's always the traditional response to "want to fix it but can't due to backwards compatibility": a keyword argument that defaults to False. Cheers, Nick. -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From fredrik at pythonware.com Tue Feb 15 10:56:58 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue Feb 15 10:57:26 2005 Subject: [Python-Dev] Re: builtin_id() returns negative numbers References: <4210AFAA.9060108@thule.no><1f7befae050214074122b715a@mail.gmail.com> <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> Message-ID: James Y Knight wrote: > However, last time this topic came up, this Tim Peters guy argued against it. ;) > > Quoting http://mail.python.org/pipermail/python-dev/2004-November/050049.html: > >> Python doesn't promise to return a postive integer for id(), although >> it may have been nicer if it did. It's dangerous to change that now, >> because some code does depend on the "32 bit-ness as a signed integer" >> accident of CPython's id() implementation on 32-bit machines. For >> example, code using struct.pack(), or code using one of ZODB's >> specialized int-key BTree types with id's as keys. 
can anyone explain the struct.pack and ZODB use cases? the first one doesn't make sense to me, and the other relies on Python *not* behaving as documented (which is worse than relying on undocumented behaviour, imo). From fredrik at pythonware.com Tue Feb 15 13:47:35 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue Feb 15 13:47:51 2005 Subject: [Python-Dev] pymalloc on 2.1.3 Message-ID: does anyone remember if there were any big changes in pymalloc between the 2.1 series (where it was introduced) and 2.3 (where it was enabled by default). or in other words, is the 2.1.3 pymalloc stable enough for production use? (we're having serious memory fragmentation problems on a 2.1.3 system, and while I can patch/rebuild the interpreter if necessary, we cannot update the system right now...) From mwh at python.net Tue Feb 15 13:58:19 2005 From: mwh at python.net (Michael Hudson) Date: Tue Feb 15 13:58:22 2005 Subject: [Python-Dev] pymalloc on 2.1.3 In-Reply-To: (Fredrik Lundh's message of "Tue, 15 Feb 2005 13:47:35 +0100") References: Message-ID: <2mmzu60yl0.fsf@starship.python.net> "Fredrik Lundh" writes: > does anyone remember if there were any big changes in pymalloc between > the 2.1 series (where it was introduced) and 2.3 (where it was enabled by > default). Yes. (Was it really 2.1? Time flies!) > or in other words, is the 2.1.3 pymalloc stable enough for production use? Well, Tim posted ways of making it crash, but I don't know how likely they are to occur in non-malicious code. "cvs log Objects/obmalloc.c" might enlighten, or at least give an idea which months of the python-dev archive to search. Cheers, mwh -- this "I hate c++" is so old it's as old as C++, yes -- from Twisted.Quotes From tim.peters at gmail.com Tue Feb 15 15:50:02 2005 From: tim.peters at gmail.com (Tim Peters) Date: Tue Feb 15 15:50:05 2005 Subject: [Python-Dev] Re: builtin_id() returns negative numbers In-Reply-To: References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> Message-ID: <1f7befae05021506507964d814@mail.gmail.com> [Fredrik Lundh] > can anyone explain the struct.pack and ZODB use cases? the first one > doesn't make sense to me, Not deep and surely not common, just possible. If you're on a 32-bit box and doing struct.pack("...i...", ... id(obj) ...), it in fact cannot fail now (no, that isn't guaranteed by the docs, it's just an implementation reality), but would fail if id() ever returned a positive long with the same bit pattern as a negative 32-bit int ("OverflowError: long int too large to convert to int").. > and the other relies on Python *not* behaving as documented (which is worse > than relying on undocumented behaviour, imo). I don't know what you think the problem with ZODB's integer-flavored keys might be, then. The problem I'm thinking of is that by "integer-flavored" they really mean *C* int, not Python integer (which is C long). They're delicate enough that way that they already don't work right on most current 64-bit boxes whenever the value of a Python int doesn't in fact fit in the platform's C int: http://collector.zope.org/Zope/1592 If id() returned a long in some cases on 32-bit boxes, then code using id() as key (in an II or IO tree) or value (in an II or OI) tree would stop working. 
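To make the struct.pack case concrete (made-up value, 32-bit box assumed):

    import struct

    neg_id = -1057030590               # the kind of value id() can return today
    struct.pack("i", neg_id)           # fine: fits a signed 32-bit C int
    pos_id = neg_id + (1L << 32)       # same bit pattern, but non-negative
    struct.pack("i", pos_id)           # OverflowError: long int too large
                                       #   to convert to int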
Again, the Python docs didn't guarantee this would work, and the int-flavored BTrees have 64-bit box bugs in their handling of integers, but the id()-as-key-or-value case has nevertheless worked wholly reliably until now on 32-bit boxes. Any change in visible behavior has the potential to break code -- that shouldn't be controversial, because it's so obvious, and so relentlessly proved in real life. It's a tradeoff. I've said I'm in favor of taking away the sign issue for id() in this case, although I'm not going to claim that no code will break as a result, and I'd be a lot more positive about it if we could use the time machine to change this behavior for 2.4. From tim.peters at gmail.com Tue Feb 15 16:19:01 2005 From: tim.peters at gmail.com (Tim Peters) Date: Tue Feb 15 16:19:04 2005 Subject: [Python-Dev] pymalloc on 2.1.3 In-Reply-To: References: Message-ID: <1f7befae0502150719a24607d@mail.gmail.com> [Fredrik Lundh] > does anyone remember if there were any big changes in pymalloc between > the 2.1 series (where it was introduced) and 2.3 (where it was enabled by > default). Yes, huge -- few original lines survived exactly, although many survived in intent. > or in other words, is the 2.1.3 pymalloc stable enough for production use? Different question entirely . It _was_ used in production by some people, and happily so. Major differences: + 2.1 used a probabilistic scheme for guessing whether addresses passed to it were obtained from pymalloc or from the system malloc. It was easy for a malicous pure-Python program to corrupt pymalloc and/or malloc internals as a result, leading to things like segfaults, and even sneaky ways to mutate the Python bytecode stream. It's extremely unlikely that a non- malicious program could bump into these. + Horrid hackery went into 2.3's version to cater to broken extension modules that called PyMem functions without holding the GIL. 2.1's may not be as thread-safe in these cases. + 2.1's only fields requests up to 64 bytes, 2.3's up to 256 bytes. Changes in the dict implementation, and new-style classes, for 2.3 made it a pragmatic necessity to boost the limit for 2.3. > (we're having serious memory fragmentation problems on a 2.1.3 system, > and while I can patch/rebuild the interpreter if necessary, we cannot update > the system right now...) I'd give it a shot -- pymalloc has always been very effective at handling large numbers of small objects gracefully. The meaning of "small" got 4x bigger since 2.1, which appeared to be a pure win, but 64 bytes was enough under 2.1 that most small instance dicts fit. From mwh at python.net Tue Feb 15 16:49:49 2005 From: mwh at python.net (Michael Hudson) Date: Tue Feb 15 16:49:51 2005 Subject: [Python-Dev] Exceptions *must*? be old-style classes? In-Reply-To: <2mu0pebo6u.fsf@starship.python.net> (Michael Hudson's message of "Tue, 18 Jan 2005 18:13:29 +0000") References: <20050117105219.GA12763@vicky.ecs.soton.ac.uk> <2mbrboca5r.fsf@starship.python.net> <5.1.1.6.0.20050117113419.03972d20@mail.telecommunity.com> <41EC38DE.8080603@v.loewis.de> <2my8eqbrk2.fsf@starship.python.net> <2mu0pebo6u.fsf@starship.python.net> Message-ID: <2mfyzx257m.fsf@starship.python.net> Michael Hudson writes: > Michael Hudson writes: > >> I hope to have a new patch (which makes PyExc_Exception new-style, but >> allows arbitrary old-style classes as exceptions) "soon". It may even >> pass bits of "make test" :) > > Done: http://www.python.org/sf/1104669 Now I think it's really done, apart from documentation. 
My design decision was to make Exception new-style. Things can be raised if they are instances of old-style classes or instances of Exception. If this meets with general agreement, I'd like to check the above patch in. It will break some highly introspective code, so it's IMO best to get it in early in the 2.5 cycle. The other option is to keep Exception old-style but allow new-style subclasses, but I think all this will do is break the above mentioned introspective code in a quieter way... The patch also updates the PendingDeprecationWarning on raising a string exception to a full DeprecationWarning (something that should be done anyway). Cheers, mwh -- python py.py ~/Source/python/dist/src/Lib/test/pystone.py Pystone(1.1) time for 5000 passes = 19129.1 This machine benchmarks at 0.261381 pystones/second From gvanrossum at gmail.com Tue Feb 15 19:55:53 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue Feb 15 19:56:12 2005 Subject: [Python-Dev] Exceptions *must*? be old-style classes? In-Reply-To: <2mfyzx257m.fsf@starship.python.net> References: <20050117105219.GA12763@vicky.ecs.soton.ac.uk> <2mbrboca5r.fsf@starship.python.net> <5.1.1.6.0.20050117113419.03972d20@mail.telecommunity.com> <41EC38DE.8080603@v.loewis.de> <2my8eqbrk2.fsf@starship.python.net> <2mu0pebo6u.fsf@starship.python.net> <2mfyzx257m.fsf@starship.python.net> Message-ID: > My design decision was to make Exception new-style. Things can be > raised if they are instances of old-style classes or instances of > Exception. If this meets with general agreement, I'd like to check > the above patch in. I like it, but didn't you forget to mention that strings can still be raised? I think we can't break that (but we can insert a deprecation warning for this in 2.5 so we can hopefully deprecate it in 2.6, or 2.7 at the latest). > The patch also updates the PendingDeprecationWarning on raising a > string exception to a full DeprecationWarning (something that should > be done anyway). What I said. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh at python.net Tue Feb 15 20:27:23 2005 From: mwh at python.net (Michael Hudson) Date: Tue Feb 15 20:27:25 2005 Subject: [Python-Dev] Exceptions *must*? be old-style classes? In-Reply-To: (Guido van Rossum's message of "Tue, 15 Feb 2005 10:55:53 -0800") References: <20050117105219.GA12763@vicky.ecs.soton.ac.uk> <2mbrboca5r.fsf@starship.python.net> <5.1.1.6.0.20050117113419.03972d20@mail.telecommunity.com> <41EC38DE.8080603@v.loewis.de> <2my8eqbrk2.fsf@starship.python.net> <2mu0pebo6u.fsf@starship.python.net> <2mfyzx257m.fsf@starship.python.net> Message-ID: <2mbral1v50.fsf@starship.python.net> Guido van Rossum writes: >> My design decision was to make Exception new-style. Things can be >> raised if they are instances of old-style classes or instances of >> Exception. If this meets with general agreement, I'd like to check >> the above patch in. > > I like it, but didn't you forget to mention that strings can still be > raised? I think we can't break that (but we can insert a deprecation > warning for this in 2.5 so we can hopefully deprecate it in 2.6, or > 2.7 at the latest). I try to forget that as much as possible :) >> The patch also updates the PendingDeprecationWarning on raising a >> string exception to a full DeprecationWarning (something that should >> be done anyway). > > What I said. :-) :) I'll try to bash the documentation into shape next. Cheers, mwh -- please realize that the Common Lisp community is more than 40 years old. 
collectively, the community has already been where every clueless newbie will be going for the next three years. so relax, please. -- Erik Naggum, comp.lang.lisp From ejones at uwaterloo.ca Tue Feb 15 22:39:38 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Tue Feb 15 22:39:32 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? Message-ID: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> After I finally understood what thread-safety guarantees the Python memory allocator needs to provide, I went and did some hard thinking about the code this afternoon. I believe that my modifications provide the same guarantees that the original version did. I do need to declare the arenas array to be volatile, and leak the array when resizing it. Please correct me if I am wrong, but the situation that needs to be supported is this: While one thread holds the GIL, any other thread can call PyObject_Free with a pointer that was returned by the system malloc. The following situation is *not* supported: While one thread holds the GIL, another thread calls PyObject_Free with a pointer that was returned by PyObject_Malloc. I'm hoping that I got things a little better this time around. I've submitted my updated patch to the patch tracker. For reference, I've included links to SourceForge and the previous thread. Thank you, Evan Jones Previous thread: http://mail.python.org/pipermail/python-dev/2005-January/051255.html Patch location: http://sourceforge.net/tracker/index.php? func=detail&aid=1123430&group_id=5470&atid=305470 From tim.peters at gmail.com Tue Feb 15 23:52:02 2005 From: tim.peters at gmail.com (Tim Peters) Date: Tue Feb 15 23:52:05 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? In-Reply-To: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> References: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> Message-ID: <1f7befae05021514524d0a35ec@mail.gmail.com> [Evan Jones] > After I finally understood what thread-safety guarantees the Python > memory allocator needs to provide, I went and did some hard thinking > about the code this afternoon. I believe that my modifications provide > the same guarantees that the original version did. I do need to declare > the arenas array to be volatile, and leak the array when resizing it. > Please correct me if I am wrong, but the situation that needs to be > supported is this: As I said before, I don't think we need to support this any more. More, I think we should not -- the support code is excruciatingly subtle, it wasted plenty of your time trying to keep it working, and if we keep it in it's going to continue to waste time over the coming years (for example, in the short term, it will waste my time reviewing it). > While one thread holds the GIL, any other thread can call PyObject_Free > with a pointer that was returned by the system malloc. What _was_ supported was more generally that any number of threads could call PyObject_Free with pointers that were returned by the system malloc/realloc at the same time as a single thread, holding the GIL, was doing anything whatsoever (including executing any code inside obmalloc.c) Although that's a misleading way of expressing the actual intent; more on that below. > The following situation is *not* supported: > > While one thread holds the GIL, another thread calls PyObject_Free with > a pointer that was returned by PyObject_Malloc. Right, that was never supported (and I doubt it could be without introducing a new mutex in obmalloc.c). 
> I'm hoping that I got things a little better this time around. I've > submitted my updated patch to the patch tracker. For reference, I've > included links to SourceForge and the previous thread. > > Thank you, Thank you! I probably can't make time to review anything before this weekend. I will try to then. I expect it would be easier if you ripped out the horrid support for PyObject_Free abuse; in a sane world, the release-build PyMem_FREE, PyMem_Del, and PyMem_DEL would expand to "free" instead of to "PyObject_FREE" (via changes to pymem.h). IOW, it was never the _intent_ that people be able to call PyObject_Free without holding the GIL. The need for that came from a different problem, that old code sometimes mixed calls to PyObject_New with calls to PyMem_DEL (or PyMem_FREE or PyMem_Del). It's for that latter reason that PyMem_DEL (and its synonyms) were changed to expand to PyObject_Free. This shouldn't be supported anymore. Because it _was_ supported, there was no way to tell whether PyObject_Free was being called because (a) we were catering to long-obsolete but once-loved code that called PyMem_DEL while holding the GIL and with a pointer obtained by PyObject_New; or, (b) somebody was calling PyMem_Del (etc) with a non-object pointer they had obtained from PyMem_New, or from the system malloc directly. It was never legit to do #a without holding the GIL. It was clear as mud whether it was legit to do #b without holding the GIL. If PyMem_Del (etc) change to expand to "free" in a release build, then #b can remain clear as mud without harming anyone. Nobody should be doing #a anymore. If someone still is, "tough luck -- fix it, you've had years of warning" is easy for me to live with at this stage. I suppose the other consideration is that already-compiled extension modules on non-Windows(*) systems will, if they're not recompiled, continue to call PyObject_Free everywhere they had a PyMem_Del/DEL/FREE call. If such code is calling it without holding the GIL, and obmalloc.c stops trying to support this insanity, then they're going to grow some thread races they woudn't have if they did recompile (to get such call sites remapped to the system free). I don't really care about that either: it's a general rule that virtually all Python API functions must be called with the GIL held, and there was never an exception in the docs for the PyMem_ family. (*) Windows is immune simply because the Windows Python is set up in such a way that you always have to recompile extension modules when Python's minor version number (the j in i.j.k) gets bumped. From ejones at uwaterloo.ca Wed Feb 16 04:02:53 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Wed Feb 16 04:04:05 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? In-Reply-To: <1f7befae05021514524d0a35ec@mail.gmail.com> References: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> <1f7befae05021514524d0a35ec@mail.gmail.com> Message-ID: <4c0d14b0b08390d046e1220b6f360745@uwaterloo.ca> On Feb 15, 2005, at 17:52, Tim Peters wrote: > As I said before, I don't think we need to support this any more. > More, I think we should not -- the support code is excruciatingly > subtle, it wasted plenty of your time trying to keep it working, and > if we keep it in it's going to continue to waste time over the coming > years (for example, in the short term, it will waste my time reviewing > it). I do not have nearly enough experience in the Python world to evaluate this decision. 
I've only been programming in Python for about two years now, and as I am sure you are aware, this is my first patch that I have submitted to Python. I don't really know my way around the Python internals, beyond writing basic extensions in C. Martin's opinion is clearly the opposite of yours. Basically, the debate seems to boil down to maintaining backwards compatibility at the cost of making the code in obmalloc.c harder to understand. The particular case that is being supported could definitely be viewed as a "bug" in the code that using obmalloc. It also likely is quite rare. However, until now it has been supported, so it is hard to judge exactly how much code would be affected. It would definitely be a minor barrier to moving to Python 2.5. Is there some sort of consensus that is possible on this issue? >> While one thread holds the GIL, any other thread can call >> PyObject_Free >> with a pointer that was returned by the system malloc. > What _was_ supported was more generally that > > any number of threads could call PyObject_Free with pointers that > were > returned by the system malloc/realloc > > at the same time as > > a single thread, holding the GIL, was doing anything whatsoever > (including > executing any code inside obmalloc.c) Okay, good, that is what I have assumed. > Although that's a misleading way of expressing the actual intent; more > on that below. That's fine. It may be a misleading description of the intent, but it is an accurate description of the required behaviour. At least I hope it is. > I expect it would be easier if you > ripped out the horrid support for PyObject_Free abuse; in a sane > world, the release-build PyMem_FREE, PyMem_Del, and PyMem_DEL would > expand to "free" instead of to "PyObject_FREE" (via changes to > pymem.h). It turns out that basically the only thing that would change would be removing the "volatile" specifiers from two of the global variables, plus it would remove about 100 lines of comments. :) The "work" was basically just hurting my brain trying to reason about the concurrency issues, not changing code. > It was never legit to do #a without holding the GIL. It was clear as > mud whether it was legit to do #b without holding the GIL. If > PyMem_Del (etc) change to expand to "free" in a release build, then #b > can remain clear as mud without harming anyone. Nobody should be > doing #a anymore. If someone still is, "tough luck -- fix it, you've > had years of warning" is easy for me to live with at this stage. Hmm... The issue is that case #a may not be an easy problem to diagnose: Some implementations of free() will happily do nothing if they are passed a pointer they know nothing about. This would just result in a memory leak. Other implementations of free() can output a warning or crash in this case, which would make it trivial to locate. > I suppose the other consideration is that already-compiled extension > modules on non-Windows(*) systems will, if they're not recompiled, > continue to call PyObject_Free everywhere they had a > PyMem_Del/DEL/FREE call. Is it guaranteed that extension modules will be binary compatible with future Python releases? I didn't think this was the case. Thanks for the feedback, Evan Jones From tim.peters at gmail.com Wed Feb 16 05:26:18 2005 From: tim.peters at gmail.com (Tim Peters) Date: Wed Feb 16 05:26:22 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? 
In-Reply-To: <4c0d14b0b08390d046e1220b6f360745@uwaterloo.ca> References: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> <1f7befae05021514524d0a35ec@mail.gmail.com> <4c0d14b0b08390d046e1220b6f360745@uwaterloo.ca> Message-ID: <1f7befae05021520263d77a2a3@mail.gmail.com> [Tim Peters] >> As I said before, I don't think we need to support this any more. >> More, I think we should not -- the support code is excruciatingly >> subtle, it wasted plenty of your time trying to keep it working, and >> if we keep it in it's going to continue to waste time over the coming >> years (for example, in the short term, it will waste my time reviewing >> it). [Evan Jones] > I do not have nearly enough experience in the Python world to evaluate > this decision. I've only been programming in Python for about two years > now, and as I am sure you are aware, this is my first patch that I have > submitted to Python. I don't really know my way around the Python > internals, beyond writing basic extensions in C. Martin's opinion is > clearly the opposite of yours. ? This is all I recall Martin saying about this: http://mail.python.org/pipermail/python-dev/2005-January/051265.html I'm not certain it is acceptable to make this assumption. Why is it not possible to use the same approach that was previously used (i.e. leak the arenas array)? Do you have something else in mind? I'll talk with Martin about it if he still wants to. Martin, this miserable code must die! > Basically, the debate seems to boil down to maintaining backwards > compatibility at the cost of making the code in obmalloc.c harder to > understand. The "let it leak to avoid thread problems" cruft is arguably the single most obscure bit of coding in Python's code base. I created it, so I get to say that . Even 100 lines of comments aren't enough to make it clear, as you've discovered. I've lost track of how many hours of my life have been pissed away explaining it, and its consequences (like how come this or that memory-checking program complains about the memory leak it causes), and the historical madness that gave rise to it in the beginning. I've had enough of it -- the only purpose this part ever had was to protect against C code that wasn't playing by the rules anyway. BFD. There are many ways to provoke segfaults with C code that breaks the rules, and there's just not anything that special about this way _except_ that I added objectionable (even at the time) hacks to preserve this kind of broken C code until authors had time to fix it. Time's up. > The particular case that is being supported could definitely be viewed > as a "bug" in the code that using obmalloc. It also likely is quite rare. > However, until now it has been supported, so it is hard to judge exactly > how much code would be affected. People spent many hours searching for affected code when it first went in, and only found a few examples then, in obscure extension modules. It's unlikely usage has grown. The hack was put it in for the dubious benefit of the few examples that were found then. > It would definitely be a minor barrier to moving to Python 2.5. That's in part what python-dev is for. Of course nobody here has code that will break -- but the majority of high-use extension modules are maintained by people who read this list, so that's not as empty as it sounds. It's also what alpha and beta releases are for. Fear of change isn't a good enough reason to maintain this code. > Is there some sort of consensus that is possible on this issue? 
Absolutely, provided it matches my view <0.5 wink>. Rip it out, and if alpha/beta testing suggests that's a disaster, _maybe_ put it back in. ... > It turns out that basically the only thing that would change would be > removing the "volatile" specifiers from two of the global variables, > plus it would remove about 100 lines of comments. :) The "work" was > basically just hurting my brain trying to reason about the concurrency > issues, not changing code. And the brain of everyone else who ever bumps into this. There's a high probability that if this code actually doesn't work (can you produce a formal proof of correctness for it? I can't -- and I tried), nothing can be done to repair it; and code this outrageously delicate has a decent chance of being buggy no matter how many people stare at it (overlooking that you + me isn't that many). You also mentioned before that removing the "volatile"s may have given a speed boost, and that's believable. I mentioned above the unending costs in explanations, and nuisance gripes from memory-integrity tools about the deliberate leaks. There are many kinds of ongoing costs here, and no _intended_ benefit anymore (it certainly wasn't my intent to cater to buggy C code forever). >> It was never legit to do #a without holding the GIL. It was clear as >> mud whether it was legit to do #b without holding the GIL. If >> PyMem_Del (etc) change to expand to "free" in a release build, then #b >> can remain clear as mud without harming anyone. Nobody should be >> doing #a anymore. If someone still is, "tough luck -- fix it, you've >> had years of warning" is easy for me to live with at this stage. > Hmm... The issue is that case #a may not be an easy problem to > diagnose: Many errors in C code are difficult to diagnose. That's life. Mixing a PyObject call with a PyMem call is obvious now "by eyeball", so if there is such code still out there, and it blows up, an experienced eye has a good chance of spotting the error at once. ' > Some implementations of free() will happily do nothing if > they are passed a pointer they know nothing about. This would just > result in a memory leak. Other implementations of free() can output a > warning or crash in this case, which would make it trivial to locate. I expect most implementations of free() would end up corrupting memory state, leading to no symptoms or to disastrous symptoms, from 0 to a googol cycles after the mistake was made. Errors in using malloc/free are often nightmares to debug. We're not trying to make coding in C pleasant here -- which is good, because that's unachievable . >> I suppose the other consideration is that already-compiled extension >> modules on non-Windows(*) systems will, if they're not recompiled, >> continue to call PyObject_Free everywhere they had a >> PyMem_Del/DEL/FREE call. > Is it guaranteed that extension modules will be binary compatible with > future Python releases? I didn't think this was the case. Nope, that's not guarantfeed. There's a magic number (PYTHON_API_VERSION) that changes whenever the Python C API undergoes an incompatible change, and binary compatibility is guaranteed across releases if that doesn't change. The then-current value of PYTHON_API_VERSION gets compiled into extensions, by virtue of the module-initialization macro their initialization function has to call. 
The guts of that function are in the Python core (Py_InitModule4()), which raises this warning if the passed-in version doesn't match the current version: "Python C API version mismatch for module %.100s:\ This Python has API version %d, module %.100s has version %d."; This is _just_ a warning, though. Perhaps unfortunately for Python's users, Guido learned long ago that most API mismatches don't actually matter for his own code . For example, the C API officially changed when the signature of PyFrame_New() changed in 2001 -- but almost no extension modules call that function. Similarly, if we change PyMem_Del (etc) to map to the system free(), PYTHON_API_VERSION should be bumped for Python 2.5 -- but many people will ignore the mismatch warning, and again it will probably make no difference (if there's code still out there that calls PyMem_DEL (etc) without holding the GIL, I don't know about it). From kbk at shore.net Wed Feb 16 06:32:36 2005 From: kbk at shore.net (Kurt B. Kaiser) Date: Wed Feb 16 06:32:48 2005 Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200502160532.j1G5Wahi031058@bayview.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 298 open (+14) / 2754 closed ( +6) / 3052 total (+20) Bugs : 823 open (+19) / 4829 closed (+17) / 5652 total (+36) RFE : 168 open ( +1) / 144 closed ( +2) / 312 total ( +3) New / Reopened Patches ______________________ date.strptime and time.strptime as well (2005-02-04) http://python.org/sf/1116362 opened by Josh NameError in cookielib domain check (2005-02-04) CLOSED http://python.org/sf/1116583 opened by Chad Miller Minor improvement on 1116583 (2005-02-06) http://python.org/sf/1117114 opened by John J Lee cookielib and cookies with special names (2005-02-06) http://python.org/sf/1117339 opened by John J Lee cookielib LWPCookieJar and MozillaCookieJar exceptions (2005-02-06) http://python.org/sf/1117398 opened by John J Lee cookielib.LWPCookieJar incorrectly loads value-less cookies (2005-02-06) http://python.org/sf/1117454 opened by John J Lee urllib2 .getheaders attribute error (2005-02-07) http://python.org/sf/1117588 opened by Wummel replace md5 impl. with one having a more free license (2005-02-07) CLOSED http://python.org/sf/1117961 opened by Matthias Klose unknown locale: lt_LT (patch) (2005-02-08) http://python.org/sf/1118341 opened by Nerijus Baliunas Fix crash in xmlprase_GetInputContext in pyexpat.c (2005-02-08) http://python.org/sf/1118602 opened by Mathieu Fenniak enable time + timedelta (2005-02-08) http://python.org/sf/1118748 opened by Josh fix for a bug in Header.__unicode__() (2005-02-09) CLOSED http://python.org/sf/1119016 opened by Bj?rn Lindqvist python -c readlink()s and stat()s '-c' (2005-02-09) http://python.org/sf/1119423 opened by Brian Foley patches to compile for AIX 4.1.x (2005-02-09) http://python.org/sf/1119626 opened by Stuart D. Gathman better datetime support for xmlrpclib (2005-02-10) http://python.org/sf/1120353 opened by Fred L. Drake, Jr. ZipFile.open - read-only file-like obj for files in archive (2005-02-11) http://python.org/sf/1121142 opened by Alan McIntyre Reference count bug fix (2005-02-12) http://python.org/sf/1121234 opened by Michiel de Hoon sha and md5 modules should use OpenSSL when possible (2005-02-12) http://python.org/sf/1121611 opened by Gregory P. 
Smith Python memory allocator: Free memory (2005-02-15) http://python.org/sf/1123430 opened by Evan Jones Patches Closed ______________ Add SSL certificate validation (2005-02-03) http://python.org/sf/1115631 closed by noonian NameError in cookielib domain check (2005-02-04) http://python.org/sf/1116583 closed by rhettinger replace md5 impl. with one having a more free license (2005-02-07) http://python.org/sf/1117961 closed by loewis fix for a bug in Header.__unicode__() (2005-02-09) http://python.org/sf/1119016 closed by sonderblade time.tzset() not built on Solaris (2005-01-04) http://python.org/sf/1096244 closed by bcannon OSATerminology extension fix (2004-06-25) http://python.org/sf/979784 closed by jackjansen New / Reopened Bugs ___________________ xmlrpclib: wrong decoding in '_stringify' (2005-02-04) CLOSED http://python.org/sf/1115989 opened by Dieter Maurer Prefix search is filesystem-centric (2005-02-04) http://python.org/sf/1116520 opened by Steve Holden Wrong match with regex, non-greedy problem (2005-02-05) CLOSED http://python.org/sf/1116571 opened by rengel Solaris 10 fails to compile complexobject.c (2005-02-04) http://python.org/sf/1116722 opened by Case Van Horsen Dictionary Evaluation Issue (2005-02-05) http://python.org/sf/1117048 opened by WalterBrunswick Typo in list.sort() documentation (2005-02-06) CLOSED http://python.org/sf/1117063 opened by Viktor Ferenczi sgmllib.SGMLParser (2005-02-06) CLOSED http://python.org/sf/1117302 opened by Paul Birnie SimpleHTTPServer and mimetypes: almost together (2005-02-06) http://python.org/sf/1117556 opened by Matthew L Daniel os.path.exists returns false negatives in MAC environments. (2005-02-07) http://python.org/sf/1117601 opened by Stephen Bennett profiler: Bad return and Bad call errors with exceptions (2005-02-06) http://python.org/sf/1117670 opened by Matthew Mueller "in" operator bug ? 
(2005-02-07) CLOSED http://python.org/sf/1117757 opened by Andrea Bolzonella BSDDB openhash (2005-02-07) http://python.org/sf/1117761 opened by Andrea Bolzonella lists coupled (2005-02-07) CLOSED http://python.org/sf/1118101 opened by chopf Error in representation of complex numbers(again) (2005-02-09) http://python.org/sf/1118729 opened by George Yoshida builtin file() vanishes (2005-02-08) CLOSED http://python.org/sf/1118977 opened by Barry Alan Scott Docs for set() omit constructor (2005-02-09) CLOSED http://python.org/sf/1119282 opened by Kent Johnson curses.initscr - initscr exit w/o env(TERM) set (2005-02-09) http://python.org/sf/1119331 opened by Jacob Lilly xrange() builtin accepts keyword arg silently (2005-02-09) http://python.org/sf/1119418 opened by Martin Blais Python Programming FAQ should be updated for Python 2.4 (2005-02-09) http://python.org/sf/1119439 opened by Michael Hoffman ScrolledText allows Frame.bbox to hide Text.bbox (2005-02-09) http://python.org/sf/1119673 opened by Drew Perttula list extend() accepts args besides lists (2005-02-09) CLOSED http://python.org/sf/1119700 opened by Dan Everhart Static library incompatible with nptl (2005-02-10) http://python.org/sf/1119860 opened by daniel Static library incompatible with nptl (2005-02-10) CLOSED http://python.org/sf/1119866 opened by daniel Python 2.4.0 crashes with a segfault, EXAMPLE ATTACHED (2005-02-11) http://python.org/sf/1120452 opened by Viktor Ferenczi bug in unichr() documentation (2005-02-11) http://python.org/sf/1120777 opened by Marko Kreen Problem in join function definition (2005-02-11) CLOSED http://python.org/sf/1120862 opened by yseb file seek error (2005-02-11) CLOSED http://python.org/sf/1121152 opened by Richard Lawhorn Python24.dll crashes, EXAMPLE ATTACHED (2005-02-12) http://python.org/sf/1121201 opened by Viktor Ferenczi zip incorrectly and incompletely documented (2005-02-12) http://python.org/sf/1121416 opened by Alan Decorated functions are unpickleable (2005-02-12) CLOSED http://python.org/sf/1121475 opened by S Joshua Swamidass distutils.dir_utils not unicode compatible (2005-02-12) http://python.org/sf/1121494 opened by Morten Lied Johansen subprocess example missing "stdout=PIPE" (2005-02-12) http://python.org/sf/1121579 opened by Monte Davidoff SMTPHandler argument misdescribed (2005-02-13) http://python.org/sf/1121875 opened by Peter marshal may crash on truncated input (2005-02-14) http://python.org/sf/1122301 opened by Fredrik Lundh incorrect handle of declaration in markupbase (2005-02-14) http://python.org/sf/1122916 opened by Wai Yip Tung Typo in Curses-Function doc (2005-02-15) http://python.org/sf/1123268 opened by Aaron C. Spike test_peepholer failing on HEAD (2005-02-15) CLOSED http://python.org/sf/1123354 opened by Tim Peters add SHA256/384/512 to lib (2005-02-16) http://python.org/sf/1123660 opened by paul rubin Bugs Closed ___________ xmlrpclib: wrong decoding in '_stringify' (2005-02-04) http://python.org/sf/1115989 closed by fdrake Wrong match with regex, non-greedy problem (2005-02-05) http://python.org/sf/1116571 closed by effbot Typo in list.sort() documentation (2005-02-05) http://python.org/sf/1117063 closed by rhettinger sgmllib.SGMLParser (2005-02-06) http://python.org/sf/1117302 closed by effbot PyThreadState_SetAsyncExc segfault (2004-11-18) http://python.org/sf/1069160 closed by gvanrossum "in" operator bug ? 
(2005-02-07) http://python.org/sf/1117757 closed by tim_one lists coupled (2005-02-07) http://python.org/sf/1118101 closed by tim_one builtin file() vanishes (2005-02-09) http://python.org/sf/1118977 closed by loewis Docs for set() omit constructor (2005-02-09) http://python.org/sf/1119282 closed by rhettinger list extend() accepts args besides lists (2005-02-09) http://python.org/sf/1119700 closed by rhettinger Static library incompatible with nptl (2005-02-10) http://python.org/sf/1119866 closed by ekloef Problem in join function definition (2005-02-11) http://python.org/sf/1120862 closed by rhettinger file seek error (2005-02-11) http://python.org/sf/1121152 closed by tim_one Decorated functions are unpickleable (2005-02-12) http://python.org/sf/1121475 closed by bcannon "Macintosh" references in the docs need to be checked. (2005-01-04) http://python.org/sf/1095802 closed by bcannon RE '*.?' cores if len of found string exceeds 10000 (2004-10-26) http://python.org/sf/1054564 closed by effbot missing mappings in locale tables (2002-10-09) http://python.org/sf/620739 closed by effbot test_peepholer failing on HEAD (2005-02-15) http://python.org/sf/1123354 closed by tim_one New / Reopened RFE __________________ urllib.urlopen should put the http-error-code in .info() (2005-02-07) http://python.org/sf/1117751 opened by Robert Kiendl Option to force variables to be declared (2005-02-14) http://python.org/sf/1122279 opened by Zac Evans Line Numbers (2005-02-14) http://python.org/sf/1122532 opened by Egon Frerich RFE Closed __________ commands.mkarg function should be public (2001-12-04) http://python.org/sf/489106 closed by donut Missing socketpair() function. (2002-06-12) http://python.org/sf/567969 closed by grahamh From martin at v.loewis.de Wed Feb 16 08:50:51 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Feb 16 08:50:55 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? In-Reply-To: <1f7befae05021520263d77a2a3@mail.gmail.com> References: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> <1f7befae05021514524d0a35ec@mail.gmail.com> <4c0d14b0b08390d046e1220b6f360745@uwaterloo.ca> <1f7befae05021520263d77a2a3@mail.gmail.com> Message-ID: <4212FB5B.1030209@v.loewis.de> Tim Peters wrote: > I'm not certain it is acceptable to make this assumption. Why is it > not possible to use the same approach that was previously used (i.e. > leak the arenas array)? > > Do you have something else in mind? I'll talk with Martin about it if > he still wants to. Martin, this miserable code must die! That's fine with me. I meant what I said: "I'm not certain". The patch original claimed that it cannot possibly preserve this feature, and I felt that this claim was incorrect - indeed, Evan then understood the feature, and made it possible. I can personally accept breaking the code that still relies on the invalid APIs. The only problem is that it is really hard to determine whether some code *does* violate the API usage. Regards, Martin From konrad.hinsen at laposte.net Thu Feb 10 09:38:40 2005 From: konrad.hinsen at laposte.net (konrad.hinsen@laposte.net) Date: Wed Feb 16 14:20:17 2005 Subject: [Numpy-discussion] Re: [Python-Dev] Re: Numeric life as I see it In-Reply-To: References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> <420AB928.3090004@pfdubois.com> <420ADE90.9050304@ee.byu.edu> Message-ID: On 10.02.2005, at 05:36, Guido van Rossum wrote: > And why would a Matrix need to inherit from a C-array? 
Wouldn't it > make more sense from an OO POV for the Matrix to *have* a C-array > without *being* one? Definitely. Most array operations make no sense on matrices. And matrices are limited to two dimensions. Making Matrix a subclass of Array would be inheritance for implementation while removing 90% of the interface. On the other hand, a Matrix object is perfectly defined by its behaviour and independent of its implementation. One could perfectly well implement one using Python lists or dictionaries, even though that would be pointless from a performance point of view. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen@cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Thu Feb 10 09:45:28 2005 From: konrad.hinsen at laposte.net (konrad.hinsen@laposte.net) Date: Wed Feb 16 14:20:18 2005 Subject: [Python-Dev] Re: [Numpy-discussion] Re: Numeric life as I see it In-Reply-To: <420ADE90.9050304@ee.byu.edu> References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> <420AB928.3090004@pfdubois.com> <420ADE90.9050304@ee.byu.edu> Message-ID: <1c3044466186480f55ef45d2c977731b@laposte.net> On 10.02.2005, at 05:09, Travis Oliphant wrote: > I'm not sure I agree. The ufuncobject is the only place where this > concern existed (should we trip OverFlow, ZeroDivision, etc. errors > during array math). Numarray introduced and implemented the concept > of error modes that can be pushed and popped. I believe this is the > right solution for the ufuncobject. Indeed. Note also that the ufunc stuff is less critical to agree on than the array data structure. Anyone unhappy with ufuncs could write their own module and use it instead. It would be the data structure and its access rules that fix the structure of all the code that uses it, so that's what needs to be acceptable to everyone. > One question we are pursuing is could the arrayobject get into the > core without a particular ufunc object. Most see this as > sub-optimal, but maybe it is the only way. Since all the arithmetic operations are in ufunc that would be a suboptimal solution, but indeed still a workable one. > I appreciate some of what Paul is saying here, but I'm not fully > convinced that this is still true with Python 2.2 and up new-style > c-types. The concerns seem to be over the fact that you have to > re-implement everything in the sub-class because the base-class will > always return one of its objects instead of a sub-class object. I'd say that such discussions should be postponed until someone proposes a good use for subclassing arrays. Matrices are not one, in my opinion. Konrad.
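A rough sketch of the composition approach described above -- a Matrix that has an array-like storage object and exposes only matrix behaviour, instead of inheriting the whole array interface. The class and names are illustrative only, not Numeric or numarray API:

class Matrix:
    """A 2-D matrix that *has* a storage object instead of *being* one."""
    def __init__(self, rows):
        self._data = [list(row) for row in rows]   # could wrap a C array instead
        self.shape = (len(self._data), len(self._data[0]))

    def __getitem__(self, key):
        i, j = key
        return self._data[i][j]

    def __mul__(self, other):
        rows, inner = self.shape
        inner2, cols = other.shape
        if inner != inner2:
            raise ValueError("shape mismatch")
        return Matrix([[sum(self._data[i][k] * other._data[k][j]
                            for k in range(inner))
                        for j in range(cols)]
                       for i in range(rows)])

a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])
assert (a * b)[0, 0] == 19 and (a * b)[1, 1] == 50

Only the operations that make sense for matrices are exposed; the storage underneath could just as well be a C array.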
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen@cea.fr ------------------------------------------------------------------------ ------- From verveer at embl.de Thu Feb 10 10:53:10 2005 From: verveer at embl.de (Peter Verveer) Date: Wed Feb 16 14:20:20 2005 Subject: [Python-Dev] Re: [Numpy-discussion] Re: Numeric life as I see it In-Reply-To: <420B29AE.8030701@ee.byu.edu> References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> <420AB928.3090004@pfdubois.com> <420ADE90.9050304@ee.byu.edu> <1c3044466186480f55ef45d2c977731b@laposte.net> <420B29AE.8030701@ee.byu.edu> Message-ID: <50ac60a36c2add7d708dc02d8bf623a3@embl.de> On Feb 10, 2005, at 10:30 AM, Travis Oliphant wrote: > >>> One question we are pursuing is could the arrayobject get into the >>> core without a particular ufunc object. Most see this as >>> sub-optimal, but maybe it is the only way. >> >> >> Since all the artithmetic operations are in ufunc that would be >> suboptimal solution, but indeed still a workable one. > > > I think replacing basic number operations of the arrayobject should > simple, so perhaps a default ufunc object could be worked out for > inclusion. I agree, getting it in the core is among others, intended to give it broad access, not just to hard-core numeric people. For many uses (including many of my simpler scripts) you don't need the more exotic functionality of ufuncs. You could just do with implementing the standard math functions, possibly leaving out things like reduce. That would be very easy to implement. > >> >>> I appreciate some of what Paul is saying here, but I'm not fully >>> convinced that this is still true with Python 2.2 and up new-style >>> c-types. The concerns seem to be over the fact that you have to >>> re-implement everything in the sub-class because the base-class will >>> always return one of its objects instead of a sub-class object. >> >> >> I'd say that such discussions should be postponed until someone >> proposes a good use for subclassing arrays. Matrices are not one, in >> my opinion. >> > Agreed. It is is not critical to what I am doing, and I obviously > need more understanding before tackling such things. Numeric3 uses > the new c-type largely because of the nice getsets table which is > separate from the methods table. This replaces the rather ugly > C-functions getattr and setattr. I would agree that sub-classing arrays might not be worth the trouble. Peter From perry at stsci.edu Thu Feb 10 16:21:24 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Feb 16 14:20:21 2005 Subject: [Python-Dev] RE: [Numpy-discussion] Numeric life as I see it In-Reply-To: <420AB928.3090004@pfdubois.com> Message-ID: Paul Dubois wrote: > > Aside: While I am at it, let me reiterate what I have said to the other > developers privately: there is NO value to inheriting from the array > class. Don't try to achieve that capability if it costs anything, even > just effort, because it buys you nothing. Those of you who keep > remarking on this as if it would simply haven't thought it through IMHO. > It sounds so intellectually appealing that David Ascher and I had a > version of Numeric that almost did it before we realized our folly. 
> To be contrarian, we did find great benefit (at least initially) for inheritance for developing the record array and character array classes since they share so many structural operations (indexing, slicing, transposes, concatenation, etc.) with numeric arrays. It's possible that the approach that Travis is considering doesn't need to use inheritance to accomplish this (I don't know enough about the details yet), but it sure did save a lot of duplication of implementation. I do understand what you are getting at. Any numerical array inheritance generally forces one to reimplement all ufuncs and such, and that does make it less useful in that case (though I still wonder if it still isn't better than the alternatives). Perry Greenfield From nick at ilm.com Fri Feb 11 23:32:15 2005 From: nick at ilm.com (Nick Rasmussen) Date: Wed Feb 16 14:20:22 2005 Subject: [Python-Dev] subclassing PyCFunction_Type Message-ID: <20050211223215.GS14902@ewok.lucasdigital.com> tommy said that this would be the best place to ask this question.... I'm trying to get functions wrapped via boost to show up as builtin types so that pydoc includes them when documenting the module containing them. Right now boost python functions are created using a PyTypeObject such that when inspect.isbuiltin does: return isinstance(object, types.BuiltinFunctionType) isinstance returns 0. Initially I had just modified a local pydoc to document all functions with unknown source modules (since the module can't be deduced from non-python functions), but I figured that the right fix was to get boost::python functions to correctly show up as builtins, so I tried setting PyCFunction_Type as the boost function type object's tp_base, which worked fine for me using linux on amd64, but when my patch was tried out on other platforms, it ran into regression test failures: http://mail.python.org/pipermail/c++-sig/2005-February/008545.html So I have some questions: Should boost::python functions be modified in some way to show up as builtin function types or is the right fix really to patch pydoc? Is PyCFunction_Type intended to be subclassable? I noticed that it does not have Py_TPFLAGS_BASETYPE set in its tp_flags. Also, PyCFunction_Type has Py_TPFLAGS_HAVE_GC, and the assertion failures in the testsuite seemed to be centered around object allocation/garbage collection, so is there something related to subclassing a gc-aware class that needs to be happening (currently the boost type object doesn't support garbage collection)? If subclassing PyCFunction_Type isn't the right way to make these functions be considered as builtin functions, what is? -nick From apolinejuliet at yahoo.com Mon Feb 14 04:31:40 2005 From: apolinejuliet at yahoo.com (apoline juliet obina) Date: Wed Feb 16 14:20:24 2005 Subject: [Python-Dev] Py2.3.1 Message-ID: <20050214033140.60072.qmail@web30707.mail.mud.yahoo.com> iis it "pydos" ? your net add?/ --------------------------------- Yahoo! Messenger - Communicate instantly..."Ping" your friends today! Download Messenger Now -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-dev/attachments/20050214/ca6c95d6/attachment.htm From Martin.Gfeller at comit.ch Mon Feb 14 19:41:51 2005 From: Martin.Gfeller at comit.ch (Gfeller Martin) Date: Wed Feb 16 14:20:25 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% Message-ID: Dear all, I'm running a large Zope application on a 1x1GHz CPU 1GB mem Windows XP Prof machine using Zope 2.7.3 and Py 2.3.4. The application typically builds large lists by appending and extending them. We regularly observed that using a given functionality a second time using the same process was much slower (50%) than when it ran the first time after startup. This behavior greatly improved with Python 2.3 (thanks to the improved Python object allocator, I presume). Nevertheless, I tried to convert the heap used by Python to a Windows Low Fragmentation Heap (available on XP and 2003 Server). This improved the overall run time of a typical CPU-intensive report by about 15% (overall run time is in the 5 minutes range), with the same memory consumption. I consider 15% significant enough to let you know about it. For information about the Low Fragmentation Heap, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/memory/base/low_fragmentation_heap.asp Best regards, Martin PS: Since I don't speak C, I used ctypes to convert all heaps in the process to LFH (I don't know how to determine which one is the C heap). ________________________ COMIT AG Risk Management Systems Pflanzschulstrasse 7 CH-8004 Zürich Telefon +41 (44) 1 298 92 84 http://www.comit.ch http://www.quantax.com - Quantax Trading and Risk System From leogah at spamcop.net Mon Feb 14 23:35:31 2005 From: leogah at spamcop.net (Richard Brodie) Date: Wed Feb 16 14:20:26 2005 Subject: [Python-Dev] builtin_id() returns negative numbers Message-ID: <000701c512e5$7de81660$af0189c3@oemcomputer> > Maybe it's just a wart we have to live with now; OTOH, > the docs explicitly warn that id() may return a long, so any code > relying on "short int"-ness has always been relying on an > implementation quirk. Well, the docs say that %x does unsigned conversion, so they've been relying on an implementation quirk as well ;) Would it be practical to add new conversion syntax to string interpolation? Like, for example, %p as an unsigned hex number the same size as (void *). Otherwise, unless I misunderstand integer unification, one would just have to strike the distinction between, say, %d and %u. From mwh at python.net Wed Feb 16 14:33:28 2005 From: mwh at python.net (Michael Hudson) Date: Wed Feb 16 14:33:31 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <20050211223215.GS14902@ewok.lucasdigital.com> (Nick Rasmussen's message of "Fri, 11 Feb 2005 14:32:15 -0800") References: <20050211223215.GS14902@ewok.lucasdigital.com> Message-ID: <2m4qgc1vfb.fsf@starship.python.net> Nick Rasmussen writes: [five days ago] > Should boost::python functions be modified in some way to show > up as builtin function types or is the right fix really to patch > pydoc? My heart leans towards the latter. > Is PyCFunction_Type intended to be subclassable? Doesn't look like it, does it? :) More seriously, "no". Cheers, mwh -- ARTHUR: Don't ask me how it works or I'll start to whimper. -- The Hitch-Hikers Guide to the Galaxy, Episode 11 From pje at telecommunity.com Wed Feb 16 17:02:18 2005 From: pje at telecommunity.com (Phillip J.
Eby) Date: Wed Feb 16 17:00:32 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <20050211223215.GS14902@ewok.lucasdigital.com> Message-ID: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> At 02:32 PM 2/11/05 -0800, Nick Rasmussen wrote: >tommy said that this would be the best place to ask >this question.... > >I'm trying to get functions wrapped via boost to show >up as builtin types so that pydoc includes them when >documenting the module containing them. Right now >boost python functions are created using a PyTypeObject >such that when inspect.isbuiltin does: > > return isinstance(object, types.BuiltinFunctionType) FYI, this may not be the "right" way to do this, but since 2.3 'isinstance()' looks at an object's __class__ rather than its type(), so you could perhaps include a '__class__' descriptor in your method type that returns BuiltinFunctionType and see if that works. It's a kludge, but it might let your code work with existing versions of Python. From bob at redivi.com Wed Feb 16 17:26:34 2005 From: bob at redivi.com (Bob Ippolito) Date: Wed Feb 16 17:26:43 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> References: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> Message-ID: <5614e00fb134b968fa76a1896c456f4a@redivi.com> On Feb 16, 2005, at 11:02, Phillip J. Eby wrote: > At 02:32 PM 2/11/05 -0800, Nick Rasmussen wrote: >> tommy said that this would be the best place to ask >> this question.... >> >> I'm trying to get functions wrapped via boost to show >> up as builtin types so that pydoc includes them when >> documenting the module containing them. Right now >> boost python functions are created using a PyTypeObject >> such that when inspect.isbuiltin does: >> >> return isinstance(object, types.BuiltinFunctionType) > > FYI, this may not be the "right" way to do this, but since 2.3 > 'isinstance()' looks at an object's __class__ rather than its type(), > so you could perhaps include a '__class__' descriptor in your method > type that returns BuiltinFunctionType and see if that works. > > It's a kludge, but it might let your code work with existing versions > of Python. It works in Python 2.3.0: import types class FakeBuiltin(object): __doc__ = property(lambda self: self.doc) __name__ = property(lambda self: self.name) __self__ = property(lambda self: None) __class__ = property(lambda self: types.BuiltinFunctionType) def __init__(self, name, doc): self.name = name self.doc = doc >>> help(FakeBuiltin("name", "name(foo, bar, baz) -> rval")) Help on built-in function name: name(...) name(foo, bar, baz) -> rval -bob From pje at telecommunity.com Wed Feb 16 17:43:51 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Feb 16 17:42:04 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <5614e00fb134b968fa76a1896c456f4a@redivi.com> References: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050216114230.037364a0@mail.telecommunity.com> At 11:26 AM 2/16/05 -0500, Bob Ippolito wrote: > >>> help(FakeBuiltin("name", "name(foo, bar, baz) -> rval")) >Help on built-in function name: > >name(...) > name(foo, bar, baz) -> rval If you wanted to be even more ambitious, you could return FunctionType and have a fake func_code so pydoc will be able to see the argument signature directly. 
:) From bob at redivi.com Wed Feb 16 17:52:56 2005 From: bob at redivi.com (Bob Ippolito) Date: Wed Feb 16 17:53:11 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <5.1.1.6.0.20050216114230.037364a0@mail.telecommunity.com> References: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> <5.1.1.6.0.20050216114230.037364a0@mail.telecommunity.com> Message-ID: <640f0846671b73a92939648d278e4861@redivi.com> On Feb 16, 2005, at 11:43, Phillip J. Eby wrote: > At 11:26 AM 2/16/05 -0500, Bob Ippolito wrote: >> >>> help(FakeBuiltin("name", "name(foo, bar, baz) -> rval")) >> Help on built-in function name: >> >> name(...) >> name(foo, bar, baz) -> rval > > If you wanted to be even more ambitious, you could return FunctionType > and have a fake func_code so pydoc will be able to see the argument > signature directly. :) I was thinking that too, but I didn't have the energy to code it in an email :) -bob From fredrik at pythonware.com Wed Feb 16 21:08:14 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Feb 16 21:19:07 2005 Subject: [Python-Dev] string find(substring) vs. substring in string Message-ID: any special reason why "in" is faster if the substring is found, but a lot slower if it's not in there? timeit -s "s = 'not there'*100" "s.find('not there') != -1" 1000000 loops, best of 3: 0.749 usec per loop timeit -s "s = 'not there'*100" "'not there' in s" 10000000 loops, best of 3: 0.122 usec per loop timeit -s "s = 'not the xyz'*100" "s.find('not there') != -1" 100000 loops, best of 3: 7.03 usec per loop timeit -s "s = 'not the xyz'*100" "'not there' in s" 10000 loops, best of 3: 25.9 usec per loop ps. btw, it's about time we did something about this: timeit -s "s = 'not the xyz'*100" -s "import re; p = re.compile('not there')" "p.search(s)" 100000 loops, best of 3: 5.72 usec per loop From FBatista at uniFON.com.ar Wed Feb 16 21:23:59 2005 From: FBatista at uniFON.com.ar (Batista, Facundo) Date: Wed Feb 16 21:28:28 2005 Subject: [Python-Dev] string find(substring) vs. substring in string Message-ID: [Fredrik Lundh] #- any special reason why "in" is faster if the substring is found, but #- a lot slower if it's not in there? Maybe because it stops searching when it finds it? The time seems to be very dependent on the position of the first match: fbatista@pytonisa ~/ota> python /usr/local/lib/python2.3/timeit.py -s "s = 'not there'*100" "'not there' in s" 1000000 loops, best of 3: 0.222 usec per loop fbatista@pytonisa ~/ota> python /usr/local/lib/python2.3/timeit.py -s "s = 'blah blah'*20 + 'not there'*100" "'not there' in s" 100000 loops, best of 3: 5.54 usec per loop fbatista@pytonisa ~/ota> python /usr/local/lib/python2.3/timeit.py -s "s = 'blah blah'*40 + 'not there'*100" "'not there' in s" 100000 loops, best of 3: 10.8 usec per loop . Facundo Bitácora De Vuelo: http://www.taniquetil.com.ar/plog PyAr - Python Argentina: http://pyar.decode.com.ar/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20050216/e799aff5/attachment.html From mike at skew.org Wed Feb 16 21:34:16 2005 From: mike at skew.org (Mike Brown) Date: Wed Feb 16 21:34:18 2005 Subject: [Python-Dev] string find(substring) vs. substring in string In-Reply-To: Message-ID: <200502162034.j1GKYGBU067236@chilled.skew.org> Fredrik Lundh wrote: > any special reason why "in" is faster if the substring is found, but > a lot slower if it's not in there?
Just guessing here, but in general I would think that it would stop searching as soon as it found it, whereas until then, it keeps looking, which takes more time. But I would also hope that it would be smart enough to know that it doesn't need to look past the 2nd character in 'not the xyz' when it is searching for 'not there' (due to the lengths of the sequences). From amk at amk.ca Wed Feb 16 21:54:31 2005 From: amk at amk.ca (A.M. Kuchling) Date: Wed Feb 16 21:57:23 2005 Subject: [Python-Dev] string find(substring) vs. substring in string In-Reply-To: <200502162034.j1GKYGBU067236@chilled.skew.org> References: <200502162034.j1GKYGBU067236@chilled.skew.org> Message-ID: <20050216205431.GA8873@rogue.amk.ca> On Wed, Feb 16, 2005 at 01:34:16PM -0700, Mike Brown wrote: > time. But I would also hope that it would be smart enough to know that it > doesn't need to look past the 2nd character in 'not the xyz' when it is > searching for 'not there' (due to the lengths of the sequences). Assuming stringobject.c:string_contains is the right function, the code looks like this: size = PyString_GET_SIZE(el); rhs = PyString_AS_STRING(el); lhs = PyString_AS_STRING(a); /* optimize for a single character */ if (size == 1) return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL; end = lhs + (PyString_GET_SIZE(a) - size); while (lhs <= end) { if (memcmp(lhs++, rhs, size) == 0) return 1; } So it's doing a zillion memcmp()s. I don't think there's a more efficient way to do this with ANSI C; memmem() is a GNU extension that searches for blocks of memory. Perhaps saving some memcmps by writing if ((*lhs == *rhs) && memcmp(lhs++, rhs, size) == 0) would help. --amk From gvanrossum at gmail.com Wed Feb 16 22:03:10 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Feb 16 22:03:13 2005 Subject: [Python-Dev] string find(substring) vs. substring in string In-Reply-To: <20050216205431.GA8873@rogue.amk.ca> References: <200502162034.j1GKYGBU067236@chilled.skew.org> <20050216205431.GA8873@rogue.amk.ca> Message-ID: > Assuming stringobject.c:string_contains is the right function, the > code looks like this: > > size = PyString_GET_SIZE(el); > rhs = PyString_AS_STRING(el); > lhs = PyString_AS_STRING(a); > > /* optimize for a single character */ > if (size == 1) > return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL; > > end = lhs + (PyString_GET_SIZE(a) - size); > while (lhs <= end) { > if (memcmp(lhs++, rhs, size) == 0) > return 1; > } > > So it's doing a zillion memcmp()s. I don't think there's a more > efficient way to do this with ANSI C; memmem() is a GNU extension that > searches for blocks of memory. Perhaps saving some memcmps by writing > > if ((*lhs == *rhs) && memcmp(lhs++, rhs, size) == 0) > > would help. Which is exactly how s.find() wins this race. (I guess it loses when it's found by having to do the "find" lookup.) Maybe string_contains should just call string_find_internal()? And then there's the question of how the re module gets to be faster still; I suppose it doesn't bother with memcmp() at all. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From irmen at xs4all.nl Wed Feb 16 22:08:36 2005 From: irmen at xs4all.nl (Irmen de Jong) Date: Wed Feb 16 22:08:38 2005 Subject: [Python-Dev] string find(substring) vs. 
substring in string In-Reply-To: <200502162034.j1GKYGBU067236@chilled.skew.org> References: <200502162034.j1GKYGBU067236@chilled.skew.org> Message-ID: <4213B654.7070901@xs4all.nl> Mike Brown wrote: > Fredrik Lundh wrote: > >>any special reason why "in" is faster if the substring is found, but >>a lot slower if it's not in there? > > > Just guessing here, but in general I would think that it would stop searching > as soon as it found it, whereas until then, it keeps looking, which takes more > time. But I would also hope that it would be smart enough to know that it > doesn't need to look past the 2nd character in 'not the xyz' when it is > searching for 'not there' (due to the lengths of the sequences). There's the Boyer-Moore string search algorithm which is allegedly much faster than a simplistic scanning approach, and I also found this: http://portal.acm.org/citation.cfm?id=79184 So perhaps there's room for improvement :) --Irmen From fredrik at pythonware.com Wed Feb 16 22:19:20 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Feb 16 22:19:13 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string References: <200502162034.j1GKYGBU067236@chilled.skew.org> <20050216205431.GA8873@rogue.amk.ca> Message-ID: A.M. Kuchling wrote: >> time. But I would also hope that it would be smart enough to know that it >> doesn't need to look past the 2nd character in 'not the xyz' when it is >> searching for 'not there' (due to the lengths of the sequences). > > Assuming stringobject.c:string_contains is the right function, the > code looks like this: > > size = PyString_GET_SIZE(el); > rhs = PyString_AS_STRING(el); > lhs = PyString_AS_STRING(a); > > /* optimize for a single character */ > if (size == 1) > return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL; > > end = lhs + (PyString_GET_SIZE(a) - size); > while (lhs <= end) { > if (memcmp(lhs++, rhs, size) == 0) > return 1; > } > > So it's doing a zillion memcmp()s. I don't think there's a more > efficient way to do this with ANSI C; memmem() is a GNU extension that > searches for blocks of memory. oops. so whoever implemented contains didn't even bother to look at the find implementation... (which uses the same brute-force algorithm, but a better implementation...) > Perhaps saving some memcmps by writing > > if ((*lhs == *rhs) && memcmp(lhs++, rhs, size) == 0) > > would help. memcmp still compiles to REP CMPB on many x86 compilers, and the setup overhead for memcmp sucks on modern x86 hardware; it's usually better to write your own bytewise comparison... (and the fact that we're still using brute-force search algorithms in "find" is a bit embarrassing -- note that RE outperforms "in" by a factor of five.... guess it's time to finish the split/replace parts of stringlib and produce a patch... ;-) From fredrik at pythonware.com Wed Feb 16 22:23:03 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Feb 16 22:33:56 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string References: <200502162034.j1GKYGBU067236@chilled.skew.org> Message-ID: Mike Brown wrote: >> any special reason why "in" is faster if the substring is found, but >> a lot slower if it's not in there? > > Just guessing here, but in general I would think that it would stop searching > as soon as it found it, whereas until then, it keeps looking, which takes more > time. the point was that string.find does the same thing, but is much faster in the "no match" case.
> But I would also hope that it would be smart enough to know that it > doesn't need to look past the 2nd character in 'not the xyz' when it is > searching for 'not there' (due to the lengths of the sequences). note that the target string was "not the xyz"*100, so the search algorithm surely has to look past the second character ;-) (btw, the benchmark was taken from jim hugunin's ironpython talk, and seems to be carefully designed to kill performance also for more advanced algorithms -- including boyer-moore) From fredrik at pythonware.com Wed Feb 16 22:50:55 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Feb 16 22:50:53 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string References: <200502162034.j1GKYGBU067236@chilled.skew.org><20050216205431.GA8873@rogue.amk.ca> Message-ID: Guido van Rossum wrote: > Which is exactly how s.find() wins this race. (I guess it loses when > it's found by having to do the "find" lookup.) Maybe string_contains > should just call string_find_internal()? I somehow suspected that "in" did some extra work in case the "find" failed; guess I should have looked at the code instead... I didn't really expect anyone to use a bad implementation of a brute-force algorithm (O(nm)) when the library already contained a reasonably good version of the same algorithm. > And then there's the question of how the re module gets to be faster > still; I suppose it doesn't bother with memcmp() at all. the benchmark cheats (a bit) -- it builds a state machine (KMP-style) in "compile", and uses that to search in O(n) time. that approach won't fly for "in" and find, of course, but it's definitely possible to make them run a lot faster than RE (i.e. O(n/m) for most cases)... but refactoring the contains code to use find_internal sounds like a good first step. any takers? From tim.peters at gmail.com Wed Feb 16 22:55:27 2005 From: tim.peters at gmail.com (Tim Peters) Date: Wed Feb 16 22:55:49 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage Message-ID: <1f7befae05021613553afaaa2f@mail.gmail.com> Rev 2.66 of funcobject.c made func.__name__ writable for the first time. That's great, but the patch also introduced what I'm pretty sure was an unintended incompatibility: after 2.66, func.__name__ was no longer *readable* in restricted execution mode. I can't think of a good reason to restrict reading func.__name__, and it looks like this part of the change was an accident. So, unless someone objects soon, I intend to restore that func.__name__ is readable regardless of execution mode (but will continue to be unwritable in restricted execution mode). Objections? Tres Seaver filed a bug report (some Zope tests fail under 2.4 because of this): http://www.python.org/sf/1124295 From raymond.hettinger at verizon.net Wed Feb 16 23:06:54 2005 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Wed Feb 16 23:11:46 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string Message-ID: <000001c51473$df4717a0$8d2acb97@oemcomputer> > but refactoring the contains code to use find_internal sounds like a good > first step.? any takers? > > ? I'm up for it. ? Raymond Hettinger From fredrik at pythonware.com Wed Feb 16 23:10:40 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Feb 16 23:11:52 2005 Subject: [Python-Dev] Re: string find(substring) vs. 
substring in string References: <200502162034.j1GKYGBU067236@chilled.skew.org><20050216205431.GA8873@rogue.amk.ca> Message-ID: > memcmp still compiles to REP CMPB on many x86 compilers, and the setup > overhead for memcmp sucks on modern x86 hardware make that "compiles to REPE CMPSB" and "the setup overhead for REPE CMPSB" From Scott.Daniels at Acm.Org Wed Feb 16 23:00:54 2005 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Wed Feb 16 23:12:18 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string In-Reply-To: <4213B654.7070901@xs4all.nl> References: <200502162034.j1GKYGBU067236@chilled.skew.org> <4213B654.7070901@xs4all.nl> Message-ID: Irmen de Jong wrote: > There's the Boyer-Moore string search algorithm which is > allegedly much faster than a simplistic scanning approach, > and I also found this: http://portal.acm.org/citation.cfm?id=79184 > So perhaps there's room for improvement :) The problem is setup vs. run. If the question is 'ab in 'rabcd', Boyer-Moore and other fancy searches will be swamped with prep time. In Fred's comparison with re, he does the re.compile(...) outside of the timing loop. You need to decide what the common case is. The longer the thing you are searching in, the more one-time-only overhead you can afford to reduce the per-search-character cost. --Scott David Daniels Scott.Daniels@Acm.Org From gvanrossum at gmail.com Wed Feb 16 23:16:08 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Feb 16 23:16:11 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string In-Reply-To: References: <200502162034.j1GKYGBU067236@chilled.skew.org> <4213B654.7070901@xs4all.nl> Message-ID: > The longer the thing you are searching in, the more one-time-only > overhead you can afford to reduce the per-search-character cost. Only if you don't find it close to the start. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From Scott.Daniels at Acm.Org Wed Feb 16 23:19:20 2005 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Wed Feb 16 23:33:23 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string In-Reply-To: References: <200502162034.j1GKYGBU067236@chilled.skew.org> Message-ID: Fredrik Lundh wrote: > (btw, the benchmark was taken from jim hugunin's ironpython talk, and > seems to be carefully designed to kill performance also for more advanced > algorithms -- including boyer-moore) Looking for "not there" in "not the xyz"*100 using Boyer-Moore should do about 300 probes once the table is set (the underscores below): not the xyznot the xyznot the xyz... not ther_ not the__ not ther_ not the__ not ther_ ... -- Scott David Daniels Scott.Daniels@Acm.Org From fredrik at pythonware.com Thu Feb 17 00:10:29 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu Feb 17 00:16:13 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string References: <200502162034.j1GKYGBU067236@chilled.skew.org> Message-ID: Scott David Daniels wrote: > Looking for "not there" in "not the xyz"*100 using Boyer-Moore should do > about 300 probes once the table is set (the underscores below): > > not the xyznot the xyznot the xyz... > not ther_ > not the__ > not ther_ > not the__ > not ther_ > ... yup; it gets into a 9/2/9/2 rut. tweak the pattern a little, and you get better results for BM. ("kill" is of course an understatement, but BM usually works better. but it still needs a sizeof(alphabet) table, so you can pretty much forget about it if you want to support unicode...) 
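For reference, a rough Boyer-Moore-Horspool sketch in pure Python (illustrative only -- this is not the code under discussion for stringobject.c). The skip table below is the per-symbol table mentioned above: one entry per possible character is cheap for 8-bit strings, but not for a full Unicode alphabet:

def horspool_find(text, pattern):
    # return the index of the first occurrence of pattern in text, or -1
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # one shift value per possible character; fine for 256 byte values,
    # prohibitive for a large Unicode alphabet
    skip = dict((chr(c), m) for c in range(256))
    for i in range(m - 1):
        skip[pattern[i]] = m - 1 - i
    i = m - 1
    while i < n:
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        i += skip.get(text[i], m)   # jump by the shift of the window's last char
    return -1

assert horspool_find("not the xyz" * 100, "not there") == -1
assert horspool_find("xyz not there xyz", "not there") == 4

On the Hugunin benchmark string the shifts fall into roughly the 9/2 rut described above, which is why the pattern counts as carefully designed.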
From martin at v.loewis.de Thu Feb 17 00:42:05 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Feb 17 00:42:09 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: References: Message-ID: <4213DA4D.8090502@v.loewis.de> Gfeller Martin wrote: > Nevertheless, I tried to convert the heap used by Python > to a Windows Low Fragmentation Heap (available on XP > and 2003 Server). This improved the overall run time > of a typical CPU-intensive report by about 15% > (overall run time is in the 5 minutes range), with the > same memory consumption. I must admit that I'm surprised. I would have expected that most allocations in Python go through obmalloc, so the heap would only see "large" allocations. It would be interesting to find out, in your application, why it is still an improvement to use the low-fragmentation heaps. Regards, Martin From allison at sumeru.stanford.EDU Thu Feb 17 01:06:24 2005 From: allison at sumeru.stanford.EDU (Dennis Allison) Date: Thu Feb 17 01:06:31 2005 Subject: [Python-Dev] string find(substring) vs. substring in string In-Reply-To: <4213B654.7070901@xs4all.nl> Message-ID: Boyer-Moore and variants need a bit of preprocessing on the pattern which makes them great for long patterns but more costly for short ones. On Wed, 16 Feb 2005, Irmen de Jong wrote: > Mike Brown wrote: > > Fredrik Lundh wrote: > > > >>any special reason why "in" is faster if the substring is found, but > >>a lot slower if it's not in there? > > > > > > Just guessing here, but in general I would think that it would stop searching > > as soon as it found it, whereas until then, it keeps looking, which takes more > > time. But I would also hope that it would be smart enough to know that it > > doesn't need to look past the 2nd character in 'not the xyz' when it is > > searching for 'not there' (due to the lengths of the sequences). > > There's the Boyer-Moore string search algorithm which is > allegedly much faster than a simplistic scanning approach, > and I also found this: http://portal.acm.org/citation.cfm?id=79184 > So perhaps there's room for improvement :) > > --Irmen > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/allison%40sumeru.stanford.edu > From ejones at uwaterloo.ca Thu Feb 17 02:26:16 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Thu Feb 17 02:26:22 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: <4213DA4D.8090502@v.loewis.de> References: <4213DA4D.8090502@v.loewis.de> Message-ID: On Feb 16, 2005, at 18:42, Martin v. Löwis wrote: > I must admit that I'm surprised. I would have expected > that most allocations in Python go through obmalloc, so > the heap would only see "large" allocations. > > It would be interesting to find out, in your application, > why it is still an improvement to use the low-fragmentation > heaps. Hmm... This is an excellent point. A grep through the Python source code shows that the following files call the native system malloc (I've excluded a few obviously platform specific files). A quick visual inspection shows that most of these are using it to allocate some sort of array or string, so it likely *should* go through the system malloc. Gfeller, any idea if you are using any of the modules on this list?
If so, it would be pretty easy to try converting them to call the obmalloc functions instead, and see how that affects the performance. Evan Jones Demo/pysvr/pysvr.c Modules/_bsddb.c Modules/_curses_panel.c Modules/_cursesmodule.c Modules/_hotshot.c Modules/_sre.c Modules/audioop.c Modules/bsddbmodule.c Modules/cPickle.c Modules/cStringIO.c Modules/getaddrinfo.c Modules/main.c Modules/pyexpat.c Modules/readline.c Modules/regexpr.c Modules/rgbimgmodule.c Modules/svmodule.c Modules/timemodule.c Modules/zlibmodule.c PC/getpathp.c Python/strdup.c Python/thread.c From greg.ewing at canterbury.ac.nz Thu Feb 17 03:27:09 2005 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu Feb 17 03:27:24 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <000701c512e5$7de81660$af0189c3@oemcomputer> References: <000701c512e5$7de81660$af0189c3@oemcomputer> Message-ID: <421400FD.8090303@canterbury.ac.nz> Richard Brodie wrote: > > Otherwise, unless I misunderstand integer unification, one would > just have to strike the distinction between, say, %d and %u. Couldn't that be done anyway? The distinction really only makes sense in C, where there's no way of knowing whether the value is signed or unsigned otherwise. In Python the value itself knows whether it's signed or not. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+ From gvanrossum at gmail.com Thu Feb 17 07:22:40 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Feb 17 07:22:43 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <421400FD.8090303@canterbury.ac.nz> References: <000701c512e5$7de81660$af0189c3@oemcomputer> <421400FD.8090303@canterbury.ac.nz> Message-ID: > > Otherwise, unless I misunderstand integer unification, one would > > just have to strike the distinction between, say, %d and %u. > > Couldn't that be done anyway? The distinction really only > makes sense in C, where there's no way of knowing whether > the value is signed or unsigned otherwise. In Python the > value itself knows whether it's signed or not. The time machine is at your service: in Python 2.4 there's no difference. That's integer unification for you! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at electricrain.com Thu Feb 17 07:53:30 2005 From: greg at electricrain.com (Gregory P. Smith) Date: Thu Feb 17 07:53:51 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108340374.3768.33.camel@schizo> References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> Message-ID: <20050217065330.GP25441@zot.electricrain.com> fyi - i've updated the python sha1/md5 openssl patch. it now replaces the entire sha and md5 modules with a generic hashes module that gives access to all of the hash algorithms supported by OpenSSL (including appropriate legacy interface wrappers and falling back to the old code when compiled without openssl). 
https://sourceforge.net/tracker/index.php?func=detail&aid=1121611&group_id=5470&atid=305470 I don't quite like the module name 'hashes' that i chose for the generic interface (too close to the builtin hash() function). Other suggestions on a module name? 'digest' comes to mind. -greg From fredrik at pythonware.com Thu Feb 17 10:12:19 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu Feb 17 10:12:16 2005 Subject: [Python-Dev] Re: license issues with profiler.py and md5.h/md5c.c References: <1108090248.3753.53.camel@schizo><226e9c65e562f9b0439333053036fef3@redivi.com><1108102539.3753.87.camel@schizo><20050211175118.GC25441@zot.electricrain.com><00c701c5108e$f3d0b930$24ed0ccb@apana.org.au><5d300838ef9716aeaae53579ab1f7733@redivi.com><013501c510ae$2abd7360$24ed0ccb@apana.org.au><20050212133721.GA13429@rogue.amk.ca><20050212210402.GE25441@zot.electricrain.com><1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> Message-ID: "Gregory P. Smith" wrote: > I don't quite like the module name 'hashes' that i chose for the > generic interface (too close to the builtin hash() function). Other > suggestions on a module name? 'digest' comes to mind. hashtools, hashlib, and _hash are common names for helper modules like this. (you still provide md5 and sha wrappers, I hope) From mwh at python.net Thu Feb 17 11:51:35 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 17 11:51:37 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <1f7befae05021613553afaaa2f@mail.gmail.com> (Tim Peters's message of "Wed, 16 Feb 2005 16:55:27 -0500") References: <1f7befae05021613553afaaa2f@mail.gmail.com> Message-ID: <2mzmy3zcg8.fsf@starship.python.net> Tim Peters writes: > Rev 2.66 of funcobject.c made func.__name__ writable for the first > time. That's great, but the patch also introduced what I'm pretty > sure was an unintended incompatibility: after 2.66, func.__name__ was > no longer *readable* in restricted execution mode. Yeah, my bad. > I can't think of a good reason to restrict reading func.__name__, > and it looks like this part of the change was an accident. So, > unless someone objects soon, I intend to restore that func.__name__ > is readable regardless of execution mode (but will continue to be > unwritable in restricted execution mode). > > Objections? Well, I fixed it on reading the bug report and before getting to python-dev mail :) Sorry if this duplicated your work, but hey, it was only a two line change... Cheers, mwh -- The only problem with Microsoft is they just have no taste. -- Steve Jobs, (From _Triumph of the Nerds_ PBS special) and quoted by Aahz on comp.lang.python From astrand at lysator.liu.se Thu Feb 17 13:22:03 2005 From: astrand at lysator.liu.se (Peter Astrand) Date: Thu Feb 17 13:22:14 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) Message-ID: I'd like to have your opinion on this bug. Personally, I'd prefer to keep test_no_leaking as it is, but if you think otherwise... One thing that actually can motivate that test_subprocess takes 20% of the overall time is that this test is a good generic Python stress test - this test might catch some other startup race condition, for example. 
Regards, ?strand ---------- Forwarded message ---------- Date: Thu, 17 Feb 2005 04:09:33 -0800 From: SourceForge.net To: noreply@sourceforge.net Subject: [ python-Bugs-1124637 ] test_subprocess is far too slow Bugs item #1124637, was opened at 2005-02-17 11:10 Message generated for change (Comment added) made by mwh You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1124637&group_id=5470 Category: Python Library Group: Python 2.4 Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Peter ?strand (astrand) Summary: test_subprocess is far too slow Initial Comment: test_subprocess takes multiple minutes. I'm pretty sure it's "test_no_leaking". It should either be sped up or only tested when some -u argument is passed to regrtest. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2005-02-17 12:09 Message: Logged In: YES user_id=6656 Bog standard linux pc -- p3 933, 384 megs of ram. "$ time ./python ../Lib/test/regrtest.py test_subprocess" reports 2 minutes 7. This is a debug build, a release build might be quicker. A run of the entire test suite takes a hair over nine minutes, so 20-odd % of the time seems to be test_subprocess. It also takes ages on my old-ish ibook (600 Mhz G3, also 384 megs of ram), but that's at home and I can't time it. ---------------------------------------------------------------------- Comment By: Peter ?strand (astrand) Date: 2005-02-17 11:50 Message: Logged In: YES user_id=344921 Tell me a bit about your type of OS and hardware. On my machine (P4 2.66 GHz with Linux), the test takes 28 seconds. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1124637&group_id=5470 From ncoghlan at iinet.net.au Thu Feb 17 15:15:46 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Thu Feb 17 15:15:50 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) In-Reply-To: References: Message-ID: <4214A712.6090107@iinet.net.au> Peter Astrand wrote: > I'd like to have your opinion on this bug. Personally, I'd prefer to keep > test_no_leaking as it is, but if you think otherwise... > > One thing that actually can motivate that test_subprocess takes 20% of the > overall time is that this test is a good generic Python stress test - this > test might catch some other startup race condition, for example. test_decimal has a short version which tests basic functionality and always runs, but enabling -udecimal also runs the specification tests (which take a fair bit longer). So keeping the basic subprocess tests unconditional, and running the long ones only if -uall or -usubprocess are given would seem reasonable. Cheers, Nick. -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From fredrik at pythonware.com Thu Feb 17 15:19:24 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu Feb 17 15:19:58 2005 Subject: [Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) References: <4214A712.6090107@iinet.net.au> Message-ID: Nick Coghlan wrote: >> One thing that actually can motivate that test_subprocess takes 20% of the >> overall time is that this test is a good generic Python stress test - this >> test might catch some other startup race condition, for example. 
> > test_decimal has a short version which tests basic functionality and always runs, but > enabling -udecimal also runs the specification tests (which take a fair bit longer). > > So keeping the basic subprocess tests unconditional, and running the long ones only if -uall > or -usubprocess are given would seem reasonable. does anyone ever use the -u options when running tests? From mwh at python.net Thu Feb 17 15:30:06 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 17 15:30:41 2005 Subject: [Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: (Fredrik Lundh's message of "Thu, 17 Feb 2005 15:19:24 +0100") References: <4214A712.6090107@iinet.net.au> Message-ID: <2mll9nz2c1.fsf@starship.python.net> "Fredrik Lundh" writes: > Nick Coghlan wrote: > >>> One thing that actually can motivate that test_subprocess takes 20% of the >>> overall time is that this test is a good generic Python stress test - this >>> test might catch some other startup race condition, for example. >> >> test_decimal has a short version which tests basic functionality and always runs, but >> enabling -udecimal also runs the specification tests (which take a fair bit longer). >> >> So keeping the basic subprocess tests unconditional, and running the long ones only if -uall >> or -usubprocess are given would seem reasonable. > > does anyone ever use the -u options when running tests? Yes, occasionally. Esp. with test_compiler a testall run is an overnight job but I try to do it every now and again. Cheers, mwh -- If design space weren't so vast, and the good solutions so small a portion of it, programming would be a lot easier. -- maney, comp.lang.python From tim.peters at gmail.com Thu Feb 17 15:43:20 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 15:43:55 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <2mzmy3zcg8.fsf@starship.python.net> References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> Message-ID: <1f7befae050217064337532915@mail.gmail.com> [Michael Hudson] > ... > Well, I fixed it on reading the bug report and before getting to > python-dev mail :) Sorry if this duplicated your work, but hey, it was > only a two line change... Na, the real work was tracking it down in the bowels of Zope's C-coded security machinery -- we'll let you do that part next time . Did you add a test to ensure this remains fixed? A NEWS blurb (at least for 2.4.1 -- the test failures under 2.4 are very visible in the Zope world, due to auto-generated test runner failure reports)? From tim.peters at gmail.com Thu Feb 17 15:43:20 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 15:45:14 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <2mzmy3zcg8.fsf@starship.python.net> References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> Message-ID: <1f7befae050217064337532915@mail.gmail.com> [Michael Hudson] > ... > Well, I fixed it on reading the bug report and before getting to > python-dev mail :) Sorry if this duplicated your work, but hey, it was > only a two line change... Na, the real work was tracking it down in the bowels of Zope's C-coded security machinery -- we'll let you do that part next time . Did you add a test to ensure this remains fixed? A NEWS blurb (at least for 2.4.1 -- the test failures under 2.4 are very visible in the Zope world, due to auto-generated test runner failure reports)? 
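The core of such a regression test is small; as a sketch only (this is not the actual Lib/test code, and the restricted-execution half of the check is not reproduced here):

    def f():
        pass

    # reading the name must keep working everywhere; writing it is new in 2.4
    assert f.__name__ == 'f'
    f.__name__ = 'g'
    assert f.__name__ == 'g'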
From tim.peters at gmail.com Thu Feb 17 15:56:14 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 15:56:16 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <2mzmy3zcg8.fsf@starship.python.net> References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> Message-ID: <1f7befae05021706564914b901@mail.gmail.com> [Michael Hudson] > ... > Well, I fixed it on reading the bug report and before getting to > python-dev mail :) Sorry if this duplicated your work, but hey, it was > only a two line change... Na, the real work was tracking it down in the bowels of Zope's C-coded security machinery -- we'll let you do that part next time . Did you add a test to ensure this remains fixed? A NEWS blurb (at least for 2.4.1 -- the test failures under 2.4 are visible in the Zope world, due to auto-generated test runner failure reports; alas, this is in a new test, and 2.4 worked fine with the Zope tests as they were when 2.4 was released)? From mwh at python.net Thu Feb 17 15:55:23 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 17 16:15:42 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <1f7befae050217064337532915@mail.gmail.com> (Tim Peters's message of "Thu, 17 Feb 2005 09:43:20 -0500") References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> <1f7befae050217064337532915@mail.gmail.com> Message-ID: <2mfyzvz15w.fsf@starship.python.net> Tim Peters writes: > [Michael Hudson] >> ... >> Well, I fixed it on reading the bug report and before getting to >> python-dev mail :) Sorry if this duplicated your work, but hey, it was >> only a two line change... > > Na, the real work was tracking it down in the bowels of Zope's C-coded > security machinery -- we'll let you do that part next time . > > Did you add a test to ensure this remains fixed? Yup. > A NEWS blurb (at least for 2.4.1 -- the test failures under 2.4 are > very visible in the Zope world, due to auto-generated test runner > failure reports)? No, I'll do that now. I'm not very good at remembering NEWS blurbs... Cheers, mwh -- 6. The code definitely is not portable - it will produce incorrect results if run from the surface of Mars. -- James Bonfield, http://www.ioccc.org/2000/rince.hint From tim.peters at gmail.com Thu Feb 17 16:17:22 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 16:17:27 2005 Subject: [Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: References: <4214A712.6090107@iinet.net.au> Message-ID: <1f7befae05021707171476f540@mail.gmail.com> [Fredrik Lundh] > does anyone ever use the -u options when running tests? Yes -- I routinely do -uall, under both release and debug builds, but only on Windows. WinXP in particular seems to do a good job when hyper-threading is available -- running the tests doesn't slow down anything else I'm doing, except during the disk-intensive tests (test_largefile is a major pig on Windows). From anthony at interlink.com.au Thu Feb 17 16:24:35 2005 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Feb 17 16:25:11 2005 Subject: [Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: References: <4214A712.6090107@iinet.net.au> Message-ID: <200502180224.36851.anthony@interlink.com.au> On Friday 18 February 2005 01:19, Fredrik Lundh wrote: > > does anyone ever use the -u options when running tests? 
I use "make testall" (which invokes with -uall) regularly, and turn on specific options when they're testing something I'm working with. -- Anthony Baxter It's never too late to have a happy childhood. From tim.peters at gmail.com Thu Feb 17 16:25:50 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 16:25:53 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <2mfyzvz15w.fsf@starship.python.net> References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> <1f7befae050217064337532915@mail.gmail.com> <2mfyzvz15w.fsf@starship.python.net> Message-ID: <1f7befae05021707252136573e@mail.gmail.com> [sorry for the near-duplicate msgs -- looks like gmail lied when it claimed the first msg was still in "draft" status] >> Did you add a test to ensure this remains fixed? [mwh] > Yup. Bless you. Did you attach a contributor agreement and mark the test as being contributed under said contributor agreement, adjacent to your valid copyright notice ? >> A NEWS blurb ...? > No, I'll do that now. I'm not very good at remembering NEWS blurbs... LOL -- sorry, I'm just imagining what NEWS would look like if we required a contributor-agreement notification on each blurb. I appreciate your work here, and will try to find a drug to counteract the ones I appear to have overdosed on this morning ... From mwh at python.net Thu Feb 17 16:29:12 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 17 16:29:14 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <1f7befae05021707252136573e@mail.gmail.com> (Tim Peters's message of "Thu, 17 Feb 2005 10:25:50 -0500") References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> <1f7befae050217064337532915@mail.gmail.com> <2mfyzvz15w.fsf@starship.python.net> <1f7befae05021707252136573e@mail.gmail.com> Message-ID: <2m8y5nyzlj.fsf@starship.python.net> Tim Peters writes: > [sorry for the near-duplicate msgs -- looks like gmail lied when it claimed the > first msg was still in "draft" status] > >>> Did you add a test to ensure this remains fixed? > > [mwh] >> Yup. > > Bless you. Did you attach a contributor agreement and mark the test > as being contributed under said contributor agreement, adjacent to > your valid copyright notice ? Fortunately 2 lines < 25 lines, so I think I'm safe on this one :) Cheers, mwh -- glyph: I don't know anything about reality. -- from Twisted.Quotes From gvanrossum at gmail.com Thu Feb 17 16:30:58 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Feb 17 16:31:00 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) In-Reply-To: References: Message-ID: > I'd like to have your opinion on this bug. Personally, I'd prefer to keep > test_no_leaking as it is, but if you think otherwise... > > One thing that actually can motivate that test_subprocess takes 20% of the > overall time is that this test is a good generic Python stress test - this > test might catch some other startup race condition, for example. A suite of unit tests is a precious thing. We want to test as much as we can, and as thoroughly as possible; but at the same time we want the test to run reasonably fast. If the test takes too long, human nature being what it is, this will actually cause less thorough testing because developers don't feel like running the test suite after each small change, and then we get frequent problems where someone breaks the build because they couldn't wait to run the unit test. 
(For example, where I work we have a Java test suite that takes 25 minutes to run. The build is broken on a daily basis by developers (including me) who make a small change and check it in believing it won't break anything.) The Python test suite already has a way (the -u flag) to distinguish between "regular" broad-coverage testing and deep coverage for specific (or all) areas. Let's keep the really long-running tests out of the regular test suite. There used to be a farm of machines that did nothing but run the test suite ("snake-farm"). This seems to have stopped (it was run by volunteers at a Swedish university). Maybe we should revive such an effort, and make sure it runs with -u all. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From astrand at lysator.liu.se Thu Feb 17 16:52:12 2005 From: astrand at lysator.liu.se (Peter Astrand) Date: Thu Feb 17 16:52:24 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) In-Reply-To: References: Message-ID: On Thu, 17 Feb 2005, Guido van Rossum wrote: > > I'd like to have your opinion on this bug. Personally, I'd prefer to keep > > test_no_leaking as it is, but if you think otherwise... > A suite of unit tests is a precious thing. We want to test as much as > we can, and as thoroughly as possible; but at the same time we want > the test to run reasonably fast. If the test takes too long, human > nature being what it is, this will actually cause less thorough > testing because developers don't feel like running the test suite > after each small change, and then we get frequent problems where Good point. > The Python test suite already has a way (the -u flag) to distinguish > between "regular" broad-coverage testing and deep coverage for > specific (or all) areas. Let's keep the really long-running tests out > of the regular test suite. I'm convinced. Is this easy to implement? Anyone interested in doing this? > There used to be a farm of machines that did nothing but run the test > suite ("snake-farm"). This seems to have stopped (it was run by > volunteers at a Swedish university). Maybe we should revive such an > effort, and make sure it runs with -u all. Yes, Snake Farm is/was a project at "Lysator", an academic computer society located at Linkoping University. As you can tell from my mail address, I'm a member as well. I haven't been involved in the Snake Farm project, though. /Peter ?strand From python at rcn.com Thu Feb 17 17:02:54 2005 From: python at rcn.com (Raymond Hettinger) Date: Thu Feb 17 17:06:54 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: Message-ID: <002301c5150a$24760de0$3bbd2c81@oemcomputer> > Let's keep the really long-running tests out > of the regular test suite. For test_subprocess, consider adopting the technique used by test_decimal. When -u decimal is not specified, a small random selection of the resource intensive tests are run. That way, all of the tests eventually get run even if no one is routinely using -u all. Raymond From skip at pobox.com Thu Feb 17 17:19:35 2005 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 17 17:17:40 2005 Subject: [Python-Dev] Five review rule on the /dev/ page? Message-ID: <16916.50199.723442.36695@montanaro.dyndns.org> I am frantically trying to get ready to be out of town for a week of vacation. Someone sent me some patches for datetime and asked me to look at them. I begged off but referred him to http://www.python.org/dev/ and made mention of the five patch review idea. 
Can someone make sure that's explained on the /dev/ site? Thx, Skip From walter at livinglogic.de Thu Feb 17 17:22:25 2005 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Thu Feb 17 17:22:28 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) In-Reply-To: References: Message-ID: <4214C4C1.5070309@livinglogic.de> Guido van Rossum wrote: > [...] > There used to be a farm of machines that did nothing but run the test > suite ("snake-farm"). This seems to have stopped (it was run by > volunteers at a Swedish university). Maybe we should revive such an > effort, and make sure it runs with -u all. I've changed the job that produces the data for http://coverage.livinglogic.de/ to run python Lib/test/regrtest.py -uall -T -N Unfortunately this job currently produces only coverage info, the output of the test suite is thrown away. It should be easy to fix this, so that the output gets put into the database. Bye, Walter D?rwald From mwh at python.net Thu Feb 17 18:11:19 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 17 18:11:22 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: <002301c5150a$24760de0$3bbd2c81@oemcomputer> (Raymond Hettinger's message of "Thu, 17 Feb 2005 11:02:54 -0500") References: <002301c5150a$24760de0$3bbd2c81@oemcomputer> Message-ID: <2m3bvvyuvc.fsf@starship.python.net> "Raymond Hettinger" writes: >> Let's keep the really long-running tests out >> of the regular test suite. > > For test_subprocess, consider adopting the technique used by > test_decimal. When -u decimal is not specified, a small random > selection of the resource intensive tests are run. That way, all of the > tests eventually get run even if no one is routinely using -u all. I do like this strategy but I don't think it applies to this test -- it has to try to create more than 'ulimit -n' processes, if I understand it correctly. Which makes me think there might be other ways to write the test if the resource module is available... Cheers, mwh -- 34. The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From tim.peters at gmail.com Thu Feb 17 18:26:36 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 18:26:40 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: <2m3bvvyuvc.fsf@starship.python.net> References: <002301c5150a$24760de0$3bbd2c81@oemcomputer> <2m3bvvyuvc.fsf@starship.python.net> Message-ID: <1f7befae05021709266fbc542d@mail.gmail.com> [Raymond Hettinger] >> For test_subprocess, consider adopting the technique used by >> test_decimal. When -u decimal is not specified, a small random >> selection of the resource intensive tests are run. That way, all of the >> tests eventually get run even if no one is routinely using -u all. [Michael Hudson] > I do like this strategy but I don't think it applies to this test -- > it has to try to create more than 'ulimit -n' processes, if I > understand it correctly. Which makes me think there might be other > ways to write the test if the resource module is available... Aha! That explains why test_subprocess runs so much faster on Windows despite that Windows process-creation time is measured in geological eras: test_no_leaking special-cases Windows to do only 65 iterations instead of 1026. 
It's easy to put that under control of a -u option instead; e.g., instead of max_handles = 1026 if mswindows: max_handles = 65 just use 1026 all the time, and stuff, e.g., if not test_support.is_resource_enabled("subprocess"): return at the start of test_no_leaking(). From aahz at pythoncraft.com Thu Feb 17 18:33:46 2005 From: aahz at pythoncraft.com (Aahz) Date: Thu Feb 17 18:33:50 2005 Subject: [Python-Dev] Five review rule on the /dev/ page? In-Reply-To: <16916.50199.723442.36695@montanaro.dyndns.org> References: <16916.50199.723442.36695@montanaro.dyndns.org> Message-ID: <20050217173346.GB18117@panix.com> On Thu, Feb 17, 2005, Skip Montanaro wrote: > > I am frantically trying to get ready to be out of town for a > week of vacation. Someone sent me some patches for datetime > and asked me to look at them. I begged off but referred him to > http://www.python.org/dev/ and made mention of the five patch review > idea. Can someone make sure that's explained on the /dev/ site? This should go into Brett's survey of the Python dev process, not as official documentation. It's simply an offer made by some of the prominent members of python-dev. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code -- not in reams of trivial code that bores the reader to death." --GvR From arigo at tunes.org Thu Feb 17 19:11:19 2005 From: arigo at tunes.org (Armin Rigo) Date: Thu Feb 17 19:14:50 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1f7befae050214074122b715a@mail.gmail.com> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> Message-ID: <20050217181119.GA3055@vicky.ecs.soton.ac.uk> Hi Tim, On Mon, Feb 14, 2005 at 10:41:35AM -0500, Tim Peters wrote: > # This is a puzzle: there's no way to know the natural width of > # addresses on this box (in particular, there's no necessary > # relation to sys.maxint). Isn't this natural width nowadays available as: 256 ** struct.calcsize('P') ? Armin From tim.peters at gmail.com Thu Feb 17 19:44:11 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 19:44:16 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <20050217181119.GA3055@vicky.ecs.soton.ac.uk> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> <20050217181119.GA3055@vicky.ecs.soton.ac.uk> Message-ID: <1f7befae050217104431312214@mail.gmail.com> [Tim Peters] >> # This is a puzzle: there's no way to know the natural width of >> # addresses on this box (in particular, there's no necessary >> # relation to sys.maxint). [Armin Rigo] > Isn't this natural width nowadays available as: > > 256 ** struct.calcsize('P') > > ? Looks right to me -- cool! I never used struct's 'P' format because it always appeared useless to me: even if I could ship pointers across processes or boxes, there's not much I could do with them after getting integers back from unpack(). But silly me! I'm sure Guido put it there anticipating the need for calcsize('P') when making a positive_id() function in Python. Now if you'll just sign and fax a Zope contributor agreement, I'll upgrade ZODB to use this slick trick . From fredrik at pythonware.com Thu Feb 17 21:21:38 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu Feb 17 21:21:43 2005 Subject: [Python-Dev] Re: Re: string find(substring) vs. 
substring in string References: <000001c51473$df4717a0$8d2acb97@oemcomputer> Message-ID: Raymond Hettinger wrote: > > but refactoring the contains code to use find_internal sounds like a good > > first step. any takers? > > I'm up for it. excellent! just fyi, unless my benchmark is mistaken, the Unicode implementation has the same problem: str in -> 25.8 µsec per loop unicode in -> 26.8 µsec per loop str.find() -> 6.73 µsec per loop unicode.find() -> 7.24 µsec per loop oddly enough, if I change the target string so it doesn't contain any partial matches at all, unicode.find() wins the race: str in -> 24.5 µsec per loop unicode in -> 24.6 µsec per loop str.find() -> 2.86 µsec per loop unicode.find() -> 2.16 µsec per loop From bac at OCF.Berkeley.EDU Thu Feb 17 21:22:29 2005 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Thu Feb 17 21:22:44 2005 Subject: [Python-Dev] Five review rule on the /dev/ page? In-Reply-To: <20050217173346.GB18117@panix.com> References: <16916.50199.723442.36695@montanaro.dyndns.org> <20050217173346.GB18117@panix.com> Message-ID: <4214FD05.7020203@ocf.berkeley.edu> [removed pydotorg from people receiving this email] Aahz wrote: > On Thu, Feb 17, 2005, Skip Montanaro wrote: > >>I am frantically trying to get ready to be out of town for a >>week of vacation. Someone sent me some patches for datetime >>and asked me to look at them. I begged off but referred him to >>http://www.python.org/dev/ and made mention of the five patch review >>idea. Can someone make sure that's explained on the /dev/ site? > > > This should go into Brett's survey of the Python dev process, not as > official documentation. It's simply an offer made by some of the > prominent members of python-dev. I am planning on adding that blurb in there. Actually, while I have everyone's attention, I might as well throw an idea out there about sprucing up yet again the docs on contributing. I was thinking of taking the current dev intro and have it just explain how things basically work around here. So the doc would become more of just a high-level overview of how we dev the language. But I would cut out the helping out section and spin that into another doc that would go into some more detail on how to make a contribution. So this would specify in more detail how to report a bug, how to comment on one, etc. (same goes for patches). This is where I would stick the 5-for-1 deal. Lastly, write up a doc that covers what one with CVS checkin rights needs to do when checking in code. So how one goes about getting checkin rights, getting initial checkins OK'ed by others, and then the usual steps taken for a checkin. Sound worth it to people? Not really needed so go back and do your homework, Brett? What? -Brett From Jack.Jansen at cwi.nl Thu Feb 17 21:46:03 2005 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Thu Feb 17 21:46:03 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/lib libimp.tex, 1.36, 1.36.2.1 libsite.tex, 1.26, 1.26.4.1 libtempfile.tex, 1.22, 1.22.4.1 libos.tex, 1.146.2.1, 1.146.2.2 In-Reply-To: References: Message-ID: On 14-feb-05, at 10:23, Just van Rossum wrote: > bcannon@users.sourceforge.net wrote: > >> \begin{datadesc}{PY_RESOURCE} >> -The module was found as a Macintosh resource. This value can only be >> -returned on a Macintosh. >> +The module was found as a Mac OS 9 resource. This value can only be >> +returned on a Mac OS 9 or earlier Macintosh. 
>> \end{datadesc} > > not entirely true: it's limited to the sa called "OS9" version of > MacPython, which happily runs natively on OSX as a Carbon app... But as of 2.4 there's no such thing as MacPython-OS9 any more. But as the constant is still in there I thought it best to document it. -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From walter at livinglogic.de Thu Feb 17 23:22:20 2005 From: walter at livinglogic.de (=?iso-8859-1?Q?Walter_D=F6rwald?=) Date: Thu Feb 17 23:22:22 2005 Subject: [Python-Dev] Negative indices in UserString.MutableString Message-ID: <1543.84.56.105.228.1108678940.squirrel@isar.livinglogic.de> Currently UserString.MutableString does not support negative indices: >>> import UserString >>> UserString.MutableString("foo")[-1] = "bar" Traceback (most recent call last): File " ", line 1, in ? File "/home/Python-test/dist/src/Lib/UserString.py", line 149, in __setitem__ if index < 0 or index >= len(self.data): raise IndexError IndexError Should this be fixed so that negative value are treated as being relative to the end? Bye, Walter D?rwald From aahz at pythoncraft.com Thu Feb 17 23:23:36 2005 From: aahz at pythoncraft.com (Aahz) Date: Thu Feb 17 23:23:37 2005 Subject: [Python-Dev] Negative indices in UserString.MutableString In-Reply-To: <1543.84.56.105.228.1108678940.squirrel@isar.livinglogic.de> References: <1543.84.56.105.228.1108678940.squirrel@isar.livinglogic.de> Message-ID: <20050217222336.GA18285@panix.com> On Thu, Feb 17, 2005, Walter D?rwald wrote: > > Currently UserString.MutableString does not support negative indices: > > >>> import UserString > >>> UserString.MutableString("foo")[-1] = "bar" > Traceback (most recent call last): > File " ", line 1, in ? > File "/home/Python-test/dist/src/Lib/UserString.py", line 149, in __setitem__ > if index < 0 or index >= len(self.data): raise IndexError > IndexError > > Should this be fixed so that negative value are treated as being > relative to the end? Yup! As usual, patches welcome. (Yes, I'm comfortable channeling Guido here.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code -- not in reams of trivial code that bores the reader to death." --GvR From greg.ewing at canterbury.ac.nz Fri Feb 18 02:58:46 2005 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri Feb 18 02:59:05 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1f7befae050217104431312214@mail.gmail.com> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> <20050217181119.GA3055@vicky.ecs.soton.ac.uk> <1f7befae050217104431312214@mail.gmail.com> Message-ID: <42154BD6.4030001@canterbury.ac.nz> Tim Peters wrote: > Looks right to me -- cool! I never used struct's 'P' format because > it always appeared useless to me: But silly me! I'm sure Guido > put it there anticipating the need for calcsize('P') when making a > positive_id() function in Python. Smells like more time machine activity to me. Any minute now you'll find there's suddenly a positive_id() builtin that's been there ever since 1.3 or so. And the 'P' format, then always never having just become useful, will have unappeared... 
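Coming back to the UserString.MutableString question above: the fix is nearly mechanical. A sketch of __setitem__ with negative-index support (not an actual patch; the other index-taking methods would need the same treatment):

    def __setitem__(self, index, sub):
        if index < 0:
            index += len(self.data)
        if index < 0 or index >= len(self.data):
            raise IndexError
        self.data = self.data[:index] + sub + self.data[index+1:]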
-- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+ From nick at ilm.com Thu Feb 17 00:56:24 2005 From: nick at ilm.com (Nick Rasmussen) Date: Fri Feb 18 03:04:18 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <5614e00fb134b968fa76a1896c456f4a@redivi.com> References: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> <5614e00fb134b968fa76a1896c456f4a@redivi.com> Message-ID: <20050216235624.GO17806@ewok.lucasdigital.com> On Wed, 16 Feb 2005, Bob Ippolito wrote: > > On Feb 16, 2005, at 11:02, Phillip J. Eby wrote: > > >At 02:32 PM 2/11/05 -0800, Nick Rasmussen wrote: > >>tommy said that this would be the best place to ask > >>this question.... > >> > >>I'm trying to get functions wrapped via boost to show > >>up as builtin types so that pydoc includes them when > >>documenting the module containing them. Right now > >>boost python functions are created using a PyTypeObject > >>such that when inspect.isbuiltin does: > >> > >> return isinstance(object, types.BuiltinFunctionType) > > > >FYI, this may not be the "right" way to do this, but since 2.3 > >'isinstance()' looks at an object's __class__ rather than its type(), > >so you could perhaps include a '__class__' descriptor in your method > >type that returns BuiltinFunctionType and see if that works. > > > >It's a kludge, but it might let your code work with existing versions > >of Python. > > It works in Python 2.3.0: > That seemed to do the trick for me as well, I'll run it past the boost::python folks and see what they think. many thanks -nick From maalanen at ra.abo.fi Thu Feb 17 17:30:27 2005 From: maalanen at ra.abo.fi (Marcus Alanen) Date: Fri Feb 18 03:04:20 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) In-Reply-To: References: Message-ID: <4214C6A3.1000806@ra.abo.fi> Guido van Rossum wrote: > The Python test suite already has a way (the -u flag) to distinguish > between "regular" broad-coverage testing and deep coverage for > specific (or all) areas. Let's keep the really long-running tests out > of the regular test suite. > > There used to be a farm of machines that did nothing but run the test > suite ("snake-farm"). This seems to have stopped (it was run by > volunteers at a Swedish university). Maybe we should revive such an > effort, and make sure it runs with -u all. Hello Guido and everybody else, I hacked together a simple distributed unittest runner for our projects. Requirements are a NFS-mounted home directory across the slave nodes and SSH-based "automatic" authentication, i.e. no passwords or passphrases necessary. It officially works-for-me for around three hosts (see below) so that cuts the time down basically to a third (real-life example ~600 seconds to ~200 seconds, so it does work :-). It also supports "serialized tests", i.e. tests that must be run one after the other and cannot be run in parallel. http://mde.abo.fi/tools/disttest/ Comes with some problems; my blurb from advogato.org: """ Disttest is a distributed unittesting runner. You simply set the DISTTEST_HOSTS variable to a space-separated list of hostnames to connect to using SSH, and then run "disttest". The nodes must all have the same filesystem (usually an NFS-mounted /home) and have the Disttest program installed. 
You even gain a bit with just one computer by setting the variable to "localhost localhost". :-) There are currently two annoying problem with it, though. For some reason, 1) the unittest program connecting to the X server sometimes fails to provide the correct authentication, and 2) sometimes the actual connection to the X server can't be established. I think these are related to 1) congestion on the shared .Xauthority file, and 2) a too small listen() queue on the forwarding port by the SSH daemon. Both problems show up when using too many (over 4?) hosts, which is the whole point of the program! Sigh. """ Error checking probably bad. Anyway, feel free to check it out, modify, comment or anything. We're thinking of checking the assumptions in the blurb above, but no timetable is set. My guess is that the NFS-mounted home directory is the showstopper and people usually don't have lot's of machines hanging around, but that's for you to decide. Disclaimer: I don't know anything of CPython development nor of the tests in the CPython test suite. ;-) Best regards, and a big thank you for Python, Marcus From Martin.Gfeller at comit.ch Thu Feb 17 19:34:50 2005 From: Martin.Gfeller at comit.ch (Gfeller Martin) Date: Fri Feb 18 03:04:22 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% Message-ID: Hi, what immediately comes to mind are Modules/cPickle.c and Modules/cStringIO.c, which (I believe) are heavily used by ZODB (which in turn is heavily used by the application). The lists also get fairly large, although not huge - up to typically 50000 (complex) objects in the tests I've measured. As I said, I don't speak C, so I can only speculate - do the lists at some point grow beyond the upper limit of obmalloc, but are handled by the LFH (which has a higher upper limit, if I understood Tim Peters correctly)? Best regards, Martin -----Original Message----- From: Evan Jones [mailto:ejones@uwaterloo.ca] Sent: Thursday, 17 Feb 2005 02:26 To: Python Dev Cc: Gfeller Martin; Martin v. L?wis Subject: Re: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% On Feb 16, 2005, at 18:42, Martin v. L?wis wrote: > I must admit that I'm surprised. I would have expected > that most allocations in Python go through obmalloc, so > the heap would only see "large" allocations. > > It would be interesting to find out, in your application, > why it is still an improvement to use the low-fragmentation > heaps. Hmm... This is an excellent point. A grep through the Python source code shows that the following files call the native system malloc (I've excluded a few obviously platform specific files). A quick visual inspection shows that most of these are using it to allocate some sort of array or string, so it likely *should* go through the system malloc. Gfeller, any idea if you are using any of the modules on this list? If so, it would be pretty easy to try converting them to call the obmalloc functions instead, and see how that affects the performance. 
Evan Jones Demo/pysvr/pysvr.c Modules/_bsddb.c Modules/_curses_panel.c Modules/_cursesmodule.c Modules/_hotshot.c Modules/_sre.c Modules/audioop.c Modules/bsddbmodule.c Modules/cPickle.c Modules/cStringIO.c Modules/getaddrinfo.c Modules/main.c Modules/pyexpat.c Modules/readline.c Modules/regexpr.c Modules/rgbimgmodule.c Modules/svmodule.c Modules/timemodule.c Modules/zlibmodule.c PC/getpathp.c Python/strdup.c Python/thread.c From tim.peters at gmail.com Fri Feb 18 04:38:08 2005 From: tim.peters at gmail.com (Tim Peters) Date: Fri Feb 18 04:38:14 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: References: Message-ID: <1f7befae050217193863ffc028@mail.gmail.com> [Gfeller Martin] > what immediately comes to mind are Modules/cPickle.c and > Modules/cStringIO.c, which (I believe) are heavily used by ZODB (which in turn > is heavily used by the application). I probably guessed right the first time : LFH doesn't help with the lists directly, but helps indirectly by keeping smaller objects out of the general heap where the list guts actually live. Say we have a general heap with a memory map like this, meaning a contiguous range of available memory, where 'f' means a block is free. The units of the block don't really matter, maybe one 'f' is one byte, maybe one 'f' is 4MB -- it's all the same in the end: fffffffffffffffffffffffffffffffffffffffffffffff Now you allocate a relatively big object (like the guts of a large list), and it's assigned a contiguous range of blocks marked 'b': bbbbbbbbbbbbbbbffffffffffffffffffffffffffffffff Then you allocate a small object, marked 's': bbbbbbbbbbbbbbbsfffffffffffffffffffffffffffffff The you want to grow the big object. Oops! It can't extend the block of b's in-place, because 's' is in the way. Instead it has to copy the whole darn thing: fffffffffffffffsbbbbbbbbbbbbbbbffffffffffffffff But if 's' is allocated from some _other_ heap, then the big object can grow in-place, and that's much more efficient than copying the whole thing. obmalloc has two primary effects: it manages a large number of very small (<= 256 bytes) memory chunks very efficiently, but it _also_ helps larger objects indirectly, by keeping the very small objects out of the platform C malloc's way. LFH appears to be an extension of the same basic idea, raising the "small object" limit to 16KB. Now note that pymalloc and LFH are *bad* ideas for objects that want to grow. pymalloc and LFH segregate the memory they manage into blocks of different sizes. For example, pymalloc keeps a list of free blocks each of which is exactly 64 bytes long. Taking a 64-byte block out of that list, or putting it back in, is very efficient. But if an object that uses a 64-byte block wants to grow, pymalloc can _never_ grow it in-place, it always has to copy it. That's a cost that comes with segregating memory by size, and for that reason Python deliberately doesn't use pymalloc in several cases where objects are expected to grow over time. One thing to take from that is that LFH can't be helping list-growing in a direct way either, if LFH (as seems likely) also needs to copy objects that grow in order to keep its internal memory segregated by size. The indirect benefit is still available, though: LFH may be helping simply by keeping smaller objects out of the general heap's hair. > The lists also get fairly large, although not huge - up to typically 50000 > (complex) objects in the tests I've measured. That's much larger than LFH can handle. Its limit is 16KB. 
A Python list with 50K elements requires a contiguous chunk of 200KB on a 32-bit machine to hold the list guts. > As I said, I don't speak C, so I can only speculate - do the lists at some point >grow beyond the upper limit of obmalloc, but are handled by the LFH (which has a > higher upper limit, if I understood Tim Peters correctly)? A Python list object comprises two separately allocated pieces of memory. First is a list header, a small piece of memory of fixed size, independent of len(list). The list header is always obtained from obmalloc; LFH will never be involved with that, and neither will the system malloc. The list header has a pointer to a separate piece of memory, which contains the guts of a list, a contiguous vector of len(list) pionters (to Python objects). For a list of length n, this needs 4*n bytes on a 32-bit box. obmalloc never manages that space, and for the reason given above: we expect that list guts may grow, and obmalloc is meant for fixed-size chunks of memory. So the list guts will get handled by LFH, until the list needs more than 4K entries (hitting the 16KB LFH limit). Until then, LFH probably wastes time by copying growing list guts from size class to size class. Then the list guts finally get copied to the general heap, and stay there. I'm afraid the only you can know for sure is by obtaining detailed memory maps and analyzing them. From abo at minkirri.apana.org.au Fri Feb 18 05:09:51 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Fri Feb 18 05:10:35 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <20050217065330.GP25441@zot.electricrain.com> References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> Message-ID: <1108699791.3758.98.camel@schizo> On Wed, 2005-02-16 at 22:53 -0800, Gregory P. Smith wrote: > fyi - i've updated the python sha1/md5 openssl patch. it now replaces > the entire sha and md5 modules with a generic hashes module that gives > access to all of the hash algorithms supported by OpenSSL (including > appropriate legacy interface wrappers and falling back to the old code > when compiled without openssl). > > https://sourceforge.net/tracker/index.php?func=detail&aid=1121611&group_id=5470&atid=305470 > > I don't quite like the module name 'hashes' that i chose for the > generic interface (too close to the builtin hash() function). Other > suggestions on a module name? 'digest' comes to mind. I just had a quick look, and have these comments (psedo patch review?). Apologies for the noise on the list... DESCRIPTION =========== This patch keeps the current md5c.c, md5module.c files and adds the following; _hashopenssl.c, hashes.py, md5.py, sha.py. The old md5 and sha extension modules get replaced by hashes.py, md5.py, and sha.py python modules that leverage off _hash (openssl) or _md5 and _sha (no openssl) extension modules. The new _hash extension module "wraps" the high level openssl EVP interface, which uses a string parameter to indicate what type of message digest algorithm to use. 
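To make the shape of that interface concrete, usage would presumably look something like this (module and function names as proposed in the patch; the update()/hexdigest() methods are assumed to match the existing md5/sha interface):

    import hashes

    h = hashes.new('sha1')      # any digest name OpenSSL knows about
    h.update('some data')
    print h.hexdigest()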
The advantage of this is it makes all openssl supported digests available, and if openssl adds more, we get them for free. A disadvantage of this is it is an abstraction level above the actual md5 and sha implementations, and this may add overheads. These overheads are probably negligible compared to the actual implementation speedups. The new _md5 and _sha extension modules are simply re-named versions of the old md5 and sha modules. The hashes.py module acts as an import wrapper for _hash, and falls back to using _md5 and _sha modules if _hash is not available. It provides an EVP style API (string hash name parameter), that supports only md5 and sha hashes if openssl is not available. The new md5.py and sha.py modules simply use hash.py. COMMENTS ======== The introduction of a "hashes" module with a new API that supports many different digests (provided openssl is available) is extending Python, not just "fixing the licenses" of md5 and sha modules. If all we wanted to do was fix the md5 module, a simpler solution would be to change the md5c.c API to match openssl's implementation, and make md5module.c use it, conditionally compiling against md5c.c or linking against openssl in setup.py. A similar approach could be used for sha, but would require stripping the sha implementation out of shamodule.c I am mildly of concerned about the namespace/filespace clutter introduced by this implementation... it feels unnecessary, as does the tangled dependencies between them. With openssl, hashes.py duplicates the functionality of _hash. Without openssl, md5.py and sha.py duplicate _md5 and _sha, via a roundabout route through hash.py. The python wrappers seem overly complicated, with things like def new(name, string=None): if string: return _hash.new(name) else: return _hash.new.(name,string) being common where the following would suffice; def new(name,string=""): return _hash.new(name,string) I think this is because _hash.new() uses an optional string parameter, but I have a feeling a C update with a zero length string is faster than this Python if. If it was a concern, the C implementation could check the value of the string length before calling update. Given the convenience methods for different hashes in hashes.py (which incidentally look like they are only available when _hash is not available... something else that needs fixing), the md5.py module could be simply coded as; from hashes import md5 new = md5 Despite all these nit-picks, it looks pretty good. It is orders of magnitude better than any of the other non-existent solutions, including the one I didn't code :-) -- Donovan Baarda http://minkirri.apana.org.au/~abo/ From raymond.hettinger at verizon.net Fri Feb 18 07:53:37 2005 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Fri Feb 18 07:57:43 2005 Subject: [Python-Dev] Prospective Peephole Transformation Message-ID: <000c01c51586$92c7dd60$3a01a044@oemcomputer> Based on some ideas from Skip, I had tried transforming the likes of "x in (1,2,3)" into "x in frozenset([1,2,3])". When applicable, it substantially simplified the generated code and converted the O(n) lookup into an O(1) step. There were substantial savings even if the set contained only a single entry. When disassembled, the bytecode is not only much shorter, it is also much more readable (corresponding almost directly to the original source). The problem with the transformation was that it didn't handle the case where x was non-hashable and it would raise a TypeError instead of returning False as it should. 
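The difference is easy to see interactively (exact error wording varies by version):

    >>> [] in (1, 2, 3)
    False
    >>> [] in frozenset([1, 2, 3])
    Traceback (most recent call last):
      ...
    TypeError: list objects are unhashable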
That situation arose once in the email module's test suite. To get it to work, I would have to introduce a frozenset subtype: class Searchset(frozenset): def __contains__(self, element): try: return frozenset.__contains__(self, element) except TypeError: return False Then, the transformation would be "x in Searchset([1, 2, 3])". Since the new Searchset object goes in the constant table, marshal would have to be taught how to save and restore the object. This is a more complicated than the original frozenset version of the patch, so I would like to get feedback on whether you guys think it is worth it. Raymond Hettinger From fredrik at pythonware.com Fri Feb 18 09:18:31 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri Feb 18 09:18:40 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation References: <000c01c51586$92c7dd60$3a01a044@oemcomputer> Message-ID: Raymond Hettinger wrote: > Based on some ideas from Skip, I had tried transforming the likes of "x > in (1,2,3)" into "x in frozenset([1,2,3])". When applicable, it > substantially simplified the generated code and converted the O(n) > lookup into an O(1) step. There were substantial savings even if the > set contained only a single entry. savings in what? time or bytecode size? constructed micro-benchmarks, or examples from real-life code? do we have any statistics on real-life "n" values? From martin at v.loewis.de Fri Feb 18 10:06:24 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri Feb 18 10:06:28 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108699791.3758.98.camel@schizo> References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> <1108699791.3758.98.camel@schizo> Message-ID: <4215B010.2090600@v.loewis.de> Donovan Baarda wrote: > This patch keeps the current md5c.c, md5module.c files and adds the > following; _hashopenssl.c, hashes.py, md5.py, sha.py. [...] > If all we wanted to do was fix the md5 module If we want to fix the licensing issues with the md5 module, this patch does not help at all, as it keeps the current md5 module (along with its licensing issues). So any patch to solve the problem will need to delete the code with the questionable license. Then, the approach in the patch breaks the promise that the md5 module is always there. It would require that OpenSSL is always there - a promise that we cannot make (IMO). Regards, Martin From arigo at tunes.org Fri Feb 18 12:36:08 2005 From: arigo at tunes.org (Armin Rigo) Date: Fri Feb 18 12:39:37 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1f7befae050217104431312214@mail.gmail.com> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> <20050217181119.GA3055@vicky.ecs.soton.ac.uk> <1f7befae050217104431312214@mail.gmail.com> Message-ID: <20050218113608.GB25496@vicky.ecs.soton.ac.uk> Hi Tim, On Thu, Feb 17, 2005 at 01:44:11PM -0500, Tim Peters wrote: > > 256 ** struct.calcsize('P') > > Now if you'll just sign and fax a Zope contributor agreement, I'll > upgrade ZODB to use this slick trick . 
I hereby donate this line of code to the public domain :-) Armin From skip at pobox.com Fri Feb 18 15:41:42 2005 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 18 15:39:15 2005 Subject: [Python-Dev] Five review rule on the /dev/ page? In-Reply-To: <20050217173346.GB18117@panix.com> References: <16916.50199.723442.36695@montanaro.dyndns.org> <20050217173346.GB18117@panix.com> Message-ID: <16917.65190.515241.199460@montanaro.dyndns.org> aahz> This should go into Brett's survey of the Python dev process, not aahz> as official documentation. It's simply an offer made by some of aahz> the prominent members of python-dev. As long as it's referred to from www.python.org/dev that's fine. Skip From skip at pobox.com Fri Feb 18 15:57:39 2005 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 18 15:55:29 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: References: <000c01c51586$92c7dd60$3a01a044@oemcomputer> Message-ID: <16918.611.903084.183700@montanaro.dyndns.org> >> Based on some ideas from Skip, I had tried transforming the likes of >> "x in (1,2,3)" into "x in frozenset([1,2,3])".... Fredrik> savings in what? time or bytecode size? constructed Fredrik> micro-benchmarks, or examples from real-life code? Fredrik> do we have any statistics on real-life "n" values? My original suggestion wasn't based on performance issues. It was based on the notion of tuples-as-records and lists-as-arrays. Raymond had originally gone through the code and changed for x in [1,2,3]: to for x in (1,2,3): I suggested that since the standard library code is commonly used as an example of basic Python principles (that's probably not the right word), it should uphold that ideal tuple/list distinction. Raymond then translated for x in [1,2,3]: to for x in frozenset([1,2,3]): I'm unclear why the list in "for x in [1,2,3]" or "if x not in [1,2,3]" can't fairly easily be recognized as a constant and just be placed in the constants array. The bytecode would show n LOAD_CONST opcodes followed by BUILD_LIST then either a COMPARE_OP (in the test case) or GET_ITER+FOR_ITER (in the for loop case). I think the optimizer should be able to recognize both constructs fairly easily. I don't know if that would provide a performance increase or not. I was after separation of functionality between tuples and lists. Skip From python at rcn.com Fri Feb 18 15:58:10 2005 From: python at rcn.com (Raymond Hettinger) Date: Fri Feb 18 16:02:09 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <16918.611.903084.183700@montanaro.dyndns.org> Message-ID: <000001c515ca$4378e260$803cc797@oemcomputer> > I'm unclear why the list in "for x in [1,2,3]" or "if x not in [1,2,3]" > can't fairly easily be recognized as a constant and just be placed in the > constants array. That part got done (at least for the if-statement). The question is whether the type transformation idea should be carried a step further so that a single step search operation replaces the linear search. 
Raymond From irmen at xs4all.nl Fri Feb 18 15:36:15 2005 From: irmen at xs4all.nl (Irmen de Jong) Date: Fri Feb 18 16:02:14 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <16918.611.903084.183700@montanaro.dyndns.org> References: <000c01c51586$92c7dd60$3a01a044@oemcomputer> <16918.611.903084.183700@montanaro.dyndns.org> Message-ID: <4215FD5F.4040605@xs4all.nl> Skip Montanaro wrote: > I suggested that since the standard library code is commonly used as an > example of basic Python principles (that's probably not the right word), it > should uphold that ideal tuple/list distinction. Raymond then translated > > for x in [1,2,3]: > > to > > for x in frozenset([1,2,3]): I may be missing something here (didn't follow the whole thread) but those two are not functionally equal. The docstring on frozenset sais "Build an immutable unordered collection." So there's no guarantee that the elements will return from the frozenset iterator in the order that you constructed the frozenset with, right? --Irmen From python at rcn.com Fri Feb 18 16:15:04 2005 From: python at rcn.com (Raymond Hettinger) Date: Fri Feb 18 16:19:03 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <4215FD5F.4040605@xs4all.nl> Message-ID: <000101c515cc$9f96d0a0$803cc797@oemcomputer> > > Raymond then > translated > > > > for x in [1,2,3]: > > > > to > > > > for x in frozenset([1,2,3]): That's not right. for-statements are not touched. > I may be missing something here (didn't follow the whole thread) but > those two are not functionally equal. > The docstring on frozenset sais "Build an immutable unordered collection." > So there's no guarantee that the elements will return from the > frozenset iterator in the order that you constructed the frozenset with, > right? Only contains expressions are translated: "if x in [1,2,3]" currently turns into: "if x in (1,2,3)" and I'm proposing that it go one step further: "if x in Seachset([1,2,3])" where Search set is a frozenset subtype that doesn't require x to be hashable. Also, the transformation would only happen when the contents of the search are all constants. Raymond From pje at telecommunity.com Fri Feb 18 16:36:43 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Feb 18 16:34:03 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <000101c515cc$9f96d0a0$803cc797@oemcomputer> References: <4215FD5F.4040605@xs4all.nl> Message-ID: <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> At 10:15 AM 2/18/05 -0500, Raymond Hettinger wrote: >Only contains expressions are translated: > > "if x in [1,2,3]" > >currently turns into: > > "if x in (1,2,3)" > >and I'm proposing that it go one step further: > > "if x in Seachset([1,2,3])" ISTM that whenever I use a constant in-list like that, it's almost always with just a few (<4) items, so it doesn't seem worth the extra effort (especially disrupting the marshal module) just to squeeze out those extra two comparisons and replace them with a hashing operation. From fredrik at pythonware.com Fri Feb 18 16:45:32 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri Feb 18 16:45:45 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> Message-ID: Phillip J. 
Eby wrote: >>Only contains expressions are translated: >> >> "if x in [1,2,3]" >> >>currently turns into: >> >> "if x in (1,2,3)" >> >>and I'm proposing that it go one step further: >> >> "if x in Seachset([1,2,3])" > > ISTM that whenever I use a constant in-list like that, it's almost always with just a few (<4) > items, so it doesn't seem worth the extra effort (especially disrupting the marshal module) just > to squeeze out those extra two comparisons and replace them with a hashing operation. it could be worth expanding them to "if x == 1 or x == 2 or x == 3:" though... C:\>timeit -s "a = 1" "if a in (1, 2, 3): pass" 10000000 loops, best of 3: 0.11 usec per loop C:\>timeit -s "a = 1" "if a == 1 or a == 2 or a == 3: pass" 10000000 loops, best of 3: 0.0691 usec per loop C:\>timeit -s "a = 2" "if a == 1 or a == 2 or a == 3: pass" 10000000 loops, best of 3: 0.123 usec per loop C:\>timeit -s "a = 2" "if a in (1, 2, 3): pass" 10000000 loops, best of 3: 0.143 usec per loop C:\>timeit -s "a = 3" "if a == 1 or a == 2 or a == 3: pass" 10000000 loops, best of 3: 0.187 usec per loop C:\>timeit -s "a = 3" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.197 usec per loop C:\>timeit -s "a = 4" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.225 usec per loop C:\>timeit -s "a = 4" "if a == 1 or a == 2 or a == 3: pass" 10000000 loops, best of 3: 0.161 usec per loop From skip at pobox.com Fri Feb 18 17:03:28 2005 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 18 17:00:59 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <000101c515cc$9f96d0a0$803cc797@oemcomputer> References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> Message-ID: <16918.4560.171364.66303@montanaro.dyndns.org> >> > Raymond then >> translated >> > >> > for x in [1,2,3]: >> > >> > to >> > >> > for x in frozenset([1,2,3]): Raymond> That's not right. for-statements are not touched. Thanks for the correction. My apologies for the misstep. Skip From pje at telecommunity.com Fri Feb 18 17:42:51 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Feb 18 17:40:12 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> At 04:45 PM 2/18/05 +0100, Fredrik Lundh wrote: >Phillip J. Eby wrote: > > >>Only contains expressions are translated: > >> > >> "if x in [1,2,3]" > >> > >>currently turns into: > >> > >> "if x in (1,2,3)" > >> > >>and I'm proposing that it go one step further: > >> > >> "if x in Seachset([1,2,3])" > > > > ISTM that whenever I use a constant in-list like that, it's almost > always with just a few (<4) > > items, so it doesn't seem worth the extra effort (especially disrupting > the marshal module) just > > to squeeze out those extra two comparisons and replace them with a > hashing operation. > >it could be worth expanding them to > > "if x == 1 or x == 2 or x == 3:" > >though... 
> >C:\>timeit -s "a = 1" "if a in (1, 2, 3): pass" >10000000 loops, best of 3: 0.11 usec per loop >C:\>timeit -s "a = 1" "if a == 1 or a == 2 or a == 3: pass" >10000000 loops, best of 3: 0.0691 usec per loop > >C:\>timeit -s "a = 2" "if a == 1 or a == 2 or a == 3: pass" >10000000 loops, best of 3: 0.123 usec per loop >C:\>timeit -s "a = 2" "if a in (1, 2, 3): pass" >10000000 loops, best of 3: 0.143 usec per loop > >C:\>timeit -s "a = 3" "if a == 1 or a == 2 or a == 3: pass" >10000000 loops, best of 3: 0.187 usec per loop >C:\>timeit -s "a = 3" "if a in (1, 2, 3): pass" >1000000 loops, best of 3: 0.197 usec per loop > >C:\>timeit -s "a = 4" "if a in (1, 2, 3): pass" >1000000 loops, best of 3: 0.225 usec per loop >C:\>timeit -s "a = 4" "if a == 1 or a == 2 or a == 3: pass" >10000000 loops, best of 3: 0.161 usec per loop > > Were these timings done with the code that turns (1,2,3) into a constant? Also, I presume that these timings still include extra LOAD_FAST operations that could be replaced with DUP_TOP in the actual expansion, although I don't know how much difference that would make in practice, since saving the argument fetch might be offset by the need to swap and pop at the end. From fredrik at pythonware.com Fri Feb 18 17:52:08 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri Feb 18 17:52:16 2005 Subject: [Python-Dev] Re: Re: Prospective Peephole Transformation References: <4215FD5F.4040605@xs4all.nl><000101c515cc$9f96d0a0$803cc797@oemcomputer><5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> Message-ID: Phillip J. Eby wrote: > Were these timings done with the code that turns (1,2,3) into a constant? I used a stock 2.4 from python.org, which seems to do this (for tuples, not for lists). > Also, I presume that these timings still include extra LOAD_FAST operations that could be replaced > with DUP_TOP in the actual expansion, although I don't know how much difference that would make in > practice, since saving the argument fetch might be offset by the need to swap and pop at the end. here's the disassembly: >>> dis.dis(compile("if a in (1, 2, 3): pass", "", "exec")) 1 0 LOAD_NAME 0 (a) 3 LOAD_CONST 4 ((1, 2, 3)) 6 COMPARE_OP 6 (in) 9 JUMP_IF_FALSE 4 (to 16) 12 POP_TOP 13 JUMP_FORWARD 1 (to 17) >> 16 POP_TOP >> 17 LOAD_CONST 3 (None) 20 RETURN_VALUE >>> dis.dis(compile("if a == 1 or a == 2 or a == 3: pass", "", "exec")) 1 0 LOAD_NAME 0 (a) 3 LOAD_CONST 0 (1) 6 COMPARE_OP 2 (==) 9 JUMP_IF_TRUE 26 (to 38) 12 POP_TOP 13 LOAD_NAME 0 (a) 16 LOAD_CONST 1 (2) 19 COMPARE_OP 2 (==) 22 JUMP_IF_TRUE 13 (to 38) 25 POP_TOP 26 LOAD_NAME 0 (a) 29 LOAD_CONST 2 (3) 32 COMPARE_OP 2 (==) 35 JUMP_IF_FALSE 4 (to 42) >> 38 POP_TOP 39 JUMP_FORWARD 1 (to 43) >> 42 POP_TOP >> 43 LOAD_CONST 3 (None) 46 RETURN_VALUE From pje at telecommunity.com Fri Feb 18 18:09:29 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Feb 18 18:06:50 2005 Subject: [Python-Dev] Re: Re: Prospective Peephole Transformation In-Reply-To: References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> At 05:52 PM 2/18/05 +0100, Fredrik Lundh wrote: >Phillip J. Eby wrote: > > > Were these timings done with the code that turns (1,2,3) into a constant? 
> >I used a stock 2.4 from python.org, which seems to do this (for tuples, >not for lists). > > > Also, I presume that these timings still include extra LOAD_FAST > operations that could be replaced > > with DUP_TOP in the actual expansion, although I don't know how much > difference that would make in > > practice, since saving the argument fetch might be offset by the need > to swap and pop at the end. > >here's the disassembly: FYI, that's not a dissassembly of what timeit was actually timing; see 'template' in timeit.py. As a practical matter, the only difference would probably be the use of LOAD_FAST instead of LOAD_NAME, as timeit runs the code in a function body. But whatever. Still, it's rather interesting that tuple.__contains__ appears slower than a series of LOAD_CONST and "==" operations, considering that the tuple should be doing basically the same thing, only without bytecode fetch-and-decode overhead. Maybe it's tuple.__contains__ that needs optimizing here? From fredrik at pythonware.com Fri Feb 18 18:12:50 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri Feb 18 18:12:40 2005 Subject: [Python-Dev] Re: Re: Re: Prospective Peephole Transformation References: <4215FD5F.4040605@xs4all.nl><000101c515cc$9f96d0a0$803cc797@oemcomputer><5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> <5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> Message-ID: Phillip J. Eby wrote: >>here's the disassembly: > > FYI, that's not a dissassembly of what timeit was actually timing; see 'template' in timeit.py. > As a practical matter, the only difference would probably be the use of LOAD_FAST instead of > LOAD_NAME, as > timeit runs the code in a function body. >>> def f1(a): ... if a in (1, 2, 3): ... pass ... >>> def f2(a): ... if a == 1 or a == 2 or a == 3: ... pass ... >>> dis.dis(f1) 2 0 LOAD_FAST 0 (a) 3 LOAD_CONST 4 ((1, 2, 3)) 6 COMPARE_OP 6 (in) 9 JUMP_IF_FALSE 4 (to 16) 12 POP_TOP 3 13 JUMP_FORWARD 1 (to 17) >> 16 POP_TOP >> 17 LOAD_CONST 0 (None) 20 RETURN_VALUE >>> >>> dis.dis(f2) 2 0 LOAD_FAST 0 (a) 3 LOAD_CONST 1 (1) 6 COMPARE_OP 2 (==) 9 JUMP_IF_TRUE 26 (to 38) 12 POP_TOP 13 LOAD_FAST 0 (a) 16 LOAD_CONST 2 (2) 19 COMPARE_OP 2 (==) 22 JUMP_IF_TRUE 13 (to 38) 25 POP_TOP 26 LOAD_FAST 0 (a) 29 LOAD_CONST 3 (3) 32 COMPARE_OP 2 (==) 35 JUMP_IF_FALSE 4 (to 42) >> 38 POP_TOP 3 39 JUMP_FORWARD 1 (to 43) >> 42 POP_TOP >> 43 LOAD_CONST 0 (None) 46 RETURN_VALUE > Still, it's rather interesting that tuple.__contains__ appears slower than a series of LOAD_CONST > and "==" operations, considering that the tuple should be doing basically the same thing, only > without bytecode fetch-and-decode overhead. Maybe it's tuple.__contains__ that needs optimizing > here? wouldn't be the first time... From jimjjewett at gmail.com Fri Feb 18 20:10:05 2005 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri Feb 18 20:10:09 2005 Subject: [Python-Dev] Prospective Peephole Transformation Message-ID: Raymond Hettinger: > tried transforming the likes of "x in (1,2,3)" into "x in frozenset([1,2,3])". >... There were substantial savings even if the set contained only a single entry. >... where x was non-hashable and it would raise a TypeError instead of > returning False as it should. I read the objection as saying that it should not return False, because an unhashable object might pretend it is equal to a hashable one in the set. 
""" class Searchset(frozenset): def __contains__(self, element): try: return frozenset.__contains__(self, element) except TypeError: return False """ So instead of return False it should be return x in frozenset.__iter__() This would be a net loss if there were many unhashable x. You could restrict the iteration to x that implement a custom __eq__, if you ensured that none of the SearchSet elements do... but it starts to get uglier and less general. Raymond has already look at http://www.python.org/sf/1141428, which contains some test case patches to enforce this implicit "sequences always use __eq__; only mappings can short-circuit on __hash__" contract. -jJ From mal at egenix.com Fri Feb 18 21:57:16 2005 From: mal at egenix.com (M.-A. Lemburg) Date: Fri Feb 18 21:57:22 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <000c01c51586$92c7dd60$3a01a044@oemcomputer> References: <000c01c51586$92c7dd60$3a01a044@oemcomputer> Message-ID: <421656AC.6010602@egenix.com> Raymond Hettinger wrote: > Based on some ideas from Skip, I had tried transforming the likes of "x > in (1,2,3)" into "x in frozenset([1,2,3])". When applicable, it > substantially simplified the generated code and converted the O(n) > lookup into an O(1) step. There were substantial savings even if the > set contained only a single entry. When disassembled, the bytecode is > not only much shorter, it is also much more readable (corresponding > almost directly to the original source). > > The problem with the transformation was that it didn't handle the case > where x was non-hashable and it would raise a TypeError instead of > returning False as it should. That situation arose once in the email > module's test suite. > > To get it to work, I would have to introduce a frozenset subtype: > > class Searchset(frozenset): > def __contains__(self, element): > try: > return frozenset.__contains__(self, element) > except TypeError: > return False > > Then, the transformation would be "x in Searchset([1, 2, 3])". Since > the new Searchset object goes in the constant table, marshal would have > to be taught how to save and restore the object. > > This is a more complicated than the original frozenset version of the > patch, so I would like to get feedback on whether you guys think it is > worth it. Wouldn't it help a lot more if the compiler would detect that (1,2,3) is immutable and convert it into a constant at compile time ?! The next step would then be to have Python roll out these loops (in -O mode). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 18 2005) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From oliphant at ee.byu.edu Fri Feb 18 22:12:53 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Feb 18 22:12:56 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used Message-ID: <42165A55.3000609@ee.byu.edu> Hello again, There is a great discussion going on the numpy list regarding a proposed PEP for multidimensional arrays that is in the works. During this discussion as resurfaced regarding slicing with objects that are not IntegerType objects but that have a tp_as_number->nb_int method to convert to an int. 
Would it be possible to change _PyEval_SliceIndex in ceval.c so that rather than throwing an error if the indexing object is not an integer, the code first checks to see if the object has a tp_as_number->nb_int method and calls it instead. If this is acceptable, it is an easy patch. Thanks, -Travis Oliphant From gvanrossum at gmail.com Fri Feb 18 22:28:34 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Fri Feb 18 22:28:39 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: <42165A55.3000609@ee.byu.edu> References: <42165A55.3000609@ee.byu.edu> Message-ID: > Would it be possible to change > > _PyEval_SliceIndex in ceval.c > > so that rather than throwing an error if the indexing object is not an > integer, the code first checks to see if the object has a > tp_as_number->nb_int method and calls it instead. I don't think this is the right solution; since float has that method, it would allow floats to be used as slice indices, but that's not supposed to work (to protect yourself against irreproducible results due to rounding errors). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From bac at OCF.Berkeley.EDU Fri Feb 18 22:31:47 2005 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Fri Feb 18 22:31:58 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: <42165A55.3000609@ee.byu.edu> References: <42165A55.3000609@ee.byu.edu> Message-ID: <42165EC3.6010209@ocf.berkeley.edu> Travis Oliphant wrote: > Hello again, > > There is a great discussion going on the numpy list regarding a proposed > PEP for multidimensional arrays that is in the works. > > During this discussion as resurfaced regarding slicing with objects that > are not IntegerType objects but that > have a tp_as_number->nb_int method to convert to an int. > Would it be possible to change > > _PyEval_SliceIndex in ceval.c > > so that rather than throwing an error if the indexing object is not an > integer, the code first checks to see if the object has a > tp_as_number->nb_int method and calls it instead. > You would also have to change apply_slice() since that also has a guard for checking the slice arguments are either NULL, int, or long objects. But I am +1 with it since the guard is already there for ints and longs to handle those properly and thus the common case does not slow down in any way. As long as it also accepts Python objects that define __int__ and not just C types that have the nb_int slot defined I am okay with this idea. -Brett From oliphant at ee.byu.edu Fri Feb 18 22:35:43 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Feb 18 22:35:46 2005 Subject: [Python-Dev] Fix _PyEval_SliceIndex (Take two) Message-ID: <42165FAF.8080703@ee.byu.edu> (More readable second paragraph) Hello again, There is a great discussion going on the numpy list regarding a proposed PEP for multidimensional arrays that is in the works. During this discussion a problem has resurfaced regarding slicing with objects that are not IntegerType objects but that have a tp_as_number->nb_int method. Would it be possible to change _PyEval_SliceIndex in ceval.c so that rather than raising an exception if the indexing object is not an integer, the code first checks to see if the object has a tp_as_number->nb_int method and trys it before raising an exception. If this is acceptable, it is an easy patch. 
Thanks, -Travis Oliphant From david.ascher at gmail.com Fri Feb 18 22:36:31 2005 From: david.ascher at gmail.com (David Ascher) Date: Fri Feb 18 22:36:34 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: References: <42165A55.3000609@ee.byu.edu> Message-ID: On Fri, 18 Feb 2005 13:28:34 -0800, Guido van Rossum wrote: > > Would it be possible to change > > > > _PyEval_SliceIndex in ceval.c > > > > so that rather than throwing an error if the indexing object is not an > > integer, the code first checks to see if the object has a > > tp_as_number->nb_int method and calls it instead. > > I don't think this is the right solution; since float has that method, > it would allow floats to be used as slice indices, but that's not > supposed to work (to protect yourself against irreproducible results > due to rounding errors). I wonder if floats are the special case here, not "integer like objects". I've never been particularly happy about the confusion between the two roles of int() and it's C equivalents, i.e. casting and conversion. From gvanrossum at gmail.com Fri Feb 18 22:48:16 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Fri Feb 18 22:48:55 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: References: <42165A55.3000609@ee.byu.edu> Message-ID: [Travis] > > > Would it be possible to change > > > > > > _PyEval_SliceIndex in ceval.c > > > > > > so that rather than throwing an error if the indexing object is not an > > > integer, the code first checks to see if the object has a > > > tp_as_number->nb_int method and calls it instead. [Guido] > > I don't think this is the right solution; since float has that method, > > it would allow floats to be used as slice indices, but that's not > > supposed to work (to protect yourself against irreproducible results > > due to rounding errors). [David] > I wonder if floats are the special case here, not "integer like objects". > > I've never been particularly happy about the confusion between the two > roles of int() and it's C equivalents, i.e. casting and conversion. You're right, that's the crux of the matter; I unfortunately copied a design mistake from C here. In Python 3000 I'd like to change this so that floats have a __trunc__() method to return an integer (invokable via trunc(x)). But in Python 2.x, we can't be sure that floats are the *only* exception -- surely people who are implementing their own "float-like" classes are copying float's example and implementing __int__ to mean the same thing. For example, the new decimal class in Python 2.4 has a converting/truncating __int__ method. (And despite being decimal, it's no less approximate than float; decimal is *not* an exact numerical type.) So I still think it's unsafe (in Python 2.x) to accept __int__ in the way Travis proposes. 
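For concreteness, a minimal sketch of the rounding trap just described (the classes are invented for illustration, not code from the thread): once the slice machinery accepts anything with nb_int/__int__, a truncating conversion silently becomes a valid index.

class Rank0IntArray(object):
    # integer-like: __int__ is an exact conversion (the case the proposal targets)
    def __init__(self, value):
        self.value = value
    def __int__(self):
        return self.value

class Approximate(object):
    # float/Decimal-like: __int__ truncates, so indexing with it hides rounding
    def __init__(self, amount):
        self.amount = amount
    def __int__(self):
        return int(self.amount)

data = list(range(10))
print(data[int(Rank0IntArray(3)):])    # [3, 4, 5, 6, 7, 8, 9] -- the intended use
print(data[int(Approximate(2.999)):])  # starts at index 2 -- silently off by one
# If _PyEval_SliceIndex called __int__ itself, data[Approximate(2.999):] would
# "work" the same way, which is exactly the objection above.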
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From bob at redivi.com Fri Feb 18 22:54:25 2005 From: bob at redivi.com (Bob Ippolito) Date: Fri Feb 18 22:54:28 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: References: <42165A55.3000609@ee.byu.edu> Message-ID: <80cd5d26efaff4232b909b0567fb5ea3@redivi.com> On Feb 18, 2005, at 4:36 PM, David Ascher wrote: > On Fri, 18 Feb 2005 13:28:34 -0800, Guido van Rossum > wrote: >>> Would it be possible to change >>> >>> _PyEval_SliceIndex in ceval.c >>> >>> so that rather than throwing an error if the indexing object is not >>> an >>> integer, the code first checks to see if the object has a >>> tp_as_number->nb_int method and calls it instead. >> >> I don't think this is the right solution; since float has that method, >> it would allow floats to be used as slice indices, but that's not >> supposed to work (to protect yourself against irreproducible results >> due to rounding errors). > > I wonder if floats are the special case here, not "integer like > objects". > > I've never been particularly happy about the confusion between the two > roles of int() and it's C equivalents, i.e. casting and conversion. All of the __special__ methods for this purpose seem to be usable only for conversion, not casting (__str__, __unicode__, etc.). The only way I've found to pass for a particular value type is to subclass one. We do this a lot in PyObjC. It ends up being a net win anyway, because you get free implementations of all the relevant methods, at the expense of having two copies of the value. The fact that these proxy objects are no longer visible-from-Python subclasses of Objective-C objects isn't really a big deal in our case, because the canonical Objective-C way to checking inheritance still work. The wrapper types use an attribute protocol for casting (__pyobjc_object__), and delegate to this object with __getattr__. >>> from Foundation import * >>> one = NSNumber.numberWithInt_(1) >>> type(one).mro() [ , , ] >>> isinstance(one, NSNumber) False >>> isinstance(one.__pyobjc_object__, NSNumber) True >>> one.isKindOfClass_(NSNumber) 1 >>> type(one) >>> type(one.__pyobjc_object__) -bob From ejones at uwaterloo.ca Fri Feb 18 22:58:36 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Fri Feb 18 22:59:25 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: <1f7befae050217193863ffc028@mail.gmail.com> References: <1f7befae050217193863ffc028@mail.gmail.com> Message-ID: On Thu, 2005-02-17 at 22:38, Tim Peters wrote: > Then you allocate a small object, marked 's': > > bbbbbbbbbbbbbbbsfffffffffffffffffffffffffffffff Isn't the whole point of obmalloc is that we don't want to allocate "s" on the heap, since it is small? I guess "s" could be an object that might potentially grow. > One thing to take from that is that LFH can't be helping list-growing > in a direct way either, if LFH (as seems likely) also needs to copy > objects that grow in order to keep its internal memory segregated by > size. The indirect benefit is still available, though: LFH may be > helping simply by keeping smaller objects out of the general heap's > hair. So then wouldn't this mean that there would have to be some sort of small object being allocated via the system malloc that is causing the poor behaviour? As you mention, I wouldn't think it would be list objects, since resizing lists using LFH should be *worse*. 
That would actually be something that is worth verifying, however. It could be that the Windows LFH is extra clever? > I'm afraid the only you can know for sure is by obtaining detailed > memory maps and analyzing them. Well, it would also be useful to find out what code is calling the system malloc. This would make it easy to examine the code and see if it should be calling obmalloc or the system malloc. Any good ideas for easily obtaining this information? I imagine that some profilers must be able to produce a complete call graph? Evan Jones From ejones at uwaterloo.ca Fri Feb 18 23:07:46 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Fri Feb 18 23:12:10 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? In-Reply-To: <4212FB5B.1030209@v.loewis.de> References: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> <1f7befae05021514524d0a35ec@mail.gmail.com> <4c0d14b0b08390d046e1220b6f360745@uwaterloo.ca> <1f7befae05021520263d77a2a3@mail.gmail.com> <4212FB5B.1030209@v.loewis.de> Message-ID: Sorry for taking so long to get back to this thread, it has been one of those weeks for me. On Feb 16, 2005, at 2:50, Martin v. L?wis wrote: > Evan then understood the feature, and made it possible. This is very true: it was a very useful exercise. > I can personally accept breaking the code that still relies on the > invalid APIs. The only problem is that it is really hard to determine > whether some code *does* violate the API usage. Great. Please ignore the patch on SourceForge for a little while. I'll produce a "revision 3" this weekend, without the compatibility hack. Evan Jones From python at rcn.com Fri Feb 18 23:09:19 2005 From: python at rcn.com (Raymond Hettinger) Date: Fri Feb 18 23:13:23 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <421656AC.6010602@egenix.com> Message-ID: <001401c51606$7ec6cda0$803cc797@oemcomputer> > Wouldn't it help a lot more if the compiler would detect that > (1,2,3) is immutable and convert it into a constant at > compile time ?! Yes. We've already gotten it to that point: Python 2.5a0 (#46, Feb 15 2005, 19:11:35) [MSC v.1200 32 bit (Intel)] on win32 >>> import dis >>> dis.dis(compile('x in ("xml", "html", "css")', '', 'eval')) 0 0 LOAD_NAME 0 (x) 3 LOAD_CONST 3 (('xml', 'html', 'css')) 6 COMPARE_OP 6 (in) 9 RETURN_VALUE The question is whether to go a step further to replace the linear search with a single hashed lookup: 0 0 LOAD_NAME 0 (x) 3 LOAD_CONST 3 (searchset(['xml', 'html', 'css'])) 6 COMPARE_OP 6 (in) 9 RETURN_VALUE This situation seems to arise often in source code. You can see the cases in the standard library with: grep 'in ("' *.py The transformation is easy to make at compile time. The part holding me back is the introduction of searchset as a frozenset subtype and teaching marshal how to put it a pyc file. FWIW, some sample timings are included below (using frozenset to approximate what searchset would do). The summary is that the tuple search takes .49usec plus .12usec for each item searched until a match is found. The frozenset lookup takes a constant .53 usec. 
Raymond ------------------------------------------------------------------------ C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='xml'" "x in s" 1000000 loops, best of 9: 0.49 usec per loop C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='css'" "x in s" 1000000 loops, best of 9: 0.621 usec per loop C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='html'" "x in s" 1000000 loops, best of 9: 0.747 usec per loop C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='pdf'" "x in s" 100000 loops, best of 9: 0.851 usec per loop C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s "x='xml'" "x in s" 1000000 loops, best of 9: 0.529 usec per loop C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s "x='css'" "x in s" 1000000 loops, best of 9: 0.522 usec per loop C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s "x='html'" "x in s" 1000000 loops, best of 9: 0.53 usec per loop C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s "x='pdf'" "x in s" 1000000 loops, best of 9: 0.523 usec per loop From oliphant at ee.byu.edu Fri Feb 18 23:40:54 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Feb 18 23:40:57 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: References: <42165A55.3000609@ee.byu.edu> Message-ID: <42166EF6.7010600@ee.byu.edu> Guido van Rossum wrote: >>Would it be possible to change >> >>_PyEval_SliceIndex in ceval.c >> >>so that rather than throwing an error if the indexing object is not an >>integer, the code first checks to see if the object has a >>tp_as_number->nb_int method and calls it instead. >> >> > >I don't think this is the right solution; since float has that method, >it would allow floats to be used as slice indices, > > O.K., then how about if arrayobjects can make it in the core, then a check for a rank-0 integer-type arrayobject is allowed before raising an exception? -Travis From tim.peters at gmail.com Fri Feb 18 23:51:37 2005 From: tim.peters at gmail.com (Tim Peters) Date: Fri Feb 18 23:51:40 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: References: <1f7befae050217193863ffc028@mail.gmail.com> Message-ID: <1f7befae050218145157bd81c9@mail.gmail.com> [Tim Peters] ... >> Then you allocate a small object, marked 's': >> >> bbbbbbbbbbbbbbbsfffffffffffffffffffffffffffffff [Evan Jones] > Isn't the whole point of obmalloc No, because it doesn't matter what follows that introduction: obmalloc has several points, including exploiting the GIL, heuristics aiming at reusing memory while it's still high in the memory heirarchy, almost never touching a piece of memory until it's actually needed, and so on. > is that we don't want to allocate "s" on the heap, since it is small? That's one of obmalloc's goals, yes. But "small" is a relative adjective, not absolute. Because we're primarily talking about LFH here, the natural meaning for "small" in _this_ thread is < 16KB, which is much larger than "small" means to obmalloc. The memory-map example applies just well to LFH as to obmalloc, by changing which meaning for "small" you have in mind. > I guess "s" could be an object that might potentially grow. For example, list guts in Python are never handled by obmalloc, although the small fixed-size list _header_ object is always handled by obmalloc. 
>> One thing to take from that is that LFH can't be helping list-growing >> in a direct way either, if LFH (as seems likely) also needs to copy >> objects that grow in order to keep its internal memory segregated by >> size. The indirect benefit is still available, though: LFH may be >> helping simply by keeping smaller objects out of the general heap's >> hair. > So then wouldn't this mean that there would have to be some sort of > small object being allocated via the system malloc that is causing the > poor behaviour? Yes. For example, a 300-character string could do it (that's not small to obmalloc, but is to LFH). Strings produced by pickling are very often that large, and especially in Zope (which uses pickles extensively under the covers -- reading and writing persistent objects in Zope all involve pickle strings). > As you mention, I wouldn't think it would be list objects, since resizing > lists using LFH should be *worse*. Until they get to LFH's boundary for "small", and we have only the vaguest idea what Martin's app does here -- we know it grows lists containing 50K elements in the end, and ... well, that's all I really know about it . A well-known trick is applicable in that case, if Martin thinks it's worth the bother: grow the list to its final size once, at the start (overestimating if you don't know for sure). Then instead of appending, keep an index to the next free slot, same as you'd do in C. Then the list guts never move, so if that doesn't yield the same kind of speedup without using LFH, list copying wasn't actually the culprit to begin with. > That would actually be something that is worth verifying, however. Not worth the time to me -- Windows is closed-source, and I'm too old to enjoy staring at binary disassemblies any more. Besides, list guts can't stay in LFH after the list exceeds 4K elements. If list-copying costs are significant here, they're far more likely to be due to copying lists over 4K elements than under -- copying a list takes O(len(list)) time. So the realloc() strategy used by LFH _probably_ isn't of _primary)_ interest here. > It could be that the Windows LFH is extra clever? Sure -- that I doubt it moves Heaven & Earth to cater to reallocs is just educated guessing. I wrote my first production heap manager at Cray Research, around 1979 . > ... > Well, it would also be useful to find out what code is calling the > system malloc. This would make it easy to examine the code and see if > it should be calling obmalloc or the system malloc. Any good ideas for > easily obtaining this information? I imagine that some profilers must > be able to produce a complete call graph? Windows supports extensive facilities for analyzing heap usage, even from an external process that attaches to the process you want to analyze. Ditto for profiling. But it's not easy, and I don't know of any free tools that are of real help. If someone were motivated enough, it would probably be easiest to run Martin's app on a Linux box, and use the free Linux tools to analyze it. 
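For reference, a minimal sketch of the grow-once-then-fill-by-index trick described above (the function names, the module name in the timing lines, and the 50000 size are made up for illustration; no timing numbers are assumed):

def build_by_append(n):
    # normal growth: the list guts may be reallocated and copied several times
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def build_preallocated(n):
    # the trick: overallocate once at the final size, then fill slots by index,
    # so the list guts never move while the data is produced
    result = [None] * n
    next_free = 0
    for i in range(n):
        result[next_free] = i * i
        next_free += 1
    return result

# To see whether it helps, time both, e.g.:
#   python -m timeit -s "from listfill import build_by_append" "build_by_append(50000)"
#   python -m timeit -s "from listfill import build_preallocated" "build_preallocated(50000)"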
From david.ascher at gmail.com Sat Feb 19 00:08:24 2005 From: david.ascher at gmail.com (David Ascher) Date: Sat Feb 19 00:08:34 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: <42166EF6.7010600@ee.byu.edu> References: <42165A55.3000609@ee.byu.edu> <42166EF6.7010600@ee.byu.edu> Message-ID: On Fri, 18 Feb 2005 15:40:54 -0700, Travis Oliphant wrote: > Guido van Rossum wrote: > > >>Would it be possible to change > >> > >>_PyEval_SliceIndex in ceval.c > >> > >>so that rather than throwing an error if the indexing object is not an > >>integer, the code first checks to see if the object has a > >>tp_as_number->nb_int method and calls it instead. > >> > >> > > > >I don't think this is the right solution; since float has that method, > >it would allow floats to be used as slice indices, > > > > > O.K., > > then how about if arrayobjects can make it in the core, then a check for > a rank-0 integer-type > arrayobject is allowed before raising an exception? Following up on Bob's point, maybe making rank-0 integer type arrayobjects inherit from int has some mileage? Somewhat weird, but... From mal at egenix.com Sat Feb 19 00:42:35 2005 From: mal at egenix.com (M.-A. Lemburg) Date: Sat Feb 19 00:42:42 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <001401c51606$7ec6cda0$803cc797@oemcomputer> References: <001401c51606$7ec6cda0$803cc797@oemcomputer> Message-ID: <42167D6B.9020606@egenix.com> Raymond Hettinger wrote: >>Wouldn't it help a lot more if the compiler would detect that >>(1,2,3) is immutable and convert it into a constant at >>compile time ?! > > > Yes. We've already gotten it to that point: > > Python 2.5a0 (#46, Feb 15 2005, 19:11:35) [MSC v.1200 32 bit (Intel)] on > win32 > >>>>import dis >>>>dis.dis(compile('x in ("xml", "html", "css")', '', 'eval')) > > 0 0 LOAD_NAME 0 (x) > 3 LOAD_CONST 3 (('xml', 'html', 'css')) > 6 COMPARE_OP 6 (in) > 9 RETURN_VALUE Cool. Does that work for all tuples in the program ? > The question is whether to go a step further to replace the linear > search with a single hashed lookup: > > 0 0 LOAD_NAME 0 (x) > 3 LOAD_CONST 3 (searchset(['xml', 'html', > 'css'])) > 6 COMPARE_OP 6 (in) > 9 RETURN_VALUE > > This situation seems to arise often in source code. You can see the > cases in the standard library with: grep 'in ("' *.py I did a search on our code and Python's std lib. It turns out that by far most such usages use either 2 or 3 values in the tuple. If you look at the types of the values, the most common usages are strings and integers. I'd assume that you'll get somewhat different results from your benchmark if you had integers in the tuple. > The transformation is easy to make at compile time. The part holding me > back is the introduction of searchset as a frozenset subtype and > teaching marshal how to put it a pyc file. Hmm, what if you'd teach tuples to do faster contains lookups for string or integer only content, e.g. by introducing sub-types for string-only and integer-only tuples ?! > FWIW, some sample timings are included below (using frozenset to > approximate what searchset would do). The summary is that the tuple > search takes .49usec plus .12usec for each item searched until a match > is found. The frozenset lookup takes a constant .53 usec. 
> > > > Raymond > > > > ------------------------------------------------------------------------ > > C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='xml'" > "x in s" > 1000000 loops, best of 9: 0.49 usec per loop > > C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='css'" > "x in s" > 1000000 loops, best of 9: 0.621 usec per loop > > C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='html'" > "x in s" > 1000000 loops, best of 9: 0.747 usec per loop > > C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='pdf'" > "x in s" > 100000 loops, best of 9: 0.851 usec per loop > > C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s > "x='xml'" "x in s" > 1000000 loops, best of 9: 0.529 usec per loop > > C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s > "x='css'" "x in s" > 1000000 loops, best of 9: 0.522 usec per loop > > C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s > "x='html'" "x in s" > 1000000 loops, best of 9: 0.53 usec per loop > > C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s > "x='pdf'" "x in s" > 1000000 loops, best of 9: 0.523 usec per loop -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 19 2005) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From gvanrossum at gmail.com Sat Feb 19 00:49:44 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sat Feb 19 00:49:47 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: References: <42165A55.3000609@ee.byu.edu> <42166EF6.7010600@ee.byu.edu> Message-ID: [Travis] > > then how about if arrayobjects can make it in the core, then a check for > > a rank-0 integer-type > > arrayobject is allowed before raising an exception? Sure, *if* you can get the premise accepted. [David] > Following up on Bob's point, maybe making rank-0 integer type > arrayobjects inherit from int has some mileage? Somewhat weird, > but... Hm, currently inheriting from int would imply that the C-level memory lay-out of the object is an extension of the built-in int type. That's probably too much of a constraint. But perhaps somehow rank-0-integer-array and int could be the same type? I don't think it would hurt too badly if an int had a method to find out its rank as an array. And I assume you can't iterate over a rank-0 array, right? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ejones at uwaterloo.ca Sat Feb 19 01:10:55 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Sat Feb 19 01:10:51 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: <1f7befae050218145157bd81c9@mail.gmail.com> References: <1f7befae050217193863ffc028@mail.gmail.com> <1f7befae050218145157bd81c9@mail.gmail.com> Message-ID: On Feb 18, 2005, at 17:51, Tim Peters wrote: > grow the list to its final size once, at the start (overestimating if > you don't know for sure). Then instead of appending, keep an index to > the next free slot, same as you'd do in C. 
Then the list guts never > move, so if that doesn't yield the same kind of speedup without using > LFH, list copying wasn't actually the culprit to begin with. If this *does* improve the performance of his application by 15%, that would strongly argue for an addition to the list API similar to Java's ArrayList.ensureCapacity or the STL's vector ::reserve. Since the list implementation already maintains separate ints for the list array size and the list occupied size, this would really just expose this implementation detail to Python. I don't like revealing the implementation in this fashion, but if it does make a significant performance difference, it could be worth it. http://java.sun.com/j2se/1.5.0/docs/api/java/util/ ArrayList.html#ensureCapacity(int) http://www.sgi.com/tech/stl/Vector.html#4 Evan Jones From tim.peters at gmail.com Sat Feb 19 02:43:06 2005 From: tim.peters at gmail.com (Tim Peters) Date: Sat Feb 19 02:43:10 2005 Subject: [Python-Dev] Re: Re: Re: Prospective Peephole Transformation In-Reply-To: References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> <5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> Message-ID: <1f7befae050218174345e029e8@mail.gmail.com> [Phillip J. Eby] >> Still, it's rather interesting that tuple.__contains__ appears slower than a >> series of LOAD_CONST and "==" operations, considering that the tuple >> should be doing basically the same thing, only> without bytecode fetch-and- >> decode overhead. Maybe it's tuple.__contains__ that needs optimizing >> here? [Fredrik Lundh] > wouldn't be the first time... How soon we forget . Fredrik introduced a pile of optimizations special-casing the snot out of small integers into ceval.c a long time ago, like this in COMPARE_OP: case COMPARE_OP: w = POP(); v = TOP(); if (PyInt_CheckExact(w) && PyInt_CheckExact(v)) { /* INLINE: cmp(int, int) */ register long a, b; register int res; a = PyInt_AS_LONG(v); b = PyInt_AS_LONG(w); switch (oparg) { case PyCmp_LT: res = a < b; break; case PyCmp_LE: res = a <= b; break; case PyCmp_EQ: res = a == b; break; case PyCmp_NE: res = a != b; break; case PyCmp_GT: res = a > b; break; case PyCmp_GE: res = a >= b; break; case PyCmp_IS: res = v == w; break; case PyCmp_IS_NOT: res = v != w; break; default: goto slow_compare; } x = res ? Py_True : Py_False; Py_INCREF(x); } else { slow_compare: x = cmp_outcome(oparg, v, w); } That's a hell of a lot faster than tuple comparison's deferral to PyObject_RichCompareBool can be, even if we inlined the same blob inside the latter (then we'd still have the additional overhead of calling PyObject_RichCompareBool). As-is, PyObject_RichCompareBool() has to do (relatively) significant work just to out find which concrete comparision implementation to call. As a result, "i == j" in Python source code, when i and j are little ints, is much faster than comparing i and j via any other route in Python. That's mostly really good, IMO -- /F's int optimizations are of major value in real life. Context-dependent optimizations make code performance less predictable too -- that's life. 
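To make the semantic objection elsewhere in this thread concrete, here is a small sketch: Searchset is copied from the description quoted earlier, and LooksLikeTwo is an invented unhashable class that nonetheless compares equal to 2.

class Searchset(frozenset):
    def __contains__(self, element):
        try:
            return frozenset.__contains__(self, element)
        except TypeError:
            return False

class LooksLikeTwo(object):
    def __eq__(self, other):
        return other == 2          # claims equality with 2
    def __hash__(self):
        raise TypeError("unhashable on purpose")

x = LooksLikeTwo()
print(x in (1, 2, 3))              # True:  same as "x == 1 or x == 2 or x == 3"
print(x in Searchset([1, 2, 3]))   # False: the TypeError was silently swallowed

With a plain frozenset the same lookup raises TypeError, which is the failure the email test suite ran into; with Searchset it returns False, which is still not what the original expression means.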
From python at rcn.com Sat Feb 19 02:41:24 2005 From: python at rcn.com (Raymond Hettinger) Date: Sat Feb 19 02:45:29 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <42167D6B.9020606@egenix.com> Message-ID: <002401c51624$1f0ff3a0$803cc797@oemcomputer> > >>Wouldn't it help a lot more if the compiler would detect that > >>(1,2,3) is immutable and convert it into a constant at > >>compile time ?! > > > > > > Yes. We've already gotten it to that point: . . . > > Cool. Does that work for all tuples in the program ? It is limited to just tuples of constants (strings, ints, floats, complex, None, and other tuples). Also, it is limited in its ability to detect a nesting like: a=((1,2),(3,4)). One other limitation is that floats like -0.23 are not recognized as constants because the initial compilation still produces a UNARY_NEGATIVE operation: >>> dis.dis(compile('-0.23', '', 'eval')) 0 0 LOAD_CONST 0 (0.23000000000000001) 3 UNARY_NEGATIVE 4 RETURN_VALUE > I did a search on our code and Python's std lib. It turns > out that by far most such usages use either 2 or 3 > values in the tuple. If you look at the types of the > values, the most common usages are strings and integers. Right, those are the most common cases. The linear searches are ubiquitous. Here's a small selection: if comptype not in ('NONE', 'ULAW', 'ALAW', 'G722') return tail.lower() in (".py", ".pyw") assert n in (2, 3, 4, 5) if value[2] in ('F','n','N') if sectName in ("temp", "cdata", "ignore", "include", "rcdata") if not decode or encoding in ('', '7bit', '8bit', 'binary'): if (code in (301, 302, 303, 307) and m in ("GET", "HEAD") Unfortunately, there are several common patterns that are skipped because rarely changed globals/builtins cannot be treated as constants: if isinstance(x, (int, float, complex)): # types are not constants if op in (ROT_TWO, POP_TOP, LOAD_FAST): # global consts from opcode.py except (TypeError, KeyError, IndexError): # builtins are not constant > I'd assume that you'll get somewhat different results > from your benchmark if you had integers in the tuple. Nope, the results are substantially the same give or take 2usec. > Hmm, what if you'd teach tuples to do faster contains lookups for > string or integer only content, e.g. by introducing sub-types for > string-only and integer-only tuples ?! For a linear search, tuples are already pretty darned good and leave room for only microscopic O(n) improvements. The bigger win comes from using a better algorithm and data structure -- hashing beats linear search hands-down. The constant search time is faster for all n>1, resulting in much improved scalability. No tweaking of tuple.__contains__() can match it. Sets are the right data structure for fast membership testing. I would love for sets to be used internally while letting users continue to write the clean looking code shown above. Raymond From tim.peters at gmail.com Sat Feb 19 03:06:45 2005 From: tim.peters at gmail.com (Tim Peters) Date: Sat Feb 19 03:06:48 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <000c01c51586$92c7dd60$3a01a044@oemcomputer> References: <000c01c51586$92c7dd60$3a01a044@oemcomputer> Message-ID: <1f7befae050218180668dad506@mail.gmail.com> [Raymond Hettinger] > ... > The problem with the transformation was that it didn't handle the case > where x was non-hashable and it would raise a TypeError instead of > returning False as it should. I'm very glad you introduced the optimization of building small constant tuples at compile-time. 
IMO, that was a pure win. I don't like this one, though. The meaning of "x in (c1, c2, ..., c_n)" is "x == c1 or x == c2 or ... or x == c_n", and a transformation that doesn't behave exactly like the latter in all cases is simply wrong. Even if x isn't hashable, it could still be of a type that implements __eq__, and where x.__eq__(c_i) returned True for some i, and then False is plainly the wrong result. It could also be that x is of a type that is hashable, but where x.__hash__() raises TypeError at this point in the code. That could be for good or bad (bug) reasons, but suppressing the TypeError and converting into False would be a bad thing regardless. > That situation arose once in the email module's test suite. I don't even care if no code in the standard library triggered a problem here: the transformation isn't semantically correct on the face of it. If we knew the type of x at compile-time, then sure, in most (almost all) cases we could know it was a safe transformation (and even without the hack to turn TypeError into False). But we don't know now, so the worst case has to be assumed: can't do this one now. Maybe someday, though. From tim.peters at gmail.com Sat Feb 19 03:24:55 2005 From: tim.peters at gmail.com (Tim Peters) Date: Sat Feb 19 03:24:59 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: References: <1f7befae050217193863ffc028@mail.gmail.com> <1f7befae050218145157bd81c9@mail.gmail.com> Message-ID: <1f7befae050218182444fb7413@mail.gmail.com> [Tim Peters] >> grow the list to its final size once, at the start (overestimating if >> you don't know for sure). Then instead of appending, keep an index to >> the next free slot, same as you'd do in C. Then the list guts never >> move, so if that doesn't yield the same kind of speedup without using >> LFH, list copying wasn't actually the culprit to begin with. [Evan Jones] > If this *does* improve the performance of his application by 15%, that > would strongly argue for an addition to the list API similar to Java's > ArrayList.ensureCapacity or the STL's vector ::reserve. Since the > list implementation already maintains separate ints for the list array > size and the list occupied size, this would really just expose this > implementation detail to Python. I don't like revealing the > implementation in this fashion, but if it does make a significant > performance difference, it could be worth it. That's a happy thought! It was first suggested for Python in 1991 , but before Python 2.4 the list implementation didn't have separate members for current size and capacity, so "can't get there from here" was the only response. It still wouldn't be trivial, because nothing in listobject.c now believes the allocated size ever needs to be preserved, and all len()-changing list operations ensure that "not too much" overallocation remains (see list_resize() in listobject.c for details). But let's see whether it would help first. 
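A rough pure-Python mock-up of the proposed reserve()/ensureCapacity(), purely to make the idea concrete (the class and its method names are hypothetical; nothing like this exists on the built-in list, and the actual proposal is a C-level change to listobject.c):

class ReservingList(object):
    def __init__(self):
        self._items = []           # the "guts"
        self._size = 0             # occupied size, tracked separately from capacity
    def reserve(self, capacity):
        # overallocate once so later appends never trigger a resize
        if capacity > len(self._items):
            self._items.extend([None] * (capacity - len(self._items)))
    def append(self, item):
        if self._size < len(self._items):
            self._items[self._size] = item
        else:
            self._items.append(item)   # fall back to normal growth
        self._size += 1
    def to_list(self):
        return self._items[:self._size]

# intended usage pattern, mirroring Java's ensureCapacity and C++ vector::reserve:
values = ReservingList()
values.reserve(50000)
for i in range(50000):
    values.append(i)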
From ncoghlan at iinet.net.au Sat Feb 19 05:46:32 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Sat Feb 19 05:46:38 2005 Subject: [Python-Dev] Proposal for a module to deal with hashing In-Reply-To: <20050217065330.GP25441@zot.electricrain.com> References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> Message-ID: <4216C4A8.9060408@iinet.net.au> Gregory P. Smith wrote: > fyi - i've updated the python sha1/md5 openssl patch. it now replaces > the entire sha and md5 modules with a generic hashes module that gives > access to all of the hash algorithms supported by OpenSSL (including > appropriate legacy interface wrappers and falling back to the old code > when compiled without openssl). > > https://sourceforge.net/tracker/index.php?func=detail&aid=1121611&group_id=5470&atid=305470 > > I don't quite like the module name 'hashes' that i chose for the > generic interface (too close to the builtin hash() function). Other > suggestions on a module name? 'digest' comes to mind. 'hashtools' and 'hashlib' would both have precedents in the standard library (itertools and urllib, for example). It occurs to me that such a module would provide a way to fix the bug with incorrectly hashable instances of new-style classes: Py> class C: ... def __eq__(self, other): return True ... Py> hash(C()) Traceback (most recent call last): File " ", line 1, in ? TypeError: unhashable instance Py> class C(object): ... def __eq__(self, other): return True ... Py> hash(C()) 10357232 Guido wanted to fix this by eliminating object.__hash__, but that caused problems for Jython. If I remember that discussion correctly, the problem was that, in Jython, the default hash is _not_ simply hash(id(obj)) the way it is in CPython, so Python code needs a way to get access to the default implementation. A hashtools.default_hash that worked like the current object.__hash__ would seem to provide such a spelling, and allow object.__hash__ to be removed (fixing the above bug). Cheers, Nick. -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From python at rcn.com Sat Feb 19 05:47:01 2005 From: python at rcn.com (Raymond Hettinger) Date: Sat Feb 19 05:54:07 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <1f7befae050218180668dad506@mail.gmail.com> Message-ID: <002d01c5163e$3184d720$803cc797@oemcomputer> > I'm very glad you introduced the optimization of building small > constant tuples at compile-time. IMO, that was a pure win. It's been out in the wild for a while now with no issues. I'm somewhat happy with it. > the transformation isn't semantically correct on the > face of it. Well that's the end of that. What we really need is a clean syntax for specifying a constant frozenset without compiler transformations of tuples. That would have the further advantage of letting builtins and globals be used as element values. 
if isinstance(x, {int, float, complex}): if opcode in {REPEAT, MIN_REPEAT, MAX_REPEAT}: if (code in {301, 302, 303, 307} and m in {"GET", "HEAD"}: if op in (ROT_TWO, POP_TOP, LOAD_FAST) Perhaps something other notation would be better but the idea is basically the same. Raymond From ncoghlan at iinet.net.au Sat Feb 19 06:03:27 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Sat Feb 19 06:03:54 2005 Subject: [Python-Dev] Requesting that a class be a new-style class Message-ID: <4216C89F.3040400@iinet.net.au> This is something I've typed way too many times: Py> class C(): File " ", line 1 class C(): ^ SyntaxError: invalid syntax It's the asymmetry with functions that gets to me - defining a function with no arguments still requires parentheses in the definition statement, but defining a class with no bases requires the parentheses to be omitted. Which leads in to the real question: Does this *really* need to be a syntax error? Or could it be used as an easier way to spell "class C(object):"? Then, in Python 3K, simply drop support for omitting the parentheses from class definitions - require inheriting from ClassicClass instead. This would also have the benefit that the elimination of defaulting to classic classes would cause a syntax error rather than subtle changes in behaviour. Cheers, Nick. -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From abo at minkirri.apana.org.au Sat Feb 19 06:18:00 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Sat Feb 19 06:18:10 2005 Subject: [Python-Dev] builtin_id() returns negative numbers References: <4210AFAA.9060108@thule.no><1f7befae050214074122b715a@mail.gmail.com><20050217181119.GA3055@vicky.ecs.soton.ac.uk><1f7befae050217104431312214@mail.gmail.com> <20050218113608.GB25496@vicky.ecs.soton.ac.uk> Message-ID: <024f01c51642$612a6c70$24ed0ccb@apana.org.au> From: "Armin Rigo" > Hi Tim, > > > On Thu, Feb 17, 2005 at 01:44:11PM -0500, Tim Peters wrote: > > > 256 ** struct.calcsize('P') > > > > Now if you'll just sign and fax a Zope contributor agreement, I'll > > upgrade ZODB to use this slick trick . > > I hereby donate this line of code to the public domain :-) Damn... we can't use it then! Seriously, on the Python lists there has been a discussion rejecting an md5sum implementation because the author "donated it to the public domain". Apparently lawyers have decided that you can't give code away. 
Intellectual charity is illegal :-) ---------------------------------------------------------------- Donovan Baarda http://minkirri.apana.org.au/~abo/ ---------------------------------------------------------------- From abo at minkirri.apana.org.au Sat Feb 19 06:38:36 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Sat Feb 19 06:38:48 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> <1108699791.3758.98.camel@schizo> <4215B010.2090600@v.loewis.de> Message-ID: <027b01c51645$42262dc0$24ed0ccb@apana.org.au> From: "Martin v. L?wis" > Donovan Baarda wrote: > > This patch keeps the current md5c.c, md5module.c files and adds the > > following; _hashopenssl.c, hashes.py, md5.py, sha.py. > [...] > > If all we wanted to do was fix the md5 module > > If we want to fix the licensing issues with the md5 module, this patch > does not help at all, as it keeps the current md5 module (along with > its licensing issues). So any patch to solve the problem will need > to delete the code with the questionable license. It maybe half fixes it in that if Python is happy with the RSA one, they can continue to include it, and if Debian is unhappy with it, they can remove it and build against openssl. It doesn't fully fix the license problem. It is still worth considering because it doesn't make it worse, and it does allow Python to use much faster implementations and support other digest algorithms when openssl is available. > Then, the approach in the patch breaks the promise that the md5 module > is always there. It would require that OpenSSL is always there - a > promise that we cannot make (IMO). It would be better if found an alternative md5c.c. I found one that was the libmd implementation that someone mildly tweaked and then slapped an LGPL on. I have a feeling that would make the lawyers tremble more than the "public domain" libmd one, unless they are happy that someone else is prepared to wear the grief for slapping a LGPL onto something public domain. Probably the best at the moment is the sourceforge one, which is listed as having a "zlib/libpng licence". ---------------------------------------------------------------- Donovan Baarda http://minkirri.apana.org.au/~abo/ ---------------------------------------------------------------- From greg at electricrain.com Sat Feb 19 07:46:32 2005 From: greg at electricrain.com (Gregory P. 
Smith) Date: Sat Feb 19 07:46:35 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <4215B010.2090600@v.loewis.de> References: <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> <1108699791.3758.98.camel@schizo> <4215B010.2090600@v.loewis.de> Message-ID: <20050219064632.GF14279@zot.electricrain.com> On Fri, Feb 18, 2005 at 10:06:24AM +0100, "Martin v. L?wis" wrote: > Donovan Baarda wrote: > >This patch keeps the current md5c.c, md5module.c files and adds the > >following; _hashopenssl.c, hashes.py, md5.py, sha.py. > [...] > >If all we wanted to do was fix the md5 module > > If we want to fix the licensing issues with the md5 module, this patch > does not help at all, as it keeps the current md5 module (along with > its licensing issues). So any patch to solve the problem will need > to delete the code with the questionable license. > > Then, the approach in the patch breaks the promise that the md5 module > is always there. It would require that OpenSSL is always there - a > promise that we cannot make (IMO). I'm aware of that. My goals are primarily to get a good openssl based hashes/digest module going to be used instead of the built in implementations when openssl available because openssl is -so- much faster. Fixing the debian instigated md5 licensing issue is secondary and is something I'll get to later on after i work on the fun stuff. And as Donovan has said, the patch already does present debian with the option of dropping that md5 module and using the openssl derived one instead if they're desperate. based on laziness winning and the issue being so minor i hope they just wait for a patch from me that replaces the md5c.c with one of the acceptably licensed ones for their 2.3/2.4 packages. -g From aleax at aleax.it Sat Feb 19 08:55:44 2005 From: aleax at aleax.it (Alex Martelli) Date: Sat Feb 19 08:55:48 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <4216C89F.3040400@iinet.net.au> References: <4216C89F.3040400@iinet.net.au> Message-ID: <03a3f1153caf34d2d087fcc240486a24@aleax.it> On 2005 Feb 19, at 06:03, Nick Coghlan wrote: > This is something I've typed way too many times: > > Py> class C(): > File " ", line 1 > class C(): > ^ > SyntaxError: invalid syntax > > It's the asymmetry with functions that gets to me - defining a > function with no arguments still requires parentheses in the > definition statement, but defining a class with no bases requires the > parentheses to be omitted. Seconded. It's always irked me enough that it's the only ``apology'' for Python syntax you'll see in the Nutshell -- top of p. 71, "The syntax of the class statement has a small, tricky difference from that of the def statement" etc. > Which leads in to the real question: Does this *really* need to be a > syntax error? Or could it be used as an easier way to spell "class > C(object):"? -0 ... 
instinctively, I dread the task of explaining / teaching about the rationale for this somewhat kludgy transitional solution [[empty parentheses may be written OR omitted, with large difference in meaning, not very related to other cases of such parentheses]], even though I think you're right that it would make the future transition to 3.0 somewhat safer. Alex From python at rcn.com Sat Feb 19 09:01:14 2005 From: python at rcn.com (Raymond Hettinger) Date: Sat Feb 19 09:08:54 2005 Subject: [Python-Dev] Requesting that a class be a new-style class References: <4216C89F.3040400@iinet.net.au> <03a3f1153caf34d2d087fcc240486a24@aleax.it> Message-ID: <000101c51659$b2f79e80$afbb9d8d@oemcomputer> > > This is something I've typed way too many times: > > > > Py> class C(): > > File " ", line 1 > > class C(): > > ^ > > SyntaxError: invalid syntax > > > > It's the asymmetry with functions that gets to me - defining a > > function with no arguments still requires parentheses in the > > definition statement, but defining a class with no bases requires the > > parentheses to be omitted. > > Seconded. It's always irked me enough that it's the only ``apology'' > for Python syntax you'll see in the Nutshell -- top of p. 71, "The > syntax of the class statement has a small, tricky difference from that > of the def statement" etc. +1 For me, this would come-up when experimenting with mixins. Adding and removing a mixin usually entailed a corresponding change to the parentheses. Raymond From michael.walter at gmail.com Sat Feb 19 09:12:50 2005 From: michael.walter at gmail.com (Michael Walter) Date: Sat Feb 19 09:12:54 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <000101c51659$b2f79e80$afbb9d8d@oemcomputer> References: <4216C89F.3040400@iinet.net.au> <03a3f1153caf34d2d087fcc240486a24@aleax.it> <000101c51659$b2f79e80$afbb9d8d@oemcomputer> Message-ID: <877e9a1705021900123c6f0ce2@mail.gmail.com> But... only as an additional option, not as a replacement, right? Michael On Sat, 19 Feb 2005 03:01:14 -0500, Raymond Hettinger wrote: > > > This is something I've typed way too many times: > > > > > > Py> class C(): > > > File " ", line 1 > > > class C(): > > > ^ > > > SyntaxError: invalid syntax > > > > > > It's the asymmetry with functions that gets to me - defining a > > > function with no arguments still requires parentheses in the > > > definition statement, but defining a class with no bases requires the > > > parentheses to be omitted. > > > > Seconded. It's always irked me enough that it's the only ``apology'' > > for Python syntax you'll see in the Nutshell -- top of p. 71, "The > > syntax of the class statement has a small, tricky difference from that > > of the def statement" etc. > > +1 For me, this would come-up when experimenting with mixins. Adding and removing a mixin usually entailed a corresponding > change to the parentheses. 
> > > Raymond > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/michael.walter%40gmail.com > From fredrik at pythonware.com Sat Feb 19 10:33:59 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Feb 19 10:33:57 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation References: <4215FD5F.4040605@xs4all.nl><000101c515cc$9f96d0a0$803cc797@oemcomputer><5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> <5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> <1f7befae050218174345e029e8@mail.gmail.com> Message-ID: Tim Peters wrote: > [Fredrik Lundh] >> wouldn't be the first time... > > How soon we forget . oh, that was in the dark ages of Python 1.4. I've rebooted myself many times since then... > Fredrik introduced a pile of optimizations special-casing the snot out > of small integers into ceval.c a long time ago iirc, you claimed that after a couple of major optimizations had been added, "there's no single optimization left that can speed up pystone by more than X%", so I came up with an "(X+2)%" optimization. you should do that more often ;-) > As a result, "i == j" in Python source code, when i and j are little > ints, is much faster than comparing i and j via any other route in > Python. which explains why my "in" vs. "or" tests showed good results for integers, but not for strings... I'd say that this explains why it would still make sense to let the code generator change "x in (a, b, c)" to "x == a or x == b or x == c", as long as a, b, and c are all integers. (see my earlier timeit results) From fredrik at pythonware.com Sat Feb 19 10:40:16 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Feb 19 10:40:11 2005 Subject: [Python-Dev] Re: builtin_id() returns negative numbers References: <4210AFAA.9060108@thule.no><1f7befae050214074122b715a@mail.gmail.com><20050217181119.GA3055@vicky.ecs.soton.ac.uk><1f7befae050217104431312214@mail.gmail.com><20050218113608.GB25496@vicky.ecs.soton.ac.uk> <024f01c51642$612a6c70$24ed0ccb@apana.org.au> Message-ID: Donovan Baarda wrote: > Apparently lawyers have decided that you can't give code away. Intellectual > charity is illegal :-) what else would a lawyer say? do you really expect lawyers to admit that there are ways to do things that don't involve lawyers? From martin at v.loewis.de Sat Feb 19 11:47:13 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Feb 19 11:47:15 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: References: <4215FD5F.4040605@xs4all.nl><000101c515cc$9f96d0a0$803cc797@oemcomputer><5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> <5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> <1f7befae050218174345e029e8@mail.gmail.com> Message-ID: <42171931.4020600@v.loewis.de> Fredrik Lundh wrote: > I'd say that this explains why it would still make sense to let the code generator change > "x in (a, b, c)" to "x == a or x == b or x == c", as long as a, b, and c are all integers. How often does that happen in real code? 
Regards, Martin From martin at v.loewis.de Sat Feb 19 11:54:06 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Feb 19 11:54:09 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <024f01c51642$612a6c70$24ed0ccb@apana.org.au> References: <4210AFAA.9060108@thule.no><1f7befae050214074122b715a@mail.gmail.com><20050217181119.GA3055@vicky.ecs.soton.ac.uk><1f7befae050217104431312214@mail.gmail.com> <20050218113608.GB25496@vicky.ecs.soton.ac.uk> <024f01c51642$612a6c70$24ed0ccb@apana.org.au> Message-ID: <42171ACE.9020502@v.loewis.de> Donovan Baarda wrote: > Seriously, on the Python lists there has been a discussion rejecting an > md5sum implementation because the author "donated it to the public domain". > Apparently lawyers have decided that you can't give code away. Intellectual > charity is illegal :-) Despite the smiley: It is not illegal - it just does not have any legal effect. Just by saying "I am the chancellor of Germany", it does not make you the chancellor of Germany; instead, you need to go through the election processes. Likewise, saying "the public can have my code" does not make it so. Instead, you have to formulate a license that permits the public to do with the code what you think it should be allowed to do. Most people who've used the term "public domain" in the past didn't really care whether they still have the copyright - what they wanted to say is that anybody can use their work for any purpose. Regards, Martin From mal at egenix.com Sat Feb 19 13:06:37 2005 From: mal at egenix.com (M.-A. Lemburg) Date: Sat Feb 19 13:06:40 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <002401c51624$1f0ff3a0$803cc797@oemcomputer> References: <002401c51624$1f0ff3a0$803cc797@oemcomputer> Message-ID: <42172BCD.2010807@egenix.com> Raymond Hettinger wrote: >>Hmm, what if you'd teach tuples to do faster contains lookups for >>string or integer only content, e.g. by introducing sub-types for >>string-only and integer-only tuples ?! > > > For a linear search, tuples are already pretty darned good and leave > room for only microscopic O(n) improvements. The bigger win comes from > using a better algorithm and data structure -- hashing beats linear > search hands-down. The constant search time is faster for all n>1, > resulting in much improved scalability. No tweaking of > tuple.__contains__() can match it. > > Sets are the right data structure for fast membership testing. I would > love for sets to be used internally while letting users continue to > write the clean looking code shown above. That's what I was thinking off: if the compiler can detect the constant nature and the use of a common type, it could set a flag in the tuple type telling it about this feature. The tuple could then convert the tuple contents to a set internally and when the __contains__ hook is first called and use the set for the lookup. Alternatively, you could use a sub-type for a few common cases. In either case you would have to teach marshal how to treat the extra bit of information. The user won't notice all this in the Python program and can continue to write clean code (in some cases, even cleaner code than before - I usually use the keyword hack to force certain things into the locals at module load time, but would love to get rid off this). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 19 2005) >>> Python/Zope Consulting and Support ... 
http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From aahz at pythoncraft.com Sat Feb 19 16:11:46 2005 From: aahz at pythoncraft.com (Aahz) Date: Sat Feb 19 16:11:48 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <42171931.4020600@v.loewis.de> References: <1f7befae050218174345e029e8@mail.gmail.com> <42171931.4020600@v.loewis.de> Message-ID: <20050219151146.GA4837@panix.com> On Sat, Feb 19, 2005, "Martin v. L?wis" wrote: > Fredrik Lundh wrote: >> >>I'd say that this explains why it would still make sense to let the code >>generator change >>"x in (a, b, c)" to "x == a or x == b or x == c", as long as a, b, and c >>are all integers. > > How often does that happen in real code? Dunno how often, but I was working on some code at my company yesterday that did that -- we use a lot of ints to indicate options. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code -- not in reams of trivial code that bores the reader to death." --GvR From mwh at python.net Sat Feb 19 21:27:13 2005 From: mwh at python.net (Michael Hudson) Date: Sat Feb 19 21:27:16 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <4216C89F.3040400@iinet.net.au> (Nick Coghlan's message of "Sat, 19 Feb 2005 15:03:27 +1000") References: <4216C89F.3040400@iinet.net.au> Message-ID: <2mpsywxplq.fsf@starship.python.net> Nick Coghlan writes: > This is something I've typed way too many times: > > Py> class C(): > File " ", line 1 > class C(): > ^ > SyntaxError: invalid syntax > > It's the asymmetry with functions that gets to me - defining a > function with no arguments still requires parentheses in the > definition statement, but defining a class with no bases requires the > parentheses to be omitted. Yeah, this has annoyed me for ages too. However! You obviously haven't read Misc/HISTORY recently enough :) The surprising thing is that "class C():" used to work (in fact before 0.9.4 the parens mandatory). It became a syntax error in 0.9.9, seemingly because Guido was peeved that people hadn't updated all their old code to the new syntax. I wonder if he'd like to try that trick again today :) I'd still vote for it to be changed. > Which leads in to the real question: Does this *really* need to be a > syntax error? Or could it be used as an easier way to spell "class > C(object):"? -1. Too magical, too opaque. > Then, in Python 3K, simply drop support for omitting the parentheses > from class definitions - require inheriting from ClassicClass > instead. HISTORY repeats itself... Cheers, mwh -- [Perl] combines all the worst aspects of C and Lisp: a billion different sublanguages in one monolithic executable. It combines the power of C with the readability of PostScript. -- Jamie Zawinski From reinhold-birkenfeld-nospam at wolke7.net Sun Feb 20 00:26:36 2005 From: reinhold-birkenfeld-nospam at wolke7.net (Reinhold Birkenfeld) Date: Sun Feb 20 00:26:09 2005 Subject: [Python-Dev] Some old patches Message-ID: Hello, this time working up some of the patches with beards: - #751943 Adds the display of the line number to cgitb stack traces even when the source code is not available to cgitb. 
This makes sense in the case that the source is lying around somewhere else. However, the original patch generates a link to "file://?" on the occasion that the source file name is not known. I have created a new patch (#1144549) that fixes this, and also renames all local variables "file" in cgitb to avoid builtin shadowing. - #749830 Allows the mmap call on UNIX to be supplied a length argument of 0 to mmap the whole file (which is already implemented on Windows). However, the patch doesn't apply on current CVS, so I made a new patch (#1144555) that does. Recommend apply, unless this may cause problems on some Unices which I don't know about. - #547176 Allows the rlcompleter to complete on [] item access (constructs like sim[0]. could then be completed). As comments in the patch point out, this easily leads to execution of arbitrary code via __getitem__, which is IMHO a too big side effect of completing (though IPython does this). Recommend reject. - #645894 Allows the use of resource.getrusage time values for profile.py, which results in better timing resolution on FreeBSD. However, this may lead to worse timing resolution on other OS, so perhaps the patch should be changed to be restricted to this particular platform. - #697613 -- bug #670311 This handles the problem that python -i exits on SystemExit exceptions by introducting two new API functions. While it works for me, I am not sure whether this is too much overhead for fixing a glitch no one else complained about. - #802188 This adds a specific error message for invalid tokens after a '\' used as line continuation. While it may be helpful when the invalid token is whitespace, Python usually shows the exact location of the invalid token, so you can examine this line and find the error. On the other hand, the patch is no big deal, so if a specific error message is welcome, it may as well be applied. Enough for today... and best of all: I have no patch which I want to promote! Reinhold From gvanrossum at gmail.com Sun Feb 20 02:08:09 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sun Feb 20 02:08:15 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <2mpsywxplq.fsf@starship.python.net> References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> Message-ID: > > This is something I've typed way too many times: > > > > Py> class C(): > > File " ", line 1 > > class C(): > > ^ > > SyntaxError: invalid syntax > > > > It's the asymmetry with functions that gets to me - defining a > > function with no arguments still requires parentheses in the > > definition statement, but defining a class with no bases requires the > > parentheses to be omitted. It's fine to fix this in 2.5. I guess I can add this to my list of early oopsies -- although to the very bottom. :-) It's *not* fine to make C() mean C(object). (We already have enough other ways to declaring new-style classes.) 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at iinet.net.au Sun Feb 20 03:13:25 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Sun Feb 20 03:13:31 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> Message-ID: <4217F245.2020004@iinet.net.au> Guido van Rossum wrote: >>>This is something I've typed way too many times: >>> >>>Py> class C(): >>> File " ", line 1 >>> class C(): >>> ^ >>>SyntaxError: invalid syntax >>> >>>It's the asymmetry with functions that gets to me - defining a >>>function with no arguments still requires parentheses in the >>>definition statement, but defining a class with no bases requires the >>>parentheses to be omitted. > > > It's fine to fix this in 2.5. I guess I can add this to my list of > early oopsies -- although to the very bottom. :-) > > It's *not* fine to make C() mean C(object). (We already have enough > other ways to declaring new-style classes.) > Fair enough - the magnitude of the semantic difference between "class C:" and "class C():" bothered me a little, too. I'll just have to remember that I can put "__metaclass__ == type" at the top of modules :) Cheers, Nick. -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From jack at performancedrivers.com Sun Feb 20 04:35:38 2005 From: jack at performancedrivers.com (Jack Diederich) Date: Sun Feb 20 04:35:42 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <4217F245.2020004@iinet.net.au> References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> <4217F245.2020004@iinet.net.au> Message-ID: <20050220033538.GF9263@performancedrivers.com> On Sun, Feb 20, 2005 at 12:13:25PM +1000, Nick Coghlan wrote: > Guido van Rossum wrote: > >>>This is something I've typed way too many times: > >>> > >>>Py> class C(): > >>> File " ", line 1 > >>> class C(): > >>> ^ > >>>SyntaxError: invalid syntax > >>> > >>>It's the asymmetry with functions that gets to me - defining a > >>>function with no arguments still requires parentheses in the > >>>definition statement, but defining a class with no bases requires the > >>>parentheses to be omitted. > > > > > >It's fine to fix this in 2.5. I guess I can add this to my list of > >early oopsies -- although to the very bottom. :-) > > > >It's *not* fine to make C() mean C(object). (We already have enough > >other ways to declaring new-style classes.) > > > > Fair enough - the magnitude of the semantic difference between "class C:" > and "class C():" bothered me a little, too. I'll just have to remember that > I can put "__metaclass__ == type" at the top of modules :) I always use new style classes so I only have to remember one set of behaviors. "__metaclass__ = type" is warty, it has the "action at a distance" problem that decorators solve for functions. I didn't dig into the C but does having 'type' as metaclass guarantee the same behavior as inheriting 'object' or does object provide something type doesn't? *wince* Py3k? Faster please[*]. -Jack * a US-ism of a conservative bent, loosely translated as "change for the better? I'll get behind that." 
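[A quick sketch of the equivalence Jack is asking about -- Python 2.x only, since it relies on the module-level __metaclass__ hook, and the class names are purely illustrative:

    __metaclass__ = type

    class A:                  # picks up the module-level __metaclass__
        pass

    class B(object):          # explicit new-style base
        pass

    assert type(A) is type(B) is type
    assert A.__bases__ == (object,)   # empty bases get replaced by (object,)
    assert isinstance(A(), object)

Guido and Michael confirm below that the two spellings produce identical classes.]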
From python at rcn.com Sun Feb 20 04:46:40 2005 From: python at rcn.com (Raymond Hettinger) Date: Sun Feb 20 04:51:42 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: Message-ID: <001301c516fe$ed674700$f33ec797@oemcomputer> > > > This is something I've typed way too many times: > > > > > > Py> class C(): > > > File " ", line 1 > > > class C(): > > > ^ > > > SyntaxError: invalid syntax > > > > > > It's the asymmetry with functions that gets to me - defining a > > > function with no arguments still requires parentheses in the > > > definition statement, but defining a class with no bases requires the > > > parentheses to be omitted. > > It's fine to fix this in 2.5. Yea! Raymond From raymond.hettinger at verizon.net Sun Feb 20 05:20:25 2005 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sun Feb 20 05:24:29 2005 Subject: [Python-Dev] UserString Message-ID: <000001c51703$80f97520$f33ec797@oemcomputer> I noticed that UserString objects have methods that do not accept other UserString objects as arguments: >>> from UserString import UserString >>> UserString('slartibartfast').count(UserString('a')) Traceback (most recent call last): File " ", line 1, in -toplevel- UserString('slartibartfast').count(UserString('a')) File "C:\PY24\lib\UserString.py", line 66, in count return self.data.count(sub, start, end) TypeError: expected a character buffer object >>> UserString('abc') in UserString('abcde') Traceback (most recent call last): File " ", line 1, in -toplevel- UserString('abc') in UserString('abcde') File "C:\PY24\lib\UserString.py", line 35, in __contains__ return char in self.data TypeError: 'in ' requires string as left operand This sort of thing is easy to test for and easy to fix. The question is whether we care about updating this module anymore or is it a relic. Also, is the use case one that we care about. AFAICT, this has never come up before. Raymond From gvanrossum at gmail.com Sun Feb 20 06:33:31 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sun Feb 20 06:33:38 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <20050220033538.GF9263@performancedrivers.com> References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> <4217F245.2020004@iinet.net.au> <20050220033538.GF9263@performancedrivers.com> Message-ID: > I didn't dig into the C but does having 'type' > as metaclass guarantee the same behavior as inheriting 'object' or does object > provide something type doesn't? *wince* No, they're equivalent. __metaclass__ = type cause the base class to be object, and a base class of object causes the metaclass to be type. But I agree wholeheartedly: class C(object): is much preferred. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aleax at aleax.it Sun Feb 20 09:15:25 2005 From: aleax at aleax.it (Alex Martelli) Date: Sun Feb 20 09:15:29 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <20050220033538.GF9263@performancedrivers.com> References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> <4217F245.2020004@iinet.net.au> <20050220033538.GF9263@performancedrivers.com> Message-ID: <243fad4f779b2c979e1aa71fd866cda1@aleax.it> On 2005 Feb 20, at 04:35, Jack Diederich wrote: > I always use new style classes so I only have to remember one set of > behaviors. I agree: that's reason #1 I recommend always using new-style whenever I teach / tutor / mentor in Python nowadays. 
> "__metaclass__ = type" is warty, it has the "action at a distance" > problem that > decorators solve for functions. I disagree. I view it as akin to a "from __future__ import" except that -- since the compiler doesn't need-to-know, as typeclass-picking happens at runtime -- it was accomplished by less magical and more flexible means. > I didn't dig into the C but does having 'type' > as metaclass guarantee the same behavior as inheriting 'object' or > does object > provide something type doesn't? *wince* I believe the former holds, since for example: >>> class X: __metaclass__ = type ... >>> X.__bases__ ( ,) If you're making a newstyle class with an oldstyle base, it's different: >>> class Y: pass ... >>> class X(Y): __metaclass__ = type ... Traceback (most recent call last): File " ", line 1, in ? TypeError: Error when calling the metaclass bases a new-style class can't have only classic bases in this case, you do need to inherit object explicitly: >>> class X(Y, object): pass ... >>> X.__bases__ ( , ) >>> type(X) This is because types.ClassType turns somersaults to enable this: in this latter construct, Python's mechanisms determine ClassType as the metaclass (it's the metaclass of the first base class), but then ClassType in turn sniffs around for another metaclass to delegate to, among the supplied bases, and having found one washes its hands of the whole business;-). Alex From aleax at aleax.it Sun Feb 20 09:32:35 2005 From: aleax at aleax.it (Alex Martelli) Date: Sun Feb 20 09:32:43 2005 Subject: [Python-Dev] UserString In-Reply-To: <000001c51703$80f97520$f33ec797@oemcomputer> References: <000001c51703$80f97520$f33ec797@oemcomputer> Message-ID: <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> On 2005 Feb 20, at 05:20, Raymond Hettinger wrote: ... > This sort of thing is easy to test for and easy to fix. The question > is > whether we care about updating this module anymore or is it a relic. > Also, is the use case one that we care about. AFAICT, this has never > come up before. I did have some issues w/UserString at a client's, but that was connected to some code doing type-checking (and was fixed by injecting basestring as a base of the client's subclass of UserString and ensuring the type-checking always used isinstance and basestring). My two cents: a *mixin* to make it easy to emulate full-fledged strings would be almost as precious as your DictMixin (ones to emulate lists, sets, files [w/buffering], ..., might be even more useful). The point is all of these rich interfaces have a lot of redundancy and a mixin can provide all methods generically based on a few fundamental methods, which can be quite useful, just like DictMixin. But a complete emulation of strings (etc) is mostly of "didactical" use, a sort of checklist to help ensure one implements all methods, not really useful for new code "in production"; at least, I haven't found such uses recently. The above-mentioned client's class was an attempt to join RE functionality to strings and was a rather messy hack anyway, for example (perhaps prompted by client's previous familiarity with Perl, I'm not sure); at any rate, the client should probably have subclassed str or unicode if he really wanted that hack. I can't think of a GOOD use for UserString (etc) since subclassing str (etc) was allowed in 2.2 or at least since a few loose ends about newstyle classes were neatly tied up in 2.3. If we do decide "it is a relic, no more updates" perhaps some indication of deprecation would be warranted. 
((In any case, I do think the mixins would be useful)). Alex From mwh at python.net Sun Feb 20 10:38:29 2005 From: mwh at python.net (Michael Hudson) Date: Sun Feb 20 10:38:31 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <243fad4f779b2c979e1aa71fd866cda1@aleax.it> (Alex Martelli's message of "Sun, 20 Feb 2005 09:15:25 +0100") References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> <4217F245.2020004@iinet.net.au> <20050220033538.GF9263@performancedrivers.com> <243fad4f779b2c979e1aa71fd866cda1@aleax.it> Message-ID: <2mvf8nwoyy.fsf@starship.python.net> Alex Martelli writes: > On 2005 Feb 20, at 04:35, Jack Diederich wrote: > >> I didn't dig into the C but does having 'type' >> as metaclass guarantee the same behavior as inheriting 'object' or >> does object >> provide something type doesn't? *wince* > > I believe the former holds, since for example: I was going to say that 'type(object) is type' is everything you need to know, but you also need the bit of code in type_new that replaces an empty bases tuple with (object,) -- but class C: __metaclass__ = Type and class C(object): pass produce identical classes. > This is because types.ClassType turns somersaults to enable this: in > this latter construct, Python's mechanisms determine ClassType as the > metaclass (it's the metaclass of the first base class), but then > ClassType in turn sniffs around for another metaclass to delegate to, > among the supplied bases, and having found one washes its hands of the > whole business;-). It's also notable that type_new does exactly the same thing! Cheers, mwh -- Jokes around here tend to get followed by implementations. -- from Twisted.Quotes From fredrik at pythonware.com Sun Feb 20 13:07:17 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun Feb 20 13:07:21 2005 Subject: [Python-Dev] Re: Re: Prospective Peephole Transformation References: <4215FD5F.4040605@xs4all.nl><000101c515cc$9f96d0a0$803cc797@oemcomputer><5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> <5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> <1f7befae050218174345e029e8@mail.gmail.com> <42171931.4020600@v.loewis.de> Message-ID: Martin v. Löwis wrote: >> I'd say that this explains why it would still make sense to let the code generator change >> "x in (a, b, c)" to "x == a or x == b or x == c", as long as a, b, and c are all integers. > > How often does that happen in real code? don't know, but it happens: [fredrik@brain Python-2.4]$ grep "if.*in *([0-9]" Lib/*.py Lib/BaseHTTPServer.py: if self.command != 'HEAD' and code >= 200 and code not in (204, 304): Lib/asyncore.py: if err in (0, EISCONN): Lib/mimify.py: if len(args) not in (0, 1, 2): Lib/sunau.py: if nchannels not in (1, 2, 4): Lib/sunau.py: if sampwidth not in (1, 2, 4): Lib/urllib2.py: if code not in (200, 206): Lib/urllib2.py: if (code in (301, 302, 303, 307) and m in ("GET", "HEAD") Lib/whichdb.py: if magic in (0x00061561, 0x61150600): Lib/whichdb.py: if magic in (0x00061561, 0x61150600): [fredrik@brain Python-2.4]$ grep "if.*in *\[[0-9]" Lib/*.py Lib/decimal.py: if value[0] not in [0,1]: Lib/smtplib.py: if code not in [235, 503]: judging from the standard library, "string in string tuple/list" is a lot more common. 
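[For anyone who wants to reproduce the comparison, a crude timing sketch -- numbers vary by machine and interpreter, and the frozenset line presumes the set is built once, which is the "use sets internally" idea from elsewhere in this thread:

    import timeit

    cases = [
        ("in tuple    ", "x in (204, 304, 404)",             "x = 404"),
        ("chained ==  ", "x == 204 or x == 304 or x == 404", "x = 404"),
        ("in frozenset", "x in s", "x = 404; s = frozenset([204, 304, 404])"),
    ]
    for label, stmt, setup in cases:
        secs = timeit.Timer(stmt, setup).timeit()   # 1,000,000 iterations
        print("%s %.3f usec per loop" % (label, secs))]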
From raymond.hettinger at verizon.net Sun Feb 20 16:39:24 2005 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sun Feb 20 16:43:33 2005 Subject: [Python-Dev] Store x Load x --> DupStore Message-ID: <000101c51762$5b8369e0$7c1cc797@oemcomputer> Any objections to new peephole transformation that merges a store/load pair into a single step? There is a tested patch at: www.python.org/sf/1144842 It folds the two steps into a new opcode. In the case of store_name/load_name, it saves one three byte instruction, a trip around the eval-loop, two stack mutations, a incref/decref pair, a dictionary lookup, and an error check (for the lookup). While it acts like a dup followed by a store, it is implemented more simply as a store that doesn't pop the stack. The transformation is broadly applicable and occurs thousands of times in the standard library and test suite. Raymond Hettinger From gvanrossum at gmail.com Sun Feb 20 17:06:28 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sun Feb 20 17:06:33 2005 Subject: [Python-Dev] UserString In-Reply-To: <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> Message-ID: [Alex] > I did have some issues w/UserString at a client's, but that was > connected to some code doing type-checking (and was fixed by injecting > basestring as a base of the client's subclass of UserString and > ensuring the type-checking always used isinstance and basestring). Oh, bah. That's not what basestring was for. I can't blame you or your client, but my *intention* was that basestring would *only* be the base of the two *real* built-in string types (str and unicode). The reason for its existence was that some low-level built-in (or extension) operations only accept those two *real* string types and consequently some user code might want to validate ("look before you leap") its own arguments if those eventually ended up being passed to aforementioned low-level built-in code. My intention was always that UserString and other string-like objects would explicitly *not* inherit from basestring. Of course, my intention was lost, your client used basestring to mean "any string-ish object", got away with it because they weren't using any of those low-level built-ins, and you had to comply rather than explain it to them. Sounds like a good reason to add interfaces to the language. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From gvanrossum at gmail.com Sun Feb 20 17:17:15 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sun Feb 20 17:17:17 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <000101c51762$5b8369e0$7c1cc797@oemcomputer> References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> Message-ID: > Any objections to new peephole transformation that merges a store/load > pair into a single step? > > There is a tested patch at: www.python.org/sf/1144842 > > It folds the two steps into a new opcode. In the case of > store_name/load_name, it saves one three byte instruction, a trip around > the eval-loop, two stack mutations, a incref/decref pair, a dictionary > lookup, and an error check (for the lookup). While it acts like a dup > followed by a store, it is implemented more simply as a store that > doesn't pop the stack. The transformation is broadly applicable and > occurs thousands of times in the standard library and test suite. What exactly are you trying to accomplish? 
Do you have examples of code that would be sped up measurably by this transformation? Does anybody care about those speedups even if they *are* measurable? I'm concerned that there's too much hacking of the VM going on with too little benefit. The VM used to be relatively simple code that many people could easily understand. The benefit of that was that new language features could be implemented relatively easily even by relatively inexperienced developers. All that seems to be lost, and I fear that the end result is going to be a calcified VM that's only 10% faster than the original, since we appear to have reached the land of diminishing returns here. I don't see any concentrated efforts trying to figure out where the biggest pain is and how to relieve it; rather, it looks as if the easiest targets are being approached. Now, if these were low-hanging fruit, I'd happily agree, but I'm not so sure that they are all that valuable. Where are the attempts to speed up function/method calls? That's an area where we could *really* use a breakthrough... Eventually we'll need a radically different approach, maybe PyPy, maybe Starkiller. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aleax at aleax.it Sun Feb 20 17:41:31 2005 From: aleax at aleax.it (Alex Martelli) Date: Sun Feb 20 17:41:36 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> Message-ID: On 2005 Feb 20, at 17:06, Guido van Rossum wrote: > [Alex] >> I did have some issues w/UserString at a client's, but that was >> connected to some code doing type-checking (and was fixed by injecting >> basestring as a base of the client's subclass of UserString and >> ensuring the type-checking always used isinstance and basestring). > > Oh, bah. That's not what basestring was for. I can't blame you or your > client, but my *intention* was that basestring would *only* be the > base of the two *real* built-in string types (str and unicode). The > reason for its existence was that some low-level built-in (or > extension) operations only accept those two *real* string types and > consequently some user code might want to validate ("look before you > leap") its own arguments if those eventually ended up being passed to > aforementioned low-level built-in code. My intention was always that > UserString and other string-like objects would explicitly *not* > inherit from basestring. Of course, my intention was lost, your client > used basestring to mean "any string-ish object", got away with it > because they weren't using any of those low-level built-ins, and you > had to comply rather than explain it to them. I would gladly have explained, if I had understood your design intent correctly at the time (whether the explanation would have done much good is another issue); but I'm afraid I didn't. Now I do (thanks for explaining!) though I'm not sure what can be done in retrospect to communicate it more widely. The need to check "is this thingy here string-like" is sort of frequent, because strings are sequences which, when iterated on, yield sequences (strings of length 1) which, when iterated on, yield sequences ad infinitum. Strings are sequences but more often than not one wants to treat them as "scalars" instead. isinstance and basestring allow that frequently needed check so nicely, that, if they're not intended for it, they're an "attractive nuisance" legally;-). 
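[To make that concrete, a small sketch of the sort of check being described -- Python 2.x, where basestring exists; only the two real string types pass the isinstance test, per the intent Guido describes above:

    def flatten(items):
        for item in items:
            if isinstance(item, basestring):    # str/unicode act as scalars
                yield item
            elif hasattr(item, '__iter__'):     # lists, tuples, generators, ...
                for sub in flatten(item):
                    yield sub
            else:
                yield item

    list(flatten(['ab', ['cd', ['ef', 3]], 4]))  # -> ['ab', 'cd', 'ef', 3, 4]]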
The need to make stringlike thingies emerges both for bad reasons (e.g., I never liked that client's "string cum re" perloidism) and good ones (e.g., easing the interfacing with external frameworks that have their own stringythings, such as Qt's QtString); and checking if something is stringlike is also frequent, as per previous para. Darn... > Sounds like a good reason to add interfaces to the language. :-) If an interface must be usable to say "is this string-like?" it will have to be untyped, I guess, and the .translate method will be a small problem (one-argument for unicode, two-args for str, and very different argument semantics) -- don't recall offhand if there are other such nonpolymorphic methods there. Alex From pje at telecommunity.com Sun Feb 20 18:37:41 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Feb 20 18:34:59 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> Message-ID: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote: >Where are the attempts to speed up function/method calls? That's an >area where we could *really* use a breakthrough... Amen! So what happened to Armin's pre-allocated frame patch? Did that get into 2.4? Also, does anybody know where all the time goes in a function call, anyway? I assume that some of the pieces are: * tuple/dict allocation for arguments (but some of this is bypassed on the fast branch for Python-to-Python calls, right?) * frame allocation and setup (but Armin's patch was supposed to eliminate most of this whenever a function isn't being used re-entrantly) * argument "parsing" (check number of args, map kwargs to their positions, etc.; but isn't some of this already fast-pathed for Python-to-Python calls?) I suppose the fast branch fixes don't help special methods like __getitem__ et al, since those don't go through the fast branch, but I don't think those are the majority of function calls. And whatever happened to CALL_METHOD? Do we need a tp_callmethod that takes an argument array, length, and keywords, so that we can skip instancemethod allocation in the common case of calling a method directly? From pje at telecommunity.com Sun Feb 20 18:15:44 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Feb 20 18:35:09 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <243fad4f779b2c979e1aa71fd866cda1@aleax.it> References: <20050220033538.GF9263@performancedrivers.com> <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> <4217F245.2020004@iinet.net.au> <20050220033538.GF9263@performancedrivers.com> Message-ID: <5.1.1.6.0.20050220121233.021107a0@mail.telecommunity.com> At 09:15 AM 2/20/05 +0100, Alex Martelli wrote: >This is because types.ClassType turns somersaults to enable this: in this >latter construct, Python's mechanisms determine ClassType as the metaclass >(it's the metaclass of the first base class), but then ClassType in turn >sniffs around for another metaclass to delegate to, among the supplied >bases, and having found one washes its hands of the whole business;-). To be pedantic, the actual algorithm in 2.2+ has nothing to do with the first base class; that's the pre-2.2 algorithm. The 2.2 algorithm looks for the most-derived metaclass of the base classes, and simply ignores classic bases altogether. 
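[A rough pure-Python rendering of the rule Phillip describes -- Python 2.x, since it refers to types.ClassType. The real logic lives in typeobject.c; the function name and the all-classic fallback are simplifications, not the actual implementation:

    import types

    def most_derived_metaclass(bases, explicit=None):
        winner = explicit
        for base in bases:
            meta = type(base)
            if meta is types.ClassType:          # classic bases don't compete
                continue
            if winner is None or issubclass(meta, winner):
                winner = meta                    # meta is at least as derived
            elif not issubclass(winner, meta):
                raise TypeError("metaclass conflict among bases")
        if winner is None:
            return types.ClassType               # all-classic (or no) bases
        return winner

    class Classic: pass
    class New(object): pass
    assert most_derived_metaclass((Classic, New)) is type
    assert most_derived_metaclass((Classic,)) is types.ClassType]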
From martin at v.loewis.de Sun Feb 20 18:41:19 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Feb 20 18:41:22 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> Message-ID: <4218CBBF.8030400@v.loewis.de> Guido van Rossum wrote: > I'm concerned that there's too much hacking of the VM going on with > too little benefit. I completely agree. It would be so much more useful if people tried to fix the bugs that have been reported. Regards, Martin From mwh at python.net Sun Feb 20 19:38:39 2005 From: mwh at python.net (Michael Hudson) Date: Sun Feb 20 19:38:40 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: (Guido van Rossum's message of "Sun, 20 Feb 2005 08:17:15 -0800") References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> Message-ID: <2mr7jbvzyo.fsf@starship.python.net> Guido van Rossum writes: >> Any objections to new peephole transformation that merges a store/load >> pair into a single step? >> >> There is a tested patch at: www.python.org/sf/1144842 >> >> It folds the two steps into a new opcode. In the case of >> store_name/load_name, it saves one three byte instruction, a trip around >> the eval-loop, two stack mutations, a incref/decref pair, a dictionary >> lookup, and an error check (for the lookup). While it acts like a dup >> followed by a store, it is implemented more simply as a store that >> doesn't pop the stack. The transformation is broadly applicable and >> occurs thousands of times in the standard library and test suite. I'm still a little curious as to what code creates such opcodes... > What exactly are you trying to accomplish? Do you have examples of > code that would be sped up measurably by this transformation? Does > anybody care about those speedups even if they *are* measurable? > > I'm concerned that there's too much hacking of the VM going on with > too little benefit. The VM used to be relatively simple code that many > people could easily understand. The benefit of that was that new > language features could be implemented relatively easily even by > relatively inexperienced developers. All that seems to be lost, and I > fear that the end result is going to be a calcified VM that's only 10% > faster than the original, since we appear to have reached the land of > diminishing returns here. In the case of the bytecode optimizer, I'm not sure this is a fair accusation. Even if you don't understand it, you can ignore it and not have your understanding of the rest of the VM affected (I'm not sure that compile.c has ever been "easily understood" in any case :). > I don't see any concentrated efforts trying to figure out where the > biggest pain is and how to relieve it; rather, it looks as if the > easiest targets are being approached. Now, if these were low-hanging > fruit, I'd happily agree, but I'm not so sure that they are all that > valuable. I think some of the peepholer's work are pure wins -- x,y = y,x unpacking and the creation of constant tuples certainly spring to mind. If Raymond wants to spend his time on this stuff, that's his choice. I don't think the obfuscation cost is all that high. > Where are the attempts to speed up function/method calls? That's an > area where we could *really* use a breakthrough... The problem is that it's hard! > Eventually we'll need a radically different approach, maybe PyPy, > maybe Starkiller. Yup. 
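[As a crude illustration of the overhead in question -- not a benchmark from this thread -- the same trivial expression timed inline and behind a def; absolute numbers depend entirely on interpreter and machine:

    import timeit

    inline = timeit.Timer("x + 1", setup="x = 1").timeit()
    called = timeit.Timer("f(x)", setup="def f(a): return a + 1\nx = 1").timeit()
    print("inline %.3f usec, call %.3f usec, ratio %.1fx"
          % (inline, called, called / inline))]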
Cheers, mwh -- Gevalia is undrinkable low-octane see-through only slightly roasted bilge water. Compared to .us coffee it is quite drinkable. -- M?ns Nilsson, asr From mwh at python.net Sun Feb 20 20:00:13 2005 From: mwh at python.net (Michael Hudson) Date: Sun Feb 20 20:00:30 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> (Phillip J. Eby's message of "Sun, 20 Feb 2005 12:37:41 -0500") References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> Message-ID: <2mmztzvyyq.fsf@starship.python.net> "Phillip J. Eby" writes: > At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote: >>Where are the attempts to speed up function/method calls? That's an >>area where we could *really* use a breakthrough... > > Amen! > > So what happened to Armin's pre-allocated frame patch? Did that get into 2.4? No, because it slows down recursive function calls, or functions that happen to be called at the same time in different threads. Fixing *that* would require things like code specific frame free-lists and that's getting a bit convoluted and might waste quite a lot of memory. Eliminating the blockstack would be nice (esp. if it's enough to get frames small enough that they get allocated by PyMalloc) but this seemed to be tricky too (or at least Armin, Samuele and I spent a cuple of hours yakking about it on IRC and didn't come up with a clear approach). Dynamically allocating the blockstack would be simpler, and might acheive a similar win. (This is all from memory, I haven't thought about specifics in a while). > Also, does anybody know where all the time goes in a function call, > anyway? I did once... > I assume that some of the pieces are: > > * tuple/dict allocation for arguments (but some of this is bypassed on > the fast branch for Python-to-Python calls, right?) All of it, in easy cases. ISTR that the fast path could be a little wider -- it bails when the called function has default arguments, but I think this case could be handled easily enough. > * frame allocation and setup (but Armin's patch was supposed to > eliminate most of this whenever a function isn't being used > re-entrantly) Ah, you remember the wart :) I think even with the patch, frame setup is a significant amount of work. Why are frames so big? > * argument "parsing" (check number of args, map kwargs to their > positions, etc.; but isn't some of this already fast-pathed for > Python-to-Python calls?) Yes. With some effort you could probably avoid a copy (and incref) of the arguments from the callers to the callees stack area. BFD. > I suppose the fast branch fixes don't help special methods like > __getitem__ et al, since those don't go through the fast branch, but I > don't think those are the majority of function calls. Indeed. I suspect this fails the effort/benefit test, but I could be wrong. > And whatever happened to CALL_METHOD? It didn't work as an optimization, as far as I remember. I think the patch is on SF somewhere. Or is a branch in CVS? Oh, it's patch #709744. > Do we need a tp_callmethod that takes an argument array, length, and > keywords, so that we can skip instancemethod allocation in the > common case of calling a method directly? Hmm, didn't think of that, and I don't think it's how the CALL_ATTR attempt worked. 
I presume it would need to take a method name too :) I already have a patch that does this for regular function calls (it's a rearrangement/refactoring not an optimization though). Cheers, mwh -- I think perhaps we should have electoral collages and construct our representatives entirely of little bits of cloth and papier mache. -- Owen Dunn, ucam.chat, from his review of the year From bac at OCF.Berkeley.EDU Sun Feb 20 20:41:03 2005 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sun Feb 20 20:41:12 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <2mmztzvyyq.fsf@starship.python.net> References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <2mmztzvyyq.fsf@starship.python.net> Message-ID: <4218E7CF.1020208@ocf.berkeley.edu> Michael Hudson wrote: > "Phillip J. Eby" writes: [SNIP] >>And whatever happened to CALL_METHOD? > > > It didn't work as an optimization, as far as I remember. I think the > patch is on SF somewhere. Or is a branch in CVS? Oh, it's patch > #709744. > > >>Do we need a tp_callmethod that takes an argument array, length, and >>keywords, so that we can skip instancemethod allocation in the >>common case of calling a method directly? > > > Hmm, didn't think of that, and I don't think it's how the CALL_ATTR > attempt worked. I presume it would need to take a method name too :) > CALL_ATTR basically replaced ``LOAD_ATTR; CALL_FUNCTION`` with a single opcode. Idea was that the function creation by the LOAD_ATTR was a wasted step so might as well just skip it and call the method directly. Problem was the work required to support both classic and new-style classes. Now I have not looked at the code since it was written back at PyCon 2003 and I was a total newbie to the core's C code at that point and I think Thomas said it had been two years since he did any major core hacking. In other words it could possibly have been done better. =) -Brett From pje at telecommunity.com Sun Feb 20 21:22:00 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Feb 20 21:19:19 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <2mr7jbvzyo.fsf@starship.python.net> References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> Message-ID: <5.1.1.6.0.20050220150416.029b3960@mail.telecommunity.com> At 06:38 PM 2/20/05 +0000, Michael Hudson wrote: > >> It folds the two steps into a new opcode. In the case of > >> store_name/load_name, it saves one three byte instruction, a trip around > >> the eval-loop, two stack mutations, a incref/decref pair, a dictionary > >> lookup, and an error check (for the lookup). While it acts like a dup > >> followed by a store, it is implemented more simply as a store that > >> doesn't pop the stack. The transformation is broadly applicable and > >> occurs thousands of times in the standard library and test suite. > >I'm still a little curious as to what code creates such opcodes... A simple STORE+LOAD case: >>> dis.dis(compile("x=1; y=x*2","?","exec")) 1 0 LOAD_CONST 0 (1) 3 STORE_NAME 0 (x) 6 LOAD_NAME 0 (x) 9 LOAD_CONST 1 (2) 12 BINARY_MULTIPLY 13 STORE_NAME 1 (y) 16 LOAD_CONST 2 (None) 19 RETURN_VALUE And a simple DUP+STORE case: >>> dis.dis(compile("x=y=1","?","exec")) 1 0 LOAD_CONST 0 (1) 3 DUP_TOP 4 STORE_NAME 0 (x) 7 STORE_NAME 1 (y) 10 LOAD_CONST 1 (None) 13 RETURN_VALUE Of course, I'm not sure how commonly this sort of code occurs in places where it makes a difference to anything. 
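[One way to get a rough answer to that question: walk a module's code objects and count each STORE immediately followed by a LOAD of the same name. The sketch below uses dis.get_instructions(), which only exists in much later Python versions than the 2.4/2.5 being discussed -- it illustrates the measurement, not how the peephole patch itself works, and the helper name is made up:

    import dis, types

    PAIRS = {"STORE_NAME": "LOAD_NAME",
             "STORE_FAST": "LOAD_FAST",
             "STORE_GLOBAL": "LOAD_GLOBAL"}

    def count_store_load_pairs(code):
        hits = 0
        instrs = list(dis.get_instructions(code))
        for a, b in zip(instrs, instrs[1:]):
            if PAIRS.get(a.opname) == b.opname and a.argval == b.argval:
                hits += 1
        for const in code.co_consts:         # recurse into nested code objects
            if isinstance(const, types.CodeType):
                hits += count_store_load_pairs(const)
        return hits

    src = "x = 1\ny = x * 2\n"
    print(count_store_load_pairs(compile(src, "<example>", "exec")))  # expect 1]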
Function call overhead continues to be Python's most damaging performance issue, because it makes it expensive to use abstraction. Here's a thought. Suppose we split frames into an "object" part and a "struct" part, with the object part being just a pointer to the struct part, and a flag indicating whether the struct part is stack-allocated or malloc'ed. This would let us stack-allocate the bulk of the frame structure, but still have a frame "object" to pass around. On exit from the C routine that stack-allocated the frame struct, we check to see if the frame object has a refcount>1, and if so, malloc a permanent home for the frame struct and update the frame object's struct pointer and flag. In this way, frame allocation overhead could be reduced to the cost of an alloca, or just incorporated into the stack frame setup of the C routine itself, allowing the entire struct to be treated as "local variables" from a C perspective (which might benefit performance on architectures that reserve a register for local variable access). Of course, this would slow down exception handling and other scenarios that result in extra references to a frame object, but if the OS malloc is the slow part of frame allocation (frame objects are too large for pymalloc), then perhaps it would be a net win. On the other hand, this approach would definitely use more stack space per calling level. From pje at telecommunity.com Sun Feb 20 21:56:26 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Feb 20 21:53:45 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <2mmztzvyyq.fsf@starship.python.net> References: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> At 07:00 PM 2/20/05 +0000, Michael Hudson wrote: >"Phillip J. Eby" writes: > > > At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote: > >>Where are the attempts to speed up function/method calls? That's an > >>area where we could *really* use a breakthrough... > > > > Amen! > > > > So what happened to Armin's pre-allocated frame patch? Did that get > into 2.4? > >No, because it slows down recursive function calls, or functions that >happen to be called at the same time in different threads. Fixing >*that* would require things like code specific frame free-lists and >that's getting a bit convoluted and might waste quite a lot of memory. Ah. I thought it was just going to fall back to the normal case if the pre-allocated frame wasn't available (i.e., didn't have a refcount of 1). >Eliminating the blockstack would be nice (esp. if it's enough to get >frames small enough that they get allocated by PyMalloc) but this >seemed to be tricky too (or at least Armin, Samuele and I spent a >cuple of hours yakking about it on IRC and didn't come up with a clear >approach). Dynamically allocating the blockstack would be simpler, >and might acheive a similar win. (This is all from memory, I haven't >thought about specifics in a while). I'm not very familiar with the operation of the block stack, but why does it need to be a stack? For exception handling purposes, wouldn't it suffice to know the offset of the current handler, and have an opcode to set the current handler location? And for "for" loops, couldn't an anonymous local be used to hold the loop iterator instead of using a stack variable? 
Hm, actually I think I see the answer; in the case of module-level code there can be no "anonymous local variables" the way there can in functions. Hmm. I guess you'd need to also have a "reset stack to level X" opcode, then, and both it and the set-handler opcode would have to be placed at every destination of a jump that crosses block boundaries. It's not clear how big a win that is, due to the added opcodes even on non-error paths. Hey, wait a minute... all the block stack data is static, isn't it? I mean, the contents of the block stack at any point in a code string could be determined statically, by examination of the bytecode, couldn't it? If that's the case, then perhaps we could design a pre-computed data structure similar to co_lnotab that would be used by the evaluator in place of the blockstack. Of course, I may be talking through my hat here, as I have very little experience with how the blockstack works. However, if this idea makes sense, then perhaps it could actually speed up non-error paths as well (except perhaps for the 'return' statement), at the cost of a larger code structure and compiler complexity. But, if it also means that frames can be allocated faster (e.g. via pymalloc), it might be worth it, just like getting rid of SET_LINENO turned out to be a net win. >All of it, in easy cases. ISTR that the fast path could be a little >wider -- it bails when the called function has default arguments, but >I think this case could be handled easily enough. When it has *any* default arguments, or only when it doesn't have values to supply for them? >Why are frames so big? Because there are CO_MAXBLOCKS * 12 bytes in there for the block stack. If there was no need for that, frames could perhaps be allocated via pymalloc. They only have around 100 bytes or so in them, apart from the blockstack and locals/value stack. > > Do we need a tp_callmethod that takes an argument array, length, and > > keywords, so that we can skip instancemethod allocation in the > > common case of calling a method directly? > >Hmm, didn't think of that, and I don't think it's how the CALL_ATTR >attempt worked. I presume it would need to take a method name too :) Er, yeah, I thought that was obvious. :) From pje at telecommunity.com Sun Feb 20 22:34:50 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Feb 20 22:32:10 2005 Subject: [Python-Dev] Eliminating the block stack (was Re: Store x Load x --> DupStore) In-Reply-To: <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> References: <2mmztzvyyq.fsf@starship.python.net> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050220160300.02e8bc30@mail.telecommunity.com> At 03:56 PM 2/20/05 -0500, Phillip J. Eby wrote: >At 07:00 PM 2/20/05 +0000, Michael Hudson wrote: >>Eliminating the blockstack would be nice (esp. if it's enough to get >>frames small enough that they get allocated by PyMalloc) but this >>seemed to be tricky too (or at least Armin, Samuele and I spent a >>cuple of hours yakking about it on IRC and didn't come up with a clear >>approach). Dynamically allocating the blockstack would be simpler, >>and might acheive a similar win. (This is all from memory, I haven't >>thought about specifics in a while). 
I think I have an idea how to do it in a (relatively) simple fashion; see if you can find a hole in it: * Change the PyTryBlock struct to include an additional member, 'int b_prev', that refers to the previous block in a chain * Change the compiler's emission of SETUP_* opcodes, so that instead of a PyTryBlock being added to the blockstack at interpretation time, it's added to the end of a 'co_blktree' block array at compile time, with its 'b_prev' pointing to the current "top" of the block stack. Instead of the SETUP_* argument being the handler offset, have it be the index of the just-added blocktree entry. * Replace f_blockstack and f_iblock with 'int f_iblktree', and change PyFrame_BlockSetup() to set this equal to the SETUP_* argument, and PyFrame_BlockPop() to use this as an index into the code's co_blktree to retrieve the needed values. PyFrame_BlockPop() would then set f_iblktree equal to the "popped" block's 'b_prev' member, thus "popping" the block from this virtual stack. (Note, by the way, that the blocktree could actually be created as a post-processing step of the current compilation process, by a loop that scans the bytecode and tracks the current stack and blockstack levels, and then replaces the SETUP_* opcodes' arguments. This might be a simpler option than trying to change the compiler to do it along the way.) Can anybody see any flaws in this concept? As far as I can tell it just generates all possible block stack states at compile time, but doesn't change block semantics in the least, and it scarcely touches the eval loop. It seems like it could drop the size of frames enough to let them use pymalloc instead of the OS malloc, at the cost of a 16 bytes per block increase in the size of code objects. (And of course the necessary changes to 'marshal' and 'dis' as well as the compiler and eval loop.) (More precisely, frames whose f_nlocals + f_stacksize is 40 or less, would be 256 bytes or less, and therefore pymalloc-able. However, this should cover all but the most complex functions.) From mwh at python.net Sun Feb 20 22:54:43 2005 From: mwh at python.net (Michael Hudson) Date: Sun Feb 20 22:54:46 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> (Phillip J. Eby's message of "Sun, 20 Feb 2005 15:56:26 -0500") References: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> Message-ID: <2m8y5ix5gc.fsf@starship.python.net> "Phillip J. Eby" writes: > At 07:00 PM 2/20/05 +0000, Michael Hudson wrote: >>"Phillip J. Eby" writes: >> >> > At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote: >> >>Where are the attempts to speed up function/method calls? That's an >> >>area where we could *really* use a breakthrough... >> > >> > Amen! >> > >> > So what happened to Armin's pre-allocated frame patch? Did that >> get into 2.4? >> >>No, because it slows down recursive function calls, or functions that >>happen to be called at the same time in different threads. Fixing >>*that* would require things like code specific frame free-lists and >>that's getting a bit convoluted and might waste quite a lot of memory. > > Ah. I thought it was just going to fall back to the normal case if > the pre-allocated frame wasn't available (i.e., didn't have a refcount > of 1). 
Well, I don't think that's the test, but that might work. Someone should try it :) (I'm trying something else currently). >>Eliminating the blockstack would be nice (esp. if it's enough to get >>frames small enough that they get allocated by PyMalloc) but this >>seemed to be tricky too (or at least Armin, Samuele and I spent a >>cuple of hours yakking about it on IRC and didn't come up with a clear >>approach). Dynamically allocating the blockstack would be simpler, >>and might acheive a similar win. (This is all from memory, I haven't >>thought about specifics in a while). > > I'm not very familiar with the operation of the block stack, but why > does it need to be a stack? Finally blocks are the problem, I think. > For exception handling purposes, wouldn't it suffice to know the > offset of the current handler, and have an opcode to set the current > handler location? And for "for" loops, couldn't an anonymous local > be used to hold the loop iterator instead of using a stack variable? > Hm, actually I think I see the answer; in the case of module-level > code there can be no "anonymous local variables" the way there can in > functions. Hmm. I don't think this is the killer blow. I can't remember the details and it's too late to think about them, so I'm going to wait and see if Samuele replies :) >>All of it, in easy cases. ISTR that the fast path could be a little >>wider -- it bails when the called function has default arguments, but >>I think this case could be handled easily enough. > > When it has *any* default arguments, or only when it doesn't have > values to supply for them? When it has *any*, I think. I also think this is easy to change. >>Why are frames so big? > > Because there are CO_MAXBLOCKS * 12 bytes in there for the block > stack. If there was no need for that, frames could perhaps be > allocated via pymalloc. They only have around 100 bytes or so in > them, apart from the blockstack and locals/value stack. What I'm trying is allocating the blockstack separately and see if two pymallocs are cheaper than one malloc. >> > Do we need a tp_callmethod that takes an argument array, length, and >> > keywords, so that we can skip instancemethod allocation in the >> > common case of calling a method directly? >> >>Hmm, didn't think of that, and I don't think it's how the CALL_ATTR >>attempt worked. I presume it would need to take a method name too :) > > Er, yeah, I thought that was obvious. :) Someone should try this too :) Cheers, mwh -- It is never worth a first class man's time to express a majority opinion. By definition, there are plenty of others to do that. -- G. H. Hardy From greg.ewing at canterbury.ac.nz Mon Feb 21 03:14:13 2005 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon Feb 21 03:14:29 2005 Subject: [Python-Dev] Eliminating the block stack (was Re: Store x Load x --> DupStore) In-Reply-To: <5.1.1.6.0.20050220160300.02e8bc30@mail.telecommunity.com> References: <2mmztzvyyq.fsf@starship.python.net> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220160300.02e8bc30@mail.telecommunity.com> Message-ID: <421943F5.7080408@canterbury.ac.nz> Phillip J. Eby wrote: > At 03:56 PM 2/20/05 -0500, Phillip J. Eby wrote: > >> At 07:00 PM 2/20/05 +0000, Michael Hudson wrote: >> >>> Eliminating the blockstack would be nice (esp. 
if it's enough to get >>> frames small enough that they get allocated by PyMalloc) Someone might like to take a look at the way Pyrex generates C code for try-except and try-finally blocks. It manages to get (what I hope is) the same effect using local variables and gotos. It doesn't have to deal with a stack pointer, but I think that should just be a compiler-determinable adjustment to be done when jumping to an outer block. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Mon Feb 21 04:32:11 2005 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon Feb 21 04:32:27 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> References: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> Message-ID: <4219563B.8080503@canterbury.ac.nz> Phillip J. Eby wrote: > Hm, actually I think I see the answer; in the case of module-level code > there can be no "anonymous local variables" the way there can in > functions. Why not? There's still a frame object associated with the call of the anonymous function holding the module's top-level code. The compiler can allocate locals in that frame, even if the user's code can't. > I guess you'd need to also have a "reset stack to > level X" opcode, then, and both it and the set-handler opcode would have > to be placed at every destination of a jump that crosses block > boundaries. It's not clear how big a win that is, due to the added > opcodes even on non-error paths. Only exceptions and break statements would require stack pointer adjustment, and they're relatively rare. I don't think an extra opcode in those cases would make much of a difference. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Mon Feb 21 04:32:25 2005 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon Feb 21 04:32:43 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> Message-ID: <42195649.3030400@canterbury.ac.nz> Alex Martelli wrote: > > On 2005 Feb 20, at 17:06, Guido van Rossum wrote: > >> Oh, bah. That's not what basestring was for. I can't blame you or your >> client, but my *intention* was that basestring would *only* be the >> base of the two *real* built-in string types (str and unicode). I think all this just reinforces the notion that LBYL is a bad idea! > The need to check "is this thingy here string-like" is sort of frequent, > because strings are sequences which, when iterated on, yield sequences > (strings of length 1) which, when iterated on, yield sequences ad > infinitum. Yes, this characteristic of strings is unfortunate because it tends to make some degree of LBYLing unavoidable. 
I don't think the right solution is to try to come up with safe ways of doing LBYL on strings, though, at least not in the long term. Maybe in Python 3000 this could be fixed by making strings *not* be sequences. They would be sliceable, but *not* indexable or iterable. If you wanted to iterate over their chars, you would have to say 'for c in s.chars()' or something. Then you would be able to test whether something is sequence-like by the presence of __getitem__ or __iter__ methods, without getting tripped up by strings. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+ From pje at telecommunity.com Mon Feb 21 04:41:09 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Feb 21 04:38:29 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <4219563B.8080503@canterbury.ac.nz> References: <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050220223833.02e8dc80@mail.telecommunity.com> At 04:32 PM 2/21/05 +1300, Greg Ewing wrote: >Phillip J. Eby wrote: > >>Hm, actually I think I see the answer; in the case of module-level code >>there can be no "anonymous local variables" the way there can in functions. > >Why not? There's still a frame object associated with the call >of the anonymous function holding the module's top-level code. >The compiler can allocate locals in that frame, even if the >user's code can't. That's a good point, but if you look at my "eliminating the block stack" post, you'll see that there's a simpler way to potentially get rid of the block stack, where "simpler" means "simpler changes in fewer places". From pje at telecommunity.com Mon Feb 21 04:44:44 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Feb 21 04:42:05 2005 Subject: [Python-Dev] UserString In-Reply-To: <42195649.3030400@canterbury.ac.nz> References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> Message-ID: <5.1.1.6.0.20050220224135.02e90ad0@mail.telecommunity.com> At 04:32 PM 2/21/05 +1300, Greg Ewing wrote: >Alex Martelli wrote: >>The need to check "is this thingy here string-like" is sort of frequent, >>because strings are sequences which, when iterated on, yield sequences >>(strings of length 1) which, when iterated on, yield sequences ad infinitum. > >Yes, this characteristic of strings is unfortunate because it >tends to make some degree of LBYLing unavoidable. FWIW, the trick I usually use to deal with this aspect of strings in recursive algorithms is to check whether the current item of an iteration is the same object I'm iterating over; if so, I know I've descended into a string. It doesn't catch it on the first recursion level of course (unless it was a 1-character string to start with), but it's a quick-and-dirty way to EAFP such algorithms. 
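A minimal sketch of that trick, assuming CPython's caching of one-character strings (the helper name and structure are made up for illustration):

    def flatten(seq):
        # EAFP: just try to iterate, and only special-case the point where
        # an "element" turns out to be the very object being iterated over,
        # which is what happens once you have descended to a 1-character
        # string (CPython hands back the same cached object).
        try:
            it = iter(seq)
        except TypeError:
            yield seq              # not iterable at all: a leaf
            return
        for item in it:
            if item is seq:
                yield item         # descended into a string: stop here
                return
            for sub in flatten(item):
                yield sub

    print list(flatten([['ab', 3], 'cd']))   # -> ['a', 'b', 3, 'c', 'd']

As noted above, a multi-character string still gets split into its characters on the way down; the identity test only stops the otherwise infinite descent.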
From gvanrossum at gmail.com Mon Feb 21 04:42:34 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Mon Feb 21 04:42:37 2005 Subject: [Python-Dev] UserString In-Reply-To: <42195649.3030400@canterbury.ac.nz> References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> Message-ID: > >> Oh, bah. That's not what basestring was for. I can't blame you or your > >> client, but my *intention* was that basestring would *only* be the > >> base of the two *real* built-in string types (str and unicode). > > I think all this just reinforces the notion that LBYL is > a bad idea! In this case, perhaps; but in general? (And I think there's a legitimate desire to sometimes special-case string-like things, e.g. consider a function that takes either a stream or a filename argument.) Anyway, can you explain why LBYL is bad? > > The need to check "is this thingy here string-like" is sort of frequent, > > because strings are sequences which, when iterated on, yield sequences > > (strings of length 1) which, when iterated on, yield sequences ad > > infinitum. > > Yes, this characteristic of strings is unfortunate because it > tends to make some degree of LBYLing unavoidable. I don't > think the right solution is to try to come up with safe ways > of doing LBYL on strings, though, at least not in the long > term. > > Maybe in Python 3000 this could be fixed by making strings *not* > be sequences. They would be sliceable, but *not* indexable or > iterable. If you wanted to iterate over their chars, you > would have to say 'for c in s.chars()' or something. > > Then you would be able to test whether something is sequence-like > by the presence of __getitem__ or __iter__ methods, without > getting tripped up by strings. There would be other ways to get out of this dilemma; we could introduce a char type, for example. Also, strings might be recognizable by other means, e.g. the presence of a lower() method or some other characteristic method that doesn't apply to sequence in general. (To Alex: leaving transform() out of the string interface seems to me the simplest solution.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From gvanrossum at gmail.com Mon Feb 21 04:47:08 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Mon Feb 21 04:47:11 2005 Subject: [Python-Dev] Eliminating the block stack (was Re: Store x Load x --> DupStore) In-Reply-To: <5.1.1.6.0.20050220160300.02e8bc30@mail.telecommunity.com> References: <2mmztzvyyq.fsf@starship.python.net> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> <5.1.1.6.0.20050220160300.02e8bc30@mail.telecommunity.com> Message-ID: > >>Eliminating the blockstack would be nice (esp. if it's enough to get > >>frames small enough that they get allocated by PyMalloc) but this > >>seemed to be tricky too (or at least Armin, Samuele and I spent a > >>cuple of hours yakking about it on IRC and didn't come up with a clear > >>approach). Dynamically allocating the blockstack would be simpler, > >>and might acheive a similar win. (This is all from memory, I haven't > >>thought about specifics in a while). I don't know if this helps, but since I invented the block stack around 1990, I believe I recall the main reason to make it dynamic was to simplify code generation, not because it is inherently dynamic. 
At the time an extra run-time data structure seemed to require less coding than an extra compile-time data structure. The same argument got me using dicts for locals; that was clearly a bottleneck and eliminated long ago, but I think we should be able to lose the block stack now, too. Somewhat ironically, eliminating the block stack will reduce the stack frame size, while eliminating the dict for locals added to it. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aleax at aleax.it Mon Feb 21 08:06:37 2005 From: aleax at aleax.it (Alex Martelli) Date: Mon Feb 21 08:06:43 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> Message-ID: <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> On 2005 Feb 21, at 04:42, Guido van Rossum wrote: >>>> Oh, bah. That's not what basestring was for. I can't blame you or >>>> your >>>> client, but my *intention* was that basestring would *only* be the >>>> base of the two *real* built-in string types (str and unicode). >> >> I think all this just reinforces the notion that LBYL is >> a bad idea! > > In this case, perhaps; but in general? (And I think there's a > legitimate desire to sometimes special-case string-like things, e.g. > consider a function that takes either a stream or a filename > argument.) > > Anyway, can you explain why LBYL is bad? In the general case, it's bad because of a combination of issues. It may violate "once, and only once!" -- the operations one needs to check may basically duplicate the operations one then wants to perform. Apart from wasted effort, it may happen that the situation changes between the look and the leap (on an external file, or due perhaps to threading or other reentrancy). It's often hard in the look to cover exactly the set of prereq's you need for the leap -- e.g. I've often seen code such as if i < len(foo): foo[i] = 24 which breaks for i<-len(foo); the first time this happens the guard's changed to 0<=i<len(foo), which still breaks w/negative index; finally it stabilizes to the correct check, -len(foo)<=i<len(foo) -- a check that Python performs again when you then use foo[i]... just cluttering code. The intermediate Pythonista who's learned to code "try: foo[i]=24 // except IndexError: pass" is much better off than the one who's still striving to LBYL as he had (e.g.) when using C. Etc -- this is all very general and generic. I had convinced myself that strings were a special case worth singling out, via isinstance and basestring, just as (say) dictionaries are singled out quite differently by methods such as get... I may well have been too superficial in this conclusion. >> Then you would be able to test whether something is sequence-like >> by the presence of __getitem__ or __iter__ methods, without >> getting tripped up by strings. > > There would be other ways to get out of this dilemma; we could > introduce a char type, for example. Also, strings might be > recognizable by other means, e.g. the presence of a lower() method or > some other characteristic method that doesn't apply to sequence in > general. Sure, there would be many possibilities. > (To Alex: leaving transform() out of the string interface seems to me > the simplest solution.) I guess you mean translate. Yes, that would probably be simplest. Alex

From mwh at python.net Mon Feb 21 10:00:11 2005 From: mwh at python.net (Michael Hudson) Date: Mon Feb 21 10:00:13 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <2m8y5ix5gc.fsf@starship.python.net> (Michael Hudson's message of "Sun, 20 Feb 2005 21:54:43 +0000") References: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> <2m8y5ix5gc.fsf@starship.python.net> Message-ID: <2m1xbawan8.fsf@starship.python.net> Michael Hudson writes: >> Because there are CO_MAXBLOCKS * 12 bytes in there for the block >> stack. If there was no need for that, frames could perhaps be >> allocated via pymalloc. They only have around 100 bytes or so in >> them, apart from the blockstack and locals/value stack. > > What I'm trying is allocating the blockstack separately and see if two > pymallocs are cheaper than one malloc. This makes no difference at all, of course -- once timeit or pystone gets going the code path that actually allocates a new frame as opposed to popping one off the free list simply never gets executed. Duh! Cheers, mwh (and despite what the sigmonster implies, I wasn't drunk last night :) -- This is an off-the-top-of-the-head-and-not-quite-sober suggestion, so is probably technically laughable. I'll see how embarrassed I feel tomorrow morning. -- Patrick Gosling, ucam.comp.misc From z_axis at 163.com Mon Feb 21 14:54:33 2005 From: z_axis at 163.com (z-axis) Date: Mon Feb 21 14:49:38 2005 Subject: [Python-Dev] Re: Welcome to the "Python-Dev" mailing list Message-ID: <20050221134936.909271E4003@bag.python.org> hi, friends. i am a python newbie but i used Java for about 5 years. when i saw python introduced in a famous magazine called < > in China, i was immediately absorbed by its pretty code. i hope i can use Python to do real development. regards!
======== 2005-02-21 14:28:00 You wrote in your message: ======== Welcome to the Python-Dev@python.org mailing list! If you are a new subscriber, please take the time to introduce yourself briefly in your first post. It is appreciated if you lurk around for a while before posting! :-) Additional information on Python's development process can be found in the Python Developer's Guide: http://www.python.org/dev/ To post to this list, send your email to: python-dev@python.org General information about the mailing list is at: http://mail.python.org/mailman/listinfo/python-dev If you ever want to unsubscribe or change your options (eg, switch to or from digest mode, change your password, etc.), visit your subscription page at: http://mail.python.org/mailman/options/python-dev/z_axis%40163.com You can also make such adjustments via email by sending a message to: Python-Dev-request@python.org with the word `help' in the subject or body (don't include the quotes), and you will get back a message with instructions. You must know your password to change your options (including changing the password, itself) or to unsubscribe. It is: zpython999 Normally, Mailman will remind you of your python.org mailing list passwords once every month, although you can disable this if you prefer. This reminder will also include instructions on how to unsubscribe or change your account options. There is also a button on your options page that will email your current password to you. = = = = = = = = = = = = = = = = = = = = = = Regards! z-axis z_axis@163.com 2005-02-21 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20050221/e537f44e/attachment.htm From gvanrossum at gmail.com Mon Feb 21 17:15:47 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Mon Feb 21 17:15:51 2005 Subject: [Python-Dev] UserString In-Reply-To: <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> Message-ID: > > Anyway, can you explain why LBYL is bad? > > In the general case, it's bad because of a combination of issues. It > may violate "once, and only once!" -- the operations one needs to check > may basically duplicate the operations one then wants to perform. Apart > from wasted effort, it may happen that the situation changes between > the look and the leap (on an external file, or due perhaps to threading > or other reentrancy). It's often hard in the look to cover exactly the > set of prereq's you need for the leap -- e.g. I've often seen code such > as > if i < len(foo): > foo[i] = 24 > which breaks for i<-len(foo); the first time this happens the guard's > changed to 0<=i<len(foo), which still breaks w/negative index; finally it stabilizes to the correct check, > -len(foo)<=i<len(foo) -- a check that Python performs again when you then use foo[i]... just > cluttering code. The intermediate Pythonista who's learned to code > "try: foo[i]=24 // except IndexError: pass" is much better off than the > one who's still striving to LBYL as he had (e.g.) when using C. > > Etc -- this is all very general and generic. Right. There are plenty of examples where LBYL is better, e.g. because there are too many different exceptions to catch, or they occur in too many places.
One of my favorites is creating a directory if it doesn't already exist; I always use this LBYL-ish pattern: if not os.path.exists(dn): try: os.makedirs(dn) except os.error, err: ...log the error... because the specific exception for "it already exists" is quite subtle to pull out of the os.error structure. Taken to th extreme, the "LBYL is bad" meme would be an argument against my optional type checking proposal, which I doubt is what you want. So, I'd like to take a much more balanced view on LBYL. > I had convinced myself that strings were a special case worth singling > out, via isinstance and basestring, just as (say) dictionaries are > singled out quite differently by metods such as get... I may well have > been too superficial in this conclusion. I think there are lots of situations where the desire to special-case strings is legitimate. > >> Then you would be able to test whether something is sequence-like > >> by the presence of __getitem__ or __iter__ methods, without > >> getting tripped up by strings. > > > > There would be other ways to get out of this dilemma; we could > > introduce a char type, for example. Also, strings might be > > recognizable by other means, e.g. the presence of a lower() method or > > some other characteristic method that doesn't apply to sequence in > > general. > > Sure, there would many possibilities. > > > (To Alex: leaving transform() out of the string interface seems to me > > the simplest solution.) > > I guess you mean translate. Yes, that would probably be simplest. Right. BTW, there's *still* no sign from a PEP 246 rewrite. Maybe someone could offer Clark a hand? (Last time I inquired he was recovering from a week of illness.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Mon Feb 21 22:24:32 2005 From: python at rcn.com (Raymond Hettinger) Date: Mon Feb 21 22:28:35 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: Message-ID: <000a01c5185b$bc999700$f61ac797@oemcomputer> > Where are the attempts to speed up function/method calls? That's an > area where we could *really* use a breakthrough... At one time you had entertained treating some of the builtin calls as fixed. Is that something you want to go forward with? It would entail a "from __future__" and transition period. It would not be hard to take code like "return len(alist)" and transform it from: 2 0 LOAD_GLOBAL 0 (len) 3 LOAD_FAST 0 (alist) 6 CALL_FUNCTION 1 9 RETURN_VALUE to: 2 0 LOAD_FAST 0 (alist) 3 OBJECT_LEN 4 RETURN_VALUE Some functions already have a custom opcode that cannot be used unless we freeze the meaning of the function name: repr --> UNARY_CONVERT --> PyObject_Repr iter --> GET_ITER --> PyObject_GetIter Alternately, functions could be served by a table of known, fixed functions: 2 0 LOAD_FAST 0 (alist) 3 CALL_DEDICATED 0 (PyObject_Len) 6 RETURN_VALUE where the dispatch table is something like: [PyObject_Len, PyObject_Repr, PyObject_IsInstance, PyObject_IsTrue, PyObject_GetIter, ...]. Of course, none of these offer a big boost and there is some loss of dynamic behavior. 
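For anyone who wants to reproduce the "before" listing, the current code generation is visible with the dis module (illustrative only; OBJECT_LEN and CALL_DEDICATED are proposed opcodes and do not exist):

    import dis

    def f(alist):
        return len(alist)

    dis.dis(f)
    # On 2.4 this prints the LOAD_GLOBAL (len) / LOAD_FAST (alist) /
    # CALL_FUNCTION 1 / RETURN_VALUE sequence shown above; the whole
    # point of the proposal is to collapse that into a single opcode.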
Raymond From barry at python.org Tue Feb 22 03:50:01 2005 From: barry at python.org (Barry Warsaw) Date: Tue Feb 22 03:50:17 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> Message-ID: <1109040601.25187.170.camel@presto.wooz.org> On Mon, 2005-02-21 at 11:15, Guido van Rossum wrote: > Right. There are plenty of examples where LBYL is better, e.g. because > there are too many different exceptions to catch, or they occur in too > many places. One of my favorites is creating a directory if it doesn't > already exist; I always use this LBYL-ish pattern: > > if not os.path.exists(dn): > try: > os.makedirs(dn) > except os.error, err: > ...log the error... > > because the specific exception for "it already exists" is quite subtle > to pull out of the os.error structure. Really? I do this kind of thing all the time: import os import errno try: os.makedirs(dn) except OSError, e: if e.errno <> errno.EEXIST: raise -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20050221/ae2d9387/attachment.pgp From quarl at NOSPAM.quarl.org Tue Feb 22 02:41:38 2005 From: quarl at NOSPAM.quarl.org (Karl Chen) Date: Tue Feb 22 07:34:34 2005 Subject: [Python-Dev] textwrap wordsep_re Message-ID: Hi, textwrap.fill() is awesome. Except when the string to wrap contains dates -- which I would like not to be broken. In general I think wordsep_re can be smarter about what it decides are hyphenated words. For example, this code: print textwrap.fill('aaaaaaaaaa 2005-02-21', 18) produces: aaaaaaaaaa 2005- 02-21 A slightly tweaked wordsep_re: textwrap.TextWrapper.wordsep_re = \ re.compile(r'(\s+|' # any whitespace r'[^\s\w]*\w+[a-zA-Z]-(?=[a-zA-Z]\w+)|' # hyphenated words r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))') # em-dash print textwrap.fill('aaaaaaaaaa 2005-02-21', 18) behaves better: aaaaaaaaaa 2005-02-21 What do you think about changing the default wordsep_re? -- Karl 2005-02-21 17:39 From aahz at pythoncraft.com Tue Feb 22 15:35:06 2005 From: aahz at pythoncraft.com (Aahz) Date: Tue Feb 22 15:35:10 2005 Subject: [Python-Dev] textwrap wordsep_re In-Reply-To: References: Message-ID: <20050222143506.GA27893@panix.com> On Mon, Feb 21, 2005, Karl Chen wrote: > > A slightly tweaked wordsep_re: > textwrap.TextWrapper.wordsep_re = \ > re.compile(r'(\s+|' # any whitespace > r'[^\s\w]*\w+[a-zA-Z]-(?=[a-zA-Z]\w+)|' # hyphenated words > r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))') # em-dash > print textwrap.fill('aaaaaaaaaa 2005-02-21', 18) > behaves better: > aaaaaaaaaa > 2005-02-21 > > What do you think about changing the default wordsep_re? Please post a patch to SF. If you're not familiar with the process, take a look at http://www.python.org/dev/dev_intro.html Another thing: I don't know whether you'll get this in direct e-mail; it's considered a bit rude for python-dev to use munged addresses. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code -- not in reams of trivial code that bores the reader to death." 
--GvR From gvanrossum at gmail.com Tue Feb 22 17:16:52 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue Feb 22 17:16:57 2005 Subject: [Python-Dev] UserString In-Reply-To: <1109040601.25187.170.camel@presto.wooz.org> References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> <1109040601.25187.170.camel@presto.wooz.org> Message-ID: > Really? I do this kind of thing all the time: > > import os > import errno > try: > os.makedirs(dn) > except OSError, e: > if e.errno <> errno.EEXIST: > raise You have a lot more faith in the errno module than I do. Are you sure the same error codes work on all platforms where Python works? It's also not exactly readable (except for old Unix hacks). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From david.ascher at gmail.com Tue Feb 22 17:20:47 2005 From: david.ascher at gmail.com (David Ascher) Date: Tue Feb 22 17:20:50 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> <1109040601.25187.170.camel@presto.wooz.org> Message-ID: On Tue, 22 Feb 2005 08:16:52 -0800, Guido van Rossum wrote: > > Really? I do this kind of thing all the time: > > > > import os > > import errno > > try: > > os.makedirs(dn) > > except OSError, e: > > if e.errno <> errno.EEXIST: > > raise > > You have a lot more faith in the errno module than I do. Are you sure > the same error codes work on all platforms where Python works? It's > also not exactly readable (except for old Unix hacks). Agreed. In general, I often wish in production code (especially in not-100% Python systems) that Python did a better job of at the very least documenting what kinds of exceptions were raised by what function calls. Otherwise you end up with what are effectively blanket try/except statements way too often for my taste. --da From andymac at bullseye.apana.org.au Tue Feb 22 13:13:08 2005 From: andymac at bullseye.apana.org.au (Andrew MacIntyre) Date: Tue Feb 22 19:19:49 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> Message-ID: <421B21D4.5050306@bullseye.apana.org.au> Fredrik Lundh wrote: > it could be worth expanding them to > > "if x == 1 or x == 2 or x == 3:" > > though... 
> > C:\>timeit -s "a = 1" "if a in (1, 2, 3): pass" > 10000000 loops, best of 3: 0.11 usec per loop > C:\>timeit -s "a = 1" "if a == 1 or a == 2 or a == 3: pass" > 10000000 loops, best of 3: 0.0691 usec per loop > > C:\>timeit -s "a = 2" "if a == 1 or a == 2 or a == 3: pass" > 10000000 loops, best of 3: 0.123 usec per loop > C:\>timeit -s "a = 2" "if a in (1, 2, 3): pass" > 10000000 loops, best of 3: 0.143 usec per loop > > C:\>timeit -s "a = 3" "if a == 1 or a == 2 or a == 3: pass" > 10000000 loops, best of 3: 0.187 usec per loop > C:\>timeit -s "a = 3" "if a in (1, 2, 3): pass" > 1000000 loops, best of 3: 0.197 usec per loop > > C:\>timeit -s "a = 4" "if a in (1, 2, 3): pass" > 1000000 loops, best of 3: 0.225 usec per loop > C:\>timeit -s "a = 4" "if a == 1 or a == 2 or a == 3: pass" > 10000000 loops, best of 3: 0.161 usec per loop Out of curiousity I ran /F's tests on my FreeBSD 4.8 box with a recent checkout: $ ./python Lib/timeit.py -s "a = 1" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.247 usec per loop $ ./python Lib/timeit.py -s "a = 1" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.225 usec per loop $ ./python Lib/timeit.py -s "a = 2" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.343 usec per loop $ ./python Lib/timeit.py -s "a = 2" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.353 usec per loop $ ./python Lib/timeit.py -s "a = 3" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.415 usec per loop $ ./python Lib/timeit.py -s "a = 3" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.457 usec per loop $ ./python Lib/timeit.py -s "a = 4" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.467 usec per loop $ ./python Lib/timeit.py -s "a = 4" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.488 usec per loop I then applied this patch: --- Objects/tupleobject.c.orig Fri Jun 11 05:28:08 2004 +++ Objects/tupleobject.c Tue Feb 22 22:10:18 2005 @@ -298,6 +298,11 @@ int i, cmp; for (i = 0, cmp = 0 ; cmp == 0 && i < a->ob_size; ++i) + cmp = (PyTuple_GET_ITEM(a, i) == el); + if (cmp) + return cmp; + + for (i = 0, cmp = 0 ; cmp == 0 && i < a->ob_size; ++i) cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i), Py_EQ); return cmp; Re-running the tests yielded: $ ./python Lib/timeit.py -s "a = 1" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.234 usec per loop $ ./python Lib/timeit.py -s "a = 1" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.228 usec per loop $ ./python Lib/timeit.py -s "a = 2" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.239 usec per loop $ ./python Lib/timeit.py -s "a = 2" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.36 usec per loop $ ./python Lib/timeit.py -s "a = 3" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.241 usec per loop $ ./python Lib/timeit.py -s "a = 3" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.469 usec per loop $ ./python Lib/timeit.py -s "a = 4" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.475 usec per loop $ ./python Lib/timeit.py -s "a = 4" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.489 usec per loop ------------------------------------------------------------------------- Andrew I MacIntyre "These thoughts are mine alone..." 
E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac@pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From quarl at cs.berkeley.edu Mon Feb 21 12:39:41 2005 From: quarl at cs.berkeley.edu (Karl Chen) Date: Tue Feb 22 20:00:13 2005 Subject: [Python-Dev] textwrap.py wordsep_re Message-ID: Hi, textwrap.fill() is awesome. Except when the string to wrap contains dates -- which I would like not to be filled. In general I think wordsep_re can be smarter about what it decides are hyphenated words. For example, this code: print textwrap.fill('aaaaaaaaaa 2005-02-21', 18) produces: aaaaaaaaaa 2005- 02-21 A slightly tweaked wordsep_re: textwrap.TextWrapper.wordsep_re =\ re.compile(r'(\s+|' # any whitespace r'[^\s\w]*\w+[a-zA-Z]-(?=[a-zA-Z]\w+)|' # hyphenated words r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))') # em-dash print textwrap.fill('aaaaaaaaaa 2005-02-21', 18) behaves better: aaaaaaaaaa 2005-02-21 What do you think about changing the default wordsep_re? -- Karl 2005-02-21 03:32 From michel at dialnetwork.com Wed Feb 23 03:04:34 2005 From: michel at dialnetwork.com (Michel Pelletier) Date: Wed Feb 23 00:24:07 2005 Subject: [Python-Dev] UserString In-Reply-To: <20050222110123.608C41E403C@bag.python.org> References: <20050222110123.608C41E403C@bag.python.org> Message-ID: <200502221804.34808.michel@dialnetwork.com> On Tuesday 22 February 2005 03:01 am, Guido wrote: > > BTW, there's *still* no sign from a PEP 246 rewrite. Maybe someone > could offer Clark a hand? (Last time I inquired he was recovering from > a week of illness.) Last summer Alex, Clark, Phillip and I swapped a few emails about reviving the 245/246 drive and submitting a plan for a PSF grant. I was pushing the effort and then had to lamely drop out due to a new job. This is good grant material for someone which leads to my question, when will the next cycle of PSF grants happen? I'm not volunteering and I won't have the bandwidth to participate, but if there are other starving souls out there willing to do the heavy lifting to help Alex it could get done quickly within the PSFs own framework for advancing the language. -Michel From andrewm at object-craft.com.au Wed Feb 23 01:14:45 2005 From: andrewm at object-craft.com.au (Andrew McNamara) Date: Wed Feb 23 01:14:34 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> <1109040601.25187.170.camel@presto.wooz.org> Message-ID: <20050223001445.DB6583C889@coffee.object-craft.com.au> >> if e.errno <> errno.EEXIST: >> raise > >You have a lot more faith in the errno module than I do. Are you sure >the same error codes work on all platforms where Python works? It's >also not exactly readable (except for old Unix hacks). On the other hand, LBYL in this context can result in race conditions and security vulnerabilities. "os.makedirs" is already a composite of many system calls, so all bets are off anyway, but for simpler operations that result in an atomic system call, this is important. 
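A minimal sketch of the difference, with a made-up file name (O_CREAT|O_EXCL is atomic on a local filesystem, so the EAFP form leaves no window for a race):

    import os, errno

    fn = '/tmp/myapp.lock'   # illustration only

    # Racy LBYL: another process can create fn between the check and the
    # open, and neither of you will notice.
    #     if not os.path.exists(fn):
    #         f = open(fn, 'w')

    # EAFP around a single atomic system call:
    try:
        fd = os.open(fn, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0600)
    except OSError, e:
        if e.errno != errno.EEXIST:
            raise
        print 'lock already held'
    else:
        os.write(fd, 'pid %d\n' % os.getpid())
        os.close(fd)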
-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ From tim.peters at gmail.com Wed Feb 23 03:57:22 2005 From: tim.peters at gmail.com (Tim Peters) Date: Wed Feb 23 03:57:25 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Python compile.c, 2.344, 2.345 In-Reply-To: References: Message-ID: <1f7befae050222185758fdd46e@mail.gmail.com> [rhettinger@users.sourceforge.net] > Modified Files: > compile.c > Log Message: > Teach the peepholer to fold unary operations on constants. > > Afterwards, -0.5 loads in a single step and no longer requires a runtime > UNARY_NEGATIVE operation. Aargh. The compiler already folded in a leading minus for ints, and exempting floats from this was deliberate. Stick this in a file: import math print math.atan2(-0.0, -0.0) If you run that directly, a decent 754-conforming libm will display an approximation to -pi (-3.14...; this is the required result in C99 if its optional 754 support is implemented, and even MSVC has done this all along). But if you import the same module from a .pyc or .pyo, now on the HEAD it prints 0.0 instead. In 2.4 it still prints -pi. I often say that all behavior in the presence of infinities, NaNs, and signed zeroes is undefined in CPython, and that's strictly true (just _try_ to find reassuring words about any of those cases in the Python docs ). But it's still the case that we (meaning mostly me) strive to preserve sensible 754 semantics when it's reasonably possible to do so. Not even gonzo-optimizing Fortran compilers will convert -0.0 to 0.0 anymore, precisely because it's not semantically neutral. In this case, it's marshal that drops the sign bit of a float 0 on the floor, so surprises result if and only if you run from a precompiled Python module now. I don't think you need to revert the whole patch, but -0.0 must be left alone (or marshal taught to preserve the sign of a float 0.0 -- but then you have the problem of _detecting_ the sign of a float 0.0, and nothing in standard C89 can do so). Even in 754-land, it's OK to fold in the sign for non-zero float literals (-x is always unexceptional in 754 unless x is a signaling NaN, and there are no signaling NaN literals; and the sign bit of any finite float except zero is already preserved by marshal). From kbk at shore.net Wed Feb 23 05:19:55 2005 From: kbk at shore.net (Kurt B. 
Kaiser) Date: Wed Feb 23 05:20:51 2005 Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200502230419.j1N4Jthi005718@bayview.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 308 open (+10) / 2755 closed ( +1) / 3063 total (+11) Bugs : 838 open (+15) / 4834 closed ( +5) / 5672 total (+20) RFE : 168 open ( +0) / 148 closed ( +4) / 316 total ( +4) New / Reopened Patches ______________________ do not add directory of sys.argv[0] into sys.path (2004-05-02) http://python.org/sf/946373 reopened by wrobell isapi.samples.advanced.py fix (2005-02-17) http://python.org/sf/1126187 opened by Philippe Kirsanov more __contains__ tests (2005-02-17) http://python.org/sf/1141428 opened by Jim Jewett Fix to allow urllib2 digest auth to talk to livejournal.com (2005-02-18) http://python.org/sf/1143695 opened by Benno Rice Add IEEE Float support to wave.py (2005-02-19) http://python.org/sf/1144504 opened by Ben Schwartz cgitb: make more usable for 'binary-only' sw (new patch) (2005-02-19) http://python.org/sf/1144549 opened by Reinhold Birkenfeld allow UNIX mmap size to default to current file size (new) (2005-02-19) http://python.org/sf/1144555 opened by Reinhold Birkenfeld Make OpenerDirector instances pickle-able (2005-02-20) http://python.org/sf/1144636 opened by John J Lee webbrowser.Netscape.open bug fix (2005-02-20) http://python.org/sf/1144816 opened by Pernici Mario Replace store/load pair with a single new opcode (2005-02-20) http://python.org/sf/1144842 opened by Raymond Hettinger Remove some invariant conditions and assert in ceval (2005-02-20) http://python.org/sf/1145039 opened by Neal Norwitz Patches Closed ______________ date.strptime and time.strptime as well (2005-02-04) http://python.org/sf/1116362 closed by josh-sf New / Reopened Bugs ___________________ attempting to use urllib2 on some URLs fails starting on 2.4 (2005-02-16) http://python.org/sf/1123695 opened by Stephan Sokolow descrintro describes __new__ and __init__ behavior wrong (2005-02-15) http://python.org/sf/1123716 opened by Steven Bethard gensuitemodule.processfile fails (2005-02-16) http://python.org/sf/1123727 opened by Jurjen N.E. Bos PyDateTime_FromDateAndTime documented as PyDate_FromDateAndT (2005-02-16) CLOSED http://python.org/sf/1124278 opened by smilechaser Function's __name__ no longer accessible in restricted mode (2005-02-16) CLOSED http://python.org/sf/1124295 opened by Tres Seaver Python24.dll crashes, EXAMPLE ATTACHED (2005-02-12) CLOSED http://python.org/sf/1121201 reopened by complex IDLE line wrapping (2005-02-16) CLOSED http://python.org/sf/1124503 opened by Chris Rebert test_os fails on 2.4 (2005-02-17) CLOSED http://python.org/sf/1124513 reopened by doerwalter test_os fails on 2.4 (2005-02-16) CLOSED http://python.org/sf/1124513 opened by Brett Cannon test_subprocess is far too slow (2005-02-17) http://python.org/sf/1124637 opened by Michael Hudson Math mode not well handled in \documentclass{howto} (2005-02-17) http://python.org/sf/1124692 opened by Daniele Varrazzo GetStdHandle in interactive GUI (2005-02-17) http://python.org/sf/1124861 opened by davids subprocess.py Errors with IDLE (2005-02-17) http://python.org/sf/1126208 opened by Kurt B. Kaiser subprocesss module retains older license header (2005-02-17) http://python.org/sf/1138653 opened by Tres Seaver Python syntax is not so XML friendly! 
(2005-02-18) CLOSED http://python.org/sf/1143855 opened by Colbert Philippe inspect.getsource() breakage in 2.4 (2005-02-18) http://python.org/sf/1143895 opened by Armin Rigo future warning in commets (2005-02-18) http://python.org/sf/1144057 opened by Grzegorz Makarewicz reload() is broken for C extension objects (2005-02-19) http://python.org/sf/1144263 opened by Matthew G. Knepley htmllib quote parse error within a