Thanks for the example.

> --- Todd Miller <jmiller@stsci.edu> wrote:
> > > > I don't understand what you say, but I believe you.
> > >
> > > I meant we call PyBuffer_FromReadWriteObject and the resulting buffer
> > > lives longer than the extension function call that created it.  I have
> > > heard that it is possible for the original object to "move" leaving the
> > > buffer object pointer to it dangling.
>
> Yes.  The PyBufferObject grabs the pointer from the PyBufferProcs
> supporting object when the PyBufferObject is created.  If the PyBufferProcs
> supporting object reallocates the memory (possibly from a resize) the
> PyBufferObject can be left with a bad pointer.  This is easily possible if
> you try to use the array module arrays as a buffer.

This is good to know.

> I've submitted a patch to fix this particular problem (among others), but
> there are still enough things that the buffer object can't do that
> something new is needed.

I understand.  I saw your patches and they sounded good to me.  I agree
with this completely.  I could summarize my opinion by saying that while

> > > > Maybe instead of the buffer() function/type, there should be a way to
> > > > allocate raw memory?
>
> > > Yes.  It would also be nice to be able to:
> > >
> > > 1. Know (at the python level) that a type supports the buffer C-API.
>
> > Good idea.  (I guess right now you can see if calling buffer() with an
> > instance as argument works. :-)
>
> > > 2. Copy bytes from one buffer to another (writeable buffer).
>
> And the copy operations shouldn't create any large temporaries:
>
>     buf1 = memory(50000)
>     buf2 = memory(50000)
>     # no 10K temporary should be created in the next line
>     buf1[10000:20000] = buf2[30000:40000]
>
> The current buffer object could be used like this, but it would create a
> temporary string.

Looking at buffering most of this week, the fact that mmap slicing also
returns strings is one justification I've found for having a buffer
object, i.e., mmap slicing is not a substitute for the buffer object.
The buffer object makes it possible to partition a mmap or any
bufferable object into pseudo-independent, possibly writable, pieces.
One justification to have a new buffer object is pickling (one of
Scott's posts alerted me to this).  I think the behavior we want for
numarray is to be able to pickle a view of a bufferable object more or
less like a string containing the buffer image, and to unpickle it as a
memory object.  The prospect of adding pickling support makes me wonder
if separating the allocator and view aspects of the buffer object is a
good idea; I thought it was, but now I wonder.
> So getting an efficient copy operation seems to require that slices
> just create new "views" to the same memory.

Other justifications for a new buffer object might be:

1. The ability to partition any bufferable object into regions which
   can be passed around.  These regions
2. The ability to efficiently pickle a view of any bufferable object.
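[A latter-day aside for the record: the slices-as-views behavior argued
for above is essentially what Python eventually grew as memoryview
(with bytearray as the raw-memory allocation).  A small sketch using
those modern names in place of the hypothetical memory():]

```python
# bytearray stands in for the proposed memory() allocator, and
# memoryview provides the zero-copy "view" semantics discussed above.
buf1 = bytearray(50000)
buf2 = bytearray(50000)
buf2[30000:40000] = b"\x2a" * 10000

m1, m2 = memoryview(buf1), memoryview(buf2)
# Slicing a memoryview yields another view, not a copy, so this
# assignment moves the bytes directly -- no 10K temporary string.
m1[10000:20000] = m2[30000:40000]

assert buf1[10000:20000] == buf2[30000:40000]
```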
Calling this a memory type sounds the best to me.  The question I have
not resolved for myself

> > Maybe you would like to work on a requirements gathering for a memory
> > object
>
> Sure.  I'd be willing to poll comp.lang.python (python-list?) and
> collate the results of any discussion that ensues.  Is that what you
> had in mind?

> In the PEP that I'm drafting, I've been calling the new object "bytes"
> (since it is just a simple array of bytes).  Now that you guys are
> referring to it as the "memory object", should I change the name?
> Doesn't really matter, but it might avoid confusion to know we're all
> talking about the same thing.
__________________________________________________
Do You Yahoo!?
Yahoo! Autos - Get free new car price quotes
http://autos.yahoo.com
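[A hedged sketch of the pickling behavior described above -- a view of
a bufferable object pickling like a string of its bytes and unpickling
as an independent object -- using today's memoryview and bytes as
stand-ins for the proposed memory/bytes object:]

```python
import pickle

big = bytearray(b"0123456789" * 5000)   # any bufferable object
view = memoryview(big)[10000:20000]     # zero-copy view into it

# Serialize the view as its byte image (a view itself holds a live
# pointer into `big`, so what goes on the wire is just the bytes);
# it unpickles as an independent object, not a view of the original.
blob = pickle.dumps(bytes(view))
restored = pickle.loads(blob)

assert restored == bytes(big[10000:20000])
```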
From lalo@laranja.org Thu Jul 18 16:03:40 2002
From: lalo@laranja.org (Lalo Martins)
Date: Thu, 18 Jul 2002 12:03:40 -0300
Subject: [Python-Dev] PEP 292-related: why string substitution is not the same operation as data formatting
In-Reply-To: <200207121447.g6CElY808029@pcp02138704pcs.reston01.va.comcast.net>
References: <20020623181630.GN25927@laranja.org> <200207121447.g6CElY808029@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020718150340.GB1209@laranja.org>

On Fri, Jul 12, 2002 at 10:47:34AM -0400, Guido van Rossum wrote:
> > Guido, can you please, for our enlightenment, tell us what are the
> > reasons you feel %(foo)s was a mistake?
>
> Because of the trailing 's'.  It's very easy to leave it out by
> mistake, and because the definition of printf formats skips over
> spaces (don't ask me why), the first character of the following word
> is used as the type indicator.

In case that wasn't clear, I agree with that - I asked because I wanted
this in writing for the record.

BTW: IIRC, it skips over spaces because spaces are a valid format
modifier (meaning "pad with spaces").

[]s,
                                               |alo
                                               +----
--
Those who trade freedom for security lose both and deserve neither. --
http://www.laranja.org/                  mailto:lalo@laranja.org
pgp key: http://www.laranja.org/pessoal/pgp
Eu jogo RPG! (I play RPG)         http://www.eujogorpg.com.br/
Python Foundry Guide http://www.sf.net/foundry/python-foundry/

From guido@python.org Thu Jul 18 17:27:25 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 18 Jul 2002 12:27:25 -0400
Subject: [Python-Dev] test_socket failure on FreeBSD
In-Reply-To: Your message of "Mon, 08 Jul 2002 23:53:01 +1100."
References:
Message-ID: <200207181627.g6IGRPE21459@odiug.zope.com>

> > There are probably some differences in the socket semantics.  I'd
> > appreciate it if you could provide a patch or at least a clue!
> > I've not read enough Stevens to grok sockets code (yet) :-(
> >
> > However, I hope that the instrumented verbose output of test_socket
> > might give you a clue....
> >
> > I've attached the diff from the version of test_socket (vs recent CVS)
> > that I used, as well as output from test_socket on FreeBSD 4.4 and
> > OS/2+EMX.  Getting the FreeBSD issues sorted is a higher priority for
> > me than getting OS/2+EMX working (though that would be nice too).
> >
> > Please let me know if there's more testing/debugging I can do.

I've got some time for this now.  Ignoring your OS/2+EMX output and
focusing on the FreeBSD logs, I notice:

[...]
> Testing recvfrom() in chunks over TCP. ...
> seg1='Michael Gilfix was he', addr='None'
> seg2='re
> ', addr='None'
> ERROR

Hm.  This looks like recvfrom() on a TCP stream doesn't return an
address; not entirely unreasonable.  I wonder if
self.cli_conn.getpeername() returns the expected address; can you check
this?  Add this after each recvfrom() call.

    if addr is None:
        addr = self.cli_conn.getpeername()

[...]
> Testing large recvfrom() over TCP. ...
> msg='Michael Gilfix was here
> ', addr='None'
> ERROR

Ditto.

> Testing non-blocking accept. ...
> conn=
> addr=('127.0.0.1', 3144)
> FAIL

This is different.  It seems that the accept() call doesn't time out.
But this could be because the client thread connects too fast.  Can you
add a sleep (e.g. time.sleep(5)) to _testAccept() before the connect()
call?

[...]
> Testing non-blocking recv. ...
> conn=
> addr=('127.0.0.1', 3146)
> FAIL

Similar.  Try putting a sleep in _testRecv() between the connect() and
the send().

[...]

Let me know if you want me to provide specific patches...
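[The suggested getpeername() fallback, wrapped as a helper for
illustration -- the name recvfrom_checked is invented here, and the
test harness's cli_conn can be any connected socket:]

```python
import socket

def recvfrom_checked(conn, bufsize):
    # On some platforms (e.g. the FreeBSD log above), recvfrom() on a
    # connected TCP stream returns None for the peer address; fall
    # back to getpeername() as suggested.
    msg, addr = conn.recvfrom(bufsize)
    if addr is None:
        addr = conn.getpeername()
    return msg, addr
```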
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Thu Jul 18 16:49:44 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 18 Jul 2002 11:49:44 -0400
Subject: [Python-Dev] Re: Patch level versions and new features (Was: Some dull gc stats)
In-Reply-To: Your message of "Mon, 08 Jul 2002 21:20:56 EDT." <20020709012056.GA2526@cthulhu.gerg.ca>
References: <3D220A86.5070003@lemburg.com> <3D22ADD9.1030901@lemburg.com> <15650.64375.162977.160780@anthem.wooz.org> <3D2433B9.9080102@lemburg.com> <15657.39558.325764.651122@anthem.wooz.org> <3D299E42.70200@lemburg.com> <20020709012056.GA2526@cthulhu.gerg.ca>
Message-ID: <200207181549.g6IFniw21368@odiug.zope.com>

> > Perhaps we could have some kind of category for distutils
> > packages which marks them as system add-ons vs. site add-ons.
>
> +1 -- this should definitely be up to the package author/packager, not
> the local admin.  I once tried to convince Guido that the ability to
> occasionally upgrade standard library modules/packages would be a good
> thing, but he wasn't having it.  Any change of heart, O Mighty BDFL?

Before I answer that, here's a question.  Why do we think it's a good
idea to distribute upgrades as separate add-ons while we don't think
it's okay to distribute such upgrades with bugfix releases?  Doesn't
this just increase the variability of site configurations, and hence
version interaction hell?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Thu Jul 18 15:22:11 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 18 Jul 2002 10:22:11 -0400
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: Your message of "Wed, 17 Jul 2002 15:40:27 PDT."
References:
Message-ID: <200207181422.g6IEMBr14526@odiug.zope.com>

> I do think there is some potential for errors caused by
> misunderstandings about whether or not "for x in y" is destructive.
> That's the thing that worries me the most.  I think this is the main
> reason why the old practice of abusing __getitem__ was bad, and thus
> helped to motivate iterators in the first place.  It seems serious
> enough that migrating to something that distinguishes
> destructive-for from non-destructive-for could indeed be worth the
> cost.

I'm not sure I understand this (this seems to be my week for not
understanding what people write :-( ).

First of all, I'm not sure what exactly the issue is with destructive
for-loops.  If I have a function that contains a for-loop over its
argument, and I pass iter(x) as the argument, then the iterator is
destroyed in the process, but x may or may not be, depending on what it
is.  Maybe the for-loop is a red herring?  Calling next() on an
iterator may or may not be destructive on the underlying "sequence" --
if it is a generator, for example, I would call it destructive.
Perhaps you're trying to assign properties to the iterator abstraction
that aren't really there?

Next, I'm not sure how renaming next() to __next__() would affect the
situation w.r.t. the destructivity of for-loops.  Or were you talking
about some other migration?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pinard@iro.umontreal.ca Thu Jul 18 12:23:16 2002
From: pinard@iro.umontreal.ca (François Pinard)
Date: 18 Jul 2002 07:23:16 -0400
Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability
In-Reply-To: <200207180043.g6I0hKB25427@pcp02138704pcs.reston01.va.comcast.net>
References: <200207180043.g6I0hKB25427@pcp02138704pcs.reston01.va.comcast.net>
Message-ID:

> > I'm happy to add the text, but i want to be clear, then: is it
> > acceptable to write an iterator that only provides if you
> > only care about the "iteration protocol" and not the "for-able
> > protocol"?
>
> No, an iterator ought to provide both, but it's good to recognize that
> there *are* two protocols.
> > A class is a valid iterator object when it defines a next()
> > method that behaves as described above.  A class that wants
> > to be an iterator also ought to implement __iter__()
> > returning itself.
>
> I would like to see this strengthened.  I envision "iterator algebra"
> code that really needs to be able to do a for loop over an iterator
> when it feels like it.

Maybe the reasons behind having __iter__() returning itself should be
clearly expressed in the PEP, too.  On this list, Tim gave one
recently, Guido gives another here, but unless I missed it, the PEP
gives none.  Usually, PEPs explain the reasons behind the choices.

--
François Pinard   http://www.iro.umontreal.ca/~pinard

From cce@clarkevans.com Thu Jul 18 15:06:31 2002
From: cce@clarkevans.com (Clark C . Evans)
Date: Thu, 18 Jul 2002 10:06:31 -0400
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: ; from ping@zesty.ca on Wed, Jul 17, 2002 at 02:58:55PM -0500
References: <200207171409.g6HE9Di00659@odiug.zope.com>
Message-ID: <20020718100631.A3468@doublegemini.com>

On Wed, Jul 17, 2002 at 02:58:55PM -0500, Ka-Ping Yee wrote:
| But i think this is more than a minor problem.  This is a
| namespace collision problem, and that's significant.  Naming
| the method "next" means that any object with a "next" method
| cannot be adapted to support the iterator protocol.  Unfortunately
| "next" is a pretty common word and it's quite possible that such
| a method name is already in use.

Ping,

Do you have any suggestions for re-wording the Iterator questionnaire
at http://yaml.org/wk/survey?id=pyiter to reflect this paragraph above?

Best,

Clark

--
Clark C. Evans                   Axista, Inc.
http://www.axista.com            800.926.5525
XCOLLA Collaborative Project Management Software

From xscottg@yahoo.com Mon Jul 15 18:52:56 2002
From: xscottg@yahoo.com (Scott Gilbert)
Date: Mon, 15 Jul 2002 10:52:56 -0700 (PDT)
Subject: [Python-Dev] Fw: Behavior of buffer()
In-Reply-To: <3D32FA0D.6020200@stsci.edu>
Message-ID: <20020715175256.5971.qmail@web40112.mail.yahoo.com>

--- Todd Miller wrote:
> > > > I don't understand what you say, but I believe you.
> > >
> > > I meant we call PyBuffer_FromReadWriteObject and the resulting buffer
> > > lives longer than the extension function call that created it.  I have
> > > heard that it is possible for the original object to "move" leaving the
> > > buffer object pointer to it dangling.

Yes.  The PyBufferObject grabs the pointer from the PyBufferProcs
supporting object when the PyBufferObject is created.  If the
PyBufferProcs supporting object reallocates the memory (possibly from a
resize) the PyBufferObject can be left with a bad pointer.  This is
easily possible if you try to use the array module arrays as a buffer.

I've submitted a patch to fix this particular problem (among others),
but there are still enough things that the buffer object can't do that
something new is needed.

> > > Maybe instead of the buffer() function/type, there should be a way to
> > > allocate raw memory?
>
> > Yes.  It would also be nice to be able to:
> >
> > 1. Know (at the python level) that a type supports the buffer C-API.
>
> Good idea.  (I guess right now you can see if calling buffer() with an
> instance as argument works. :-)
>
> > 2. Copy bytes from one buffer to another (writeable buffer).

And the copy operations shouldn't create any large temporaries:

    buf1 = memory(50000)
    buf2 = memory(50000)
    # no 10K temporary should be created in the next line
    buf1[10000:20000] = buf2[30000:40000]

The current buffer object could be used like this, but it would create
a temporary string.
So getting an efficient copy operation seems to require that slices
just create new "views" to the same memory.

> > Maybe you would like to work on a requirements gathering for a memory
> > object
>
> Sure.  I'd be willing to poll comp.lang.python (python-list?) and
> collate the results of any discussion that ensues.  Is that what you
> had in mind?

In the PEP that I'm drafting, I've been calling the new object "bytes"
(since it is just a simple array of bytes).  Now that you guys are
referring to it as the "memory object", should I change the name?
Doesn't really matter, but it might avoid confusion to know we're all
talking about the same thing.

From aleax@aleax.it Thu Jul 18 07:02:23 2002
From: aleax@aleax.it (Alex Martelli)
Date: Thu, 18 Jul 2002 08:02:23 +0200
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: <200207172348.g6HNmEB23863@oma.cosc.canterbury.ac.nz>
References: <200207172348.g6HNmEB23863@oma.cosc.canterbury.ac.nz>
Message-ID:

On Thursday 18 July 2002 01:48 am, Greg Ewing wrote:
> Alex Martelli :
> > Still, it doesn't solve the reference-loop-between-two-deuced-things-
> > that-don't-cooperate-with-gc problem.
>
> Would making them cooperate with GC be a difficult
> thing to do?  Seems to me we should be moving towards
> making everything cooperate with GC, and fixing
> things like this whenever they come to light.

Tim Peters says it wouldn't be, but I have not explored that.

Alex

From aleax@aleax.it Thu Jul 18 06:52:34 2002
From: aleax@aleax.it (Alex Martelli)
Date: Thu, 18 Jul 2002 07:52:34 +0200
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: <200207172325.g6HNPQM23808@oma.cosc.canterbury.ac.nz>
References: <200207172325.g6HNPQM23808@oma.cosc.canterbury.ac.nz>
Message-ID:

On Thursday 18 July 2002 01:25 am, Greg Ewing wrote:
> Andrew Koenig :
> > Is a file a container or not?
>
> I would say no, a file object is not a container in Python terms.
> You can't index it with [] or use len() on it or any of
> the other things you expect to be able to do on containers.
>
> I think we just have to live with the idea that there are
> things other than containers that can supply iterators.

Yes, there are such things, and there may be cases in which no other
alternative makes sense.  But I don't think files are necessarily in
such a bind.

> Forcing everything that can supply an iterator to bend
> over backwards to try to be a random-access container
> as well would be too cumbersome.

Absolutely.  But what Oren's patch does, and my mods of it preserve, is
definitely NOT "forcing" files "to be random-access containers": on the
contrary, it accepts the fact that files aren't containers and
conceptually simplifies things by making them iterators instead.

I'm not sure about "random access" being needed to be a container.
Consider sets, e.g. as per Greg Wilson's soapbox implementation (as
modified by my patch to allow immutable-sets, maybe, but that's
secondary).  They're doubtlessly containers, able to produce on request
as many iterators as you wish, each iterator not affecting the set's
state in any way -- the ideal.

But what sense would it make to force sets to expose a __getitem__?
Right now they inherit from dict and thus do happen to expose it, but
that's really an implementation artefact showing through (and a good
example of why one might like to inherit without needing to expose all
of the superclass's interface, to tie this in to another recent thread
-- inheritance for implementation).
Ideally, sets would expose __contains__, __iter__, __len__, ways to add
and remove elements, and perhaps (it's so in Greg's implementation, and
I didn't touch that) set ops such as union, intersection &c.
someset[anindex] is really a weird thing to have... yet sets _are_
containers!

Alex

From guido@python.org Thu Jul 18 19:42:30 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 18 Jul 2002 14:42:30 -0400
Subject: [Python-Dev] Fw: Behavior of buffer()
In-Reply-To: Your message of "Mon, 15 Jul 2002 10:52:56 PDT." <20020715175256.5971.qmail@web40112.mail.yahoo.com>
References: <20020715175256.5971.qmail@web40112.mail.yahoo.com>
Message-ID: <200207181842.g6IIgUo22271@odiug.zope.com>

> Yes.  The PyBufferObject grabs the pointer from the PyBufferProcs
> supporting object when the PyBufferObject is created.  If the
> PyBufferProcs supporting object reallocates the memory (possibly from
> a resize) the PyBufferObject can be left with a bad pointer.  This is
> easily possible if you try to use the array module arrays as a buffer.
>
> I've submitted a patch to fix this particular problem (among others),
> but there are still enough things that the buffer object can't do that
> something new is needed.

Can you remind me of the patch#?  (I'm curious how you plan to fix
this...)

> In the PEP that I'm drafting, I've been calling the new object "bytes"
> (since it is just a simple array of bytes).  Now that you guys are
> referring to it as the "memory object", should I change the name?
> Doesn't really matter, but it might avoid confusion to know we're all
> talking about the same thing.

I like bytes just fine.

PS, Todd, if you can, please don't send HTML-only mail to python-dev...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Thu Jul 18 19:49:19 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 18 Jul 2002 14:49:19 -0400
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: Your message of "Thu, 18 Jul 2002 08:02:23 +0200."
References: <200207172348.g6HNmEB23863@oma.cosc.canterbury.ac.nz>
Message-ID: <200207181849.g6IInJa22327@odiug.zope.com>

> > Would making them cooperate with GC be a difficult
> > thing to do?  Seems to me we should be moving towards
> > making everything cooperate with GC, and fixing
> > things like this whenever they come to light.
>
> Tim Peters says it wouldn't be, but I have not explored that.

But he also warned that it introduces new surprises.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Thu Jul 18 19:45:41 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 18 Jul 2002 14:45:41 -0400
Subject: [Python-Dev] Re: Sets
In-Reply-To: Your message of "Thu, 18 Jul 2002 07:52:34 +0200."
References: <200207172325.g6HNPQM23808@oma.cosc.canterbury.ac.nz>
Message-ID: <200207181845.g6IIjfw22307@odiug.zope.com>

> But what sense would it make to force sets to expose
> a __getitem__?  Right now they inherit from dict and
> thus do happen to expose it, but that's really an
> implementation artefact showing through (and a good
> example of why one might like to inherit without needing
> to expose all of the superclass's interface, to tie this in
> to another recent thread -- inheritance for implementation).
>
> Ideally, sets would expose __contains__, __iter__, __len__,
> ways to add and remove elements, and perhaps (it's so in
> Greg's implementation, and I didn't touch that) set ops such
> as union, intersection &c.  someset[anindex] is really a weird
> thing to have... yet sets _are_ containers!

I believe I recommended to Greg to make sets "have" a dict instead of
"being" dicts, and I think he agreed.  But I guess he never got to
implementing that change.
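[A minimal sketch of that "has-a dict" design -- not Greg's actual
implementation -- exposing just the container protocol listed above
(__contains__, __iter__, __len__, plus add/remove):]

```python
class Set:
    """A set that *has* a dict rather than *being* one: the dict is an
    implementation detail, so dict oddities like someset[anindex] can
    never show through the interface."""

    def __init__(self, iterable=()):
        self._data = {}
        for element in iterable:
            self._data[element] = True

    def __contains__(self, element):
        return element in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def add(self, element):
        self._data[element] = True

    def remove(self, element):
        del self._data[element]
```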
--Guido van Rossum (home page: http://www.python.org/~guido/)

From aleax@aleax.it Thu Jul 18 06:57:37 2002
From: aleax@aleax.it (Alex Martelli)
Date: Thu, 18 Jul 2002 07:57:37 +0200
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: <200207172332.g6HNWMp23835@oma.cosc.canterbury.ac.nz>
References: <200207172332.g6HNWMp23835@oma.cosc.canterbury.ac.nz>
Message-ID:

On Thursday 18 July 2002 01:32 am, Greg Ewing wrote:
> Alex Martelli :
> > All files have seek and write, but not on all files do they work --
> > and the same goes for iteration.  I.e., it IS something of a mess
>
> I've just had a thought.  Maybe it would be less of a mess
> if what we are calling "iterators" had been called "streams"

Possibly -- I did use the "streams" name often in the tutorial on
iterators and generators, it's a very natural term.

> instead.  Then the term "iterator" could have been reserved
> for the special case of an object that provides stream
> access to a random-access collection.

Nice touch, except that I keep quibbling on the "random access" need --
see my previous msg about sets.

> Then you could say that a file object is a stream object

That's what I'd love to do -- and requires the file object to expose a
next method and have iter(f) is f.  That's what Oren's patch does, and
the reason I'm trying to save it from the need for a reference loop.

> that provides line-by-line access to an OS file.  Other
> stream objects can be constructed that give access to
> the OS file in other units.  That would all make sense
> without seeming to imply any multi-pass ability.

Seekable files can be multi-pass, but in the strict sense that you can
rewind them -- it's still impractical to have them produce multiple
*independent* iterators (needing some sort of in-memory caching).
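[The iter(f) is f point, illustrated with a latter-day sketch --
io.StringIO stands in here for an open file:]

```python
import io

f = io.StringIO("line 1\nline 2\n")   # stands in for an open file
assert iter(f) is f                   # the file is its own iterator

first = list(f)    # a for loop would consume the file the same way
second = list(f)   # already exhausted: iteration is single-pass
f.seek(0)          # rewinding is the only route to multiple passes
third = list(f)

assert second == [] and first == third
```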
Alex

From jeremy@alum.mit.edu Thu Jul 18 20:08:16 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Thu, 18 Jul 2002 15:08:16 -0400
Subject: [Python-Dev] configure problems porting to Tru64
Message-ID: <15671.4640.361811.434411@slothrop.zope.com>

I've been trying to build with the current CVS on Tru64 today.  This is
Tru64 Unix 5.1a with Compaq C++ 6.5.  I've run into a bunch of problems
with posixmodule.c (no surprise there), but I don't know what the right
strategy for fixing them is.

Here is a conflicting set of problems: fchdir() is only defined if
_XOPEN_SOURCE_EXTENDED is defined.  setpgrp() takes no arguments if
_XOPEN_SOURCE_EXTENDED is defined, but two arguments if it is not.

I found the fchdir() problem first and thought the solution would be to
change this bit of code in Python.h:

    /* Forcing SUSv2 compatibility still produces problems on some
       platforms, True64 and SGI IRIX being two of them, so for now the
       define is switched off. */
    #if 0
    #ifndef _XOPEN_SOURCE
    # define _XOPEN_SOURCE 500
    #endif
    #endif

And change "#if 0" to "#if __digital__", but that causes the setpgrp()
problem to appear.

It seems that configure has a test for whether setpgrp() takes
arguments, but configure runs its test without defining _XOPEN_SOURCE.
(I'll also note that configure.in has a rather complex test for this,
when it appears that autoconf has a builtin AC_FUNC_SETPGRP.  Anyone
know why we don't use this?)

How should we actually fix this problem?  It seems to me that the right
solution is to define _XOPEN_SOURCE on Tru64 and somehow guarantee that
configure runs its tests with that defined, too.  How would we achieve
that?

Jeremy

From mal@lemburg.com Thu Jul 18 20:13:37 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 18 Jul 2002 21:13:37 +0200
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects dictobject.c,2.127,2.128 floatobject.c,2.113,2.114 intobject.c,2.84,2.85 listobject.c,2.120,2.121 longobject.c,1.119,1.120 rangeobject.c,2.42,2.43 stringobject.c,2.169,2.170 tupleobject.c,2.69,2.70 typeobject.c,2.160,2.161 unicodeobject.c,2.155,2.156 xxobject.c,2.20,2.21
References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com>
Message-ID: <3D371361.7050908@lemburg.com>

Jeremy Hylton wrote:
>>>>>> "MAL" == mal writes:
>
> MAL> M.-A. Lemburg wrote:
> >>> I suggest that we keep Jeremy's checkins in 2.3.  Hopefully
> >>> during the alpha or beta release cycle we will find out if there
> >>> *really* are still platforms with broken compilers.  At worst,
> >>> it will show up after 2.3 final is released, and then we can fix
> >>> it in 2.3.1.  You won't have to target mx for 2.3 for another 18
> >>> months (assuming the PBF ever releases Python-in-a-Tie).
> >>
> >> It's easy enough for me to add the #defines to the support header
> >> file if you take it out of the distribution, so it wouldn't hurt.
>
> MAL> Just an addition: please leave the configure test in the
> MAL> distribution.  While I could implement that using distutils as
> MAL> well, I would rather benefit from relying on config.h doing the
> MAL> right thing in case there are some newly broken compilers out
> MAL> there, e.g. the xlC one on AIX seems to be a very picky one...
>
> I don't understand what your goal is.  Why do you want the configure
> test if your header file has a bunch of platform-specific ifdefs?  If
> these platforms actually had a problem, the configure test would have
> caught it and you wouldn't need the ifdefs.  The only way the ifdefs
> would have an effect is if the configure test did not detect a
> problem; but if the configure test didn't detect a problem, then you
> don't need the ifdefs.

Correct, but I don't want to add more cruft to the file: The configure
script tests whether static forwards work or not.  If you'd rip out the
test as well, then I'd have to add those platforms which still have
problems manually.  The problem is: I don't know which platforms these
are (because configure found these itself).

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                http://www.egenix.com/files/python/

From greg@cosc.canterbury.ac.nz Thu Jul 18 11:03:47 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 18 Jul 2002 22:03:47 +1200 (NZST)
Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows?
Message-ID: <200207181003.g6IA3l127038@oma.cosc.canterbury.ac.nz>

Someone told me that Pyrex should be generating __declspec(dllexport)
for the module init func.  But someone else says this is only needed
if you're importing a dll as a library, and that it's not needed for
Python extensions.

Can anyone who knows what they're doing on Windows give me a
definitive answer about whether it's really needed or not?

Thanks,

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From aleax@aleax.it Thu Jul 18 07:01:54 2002
From: aleax@aleax.it (Alex Martelli)
Date: Thu, 18 Jul 2002 08:01:54 +0200
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: <200207172339.g6HNd5j23845@oma.cosc.canterbury.ac.nz>
References: <200207172339.g6HNd5j23845@oma.cosc.canterbury.ac.nz>
Message-ID:

On Thursday 18 July 2002 01:39 am, Greg Ewing wrote:
> Alex Martelli :
> > the file object's is the only example of "fat interface" problem
> > in Python -- an interface that exposes a lot of methods, with many
> > objects claiming they implement that interface but actually lying
>
> Maybe the existing file object should be split up into
> some number of other objects with smaller interfaces.

In an ideal world, yes.  In practice, I strongly doubt it's feasible to
break backwards compatibility THAT heavily.

> For example, instead of the file object actually accessing an
> OS file itself, it could just be a wrapper around an
> underlying "bytestream" object, which implements only
> read() and write().

I suspect read and write would best be kept on separate interfaces.
Ability to read, write, seek-and-tell, being three atoms of which it
makes sense to have about 6 combos (R, W, R+W, each with or without
S&T).  Rewind might make sense separately from S&T if streaming tapes
were still in fashion and OS's gave natural access to them.

But I do think it's all pretty academic.

Alex

From mal@lemburg.com Thu Jul 18 20:19:21 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 18 Jul 2002 21:19:21 +0200
Subject: [Python-Dev] Re: Patch level versions and new features (Was: Some dull gc stats)
References: <3D220A86.5070003@lemburg.com> <3D22ADD9.1030901@lemburg.com> <15650.64375.162977.160780@anthem.wooz.org> <3D2433B9.9080102@lemburg.com> <15657.39558.325764.651122@anthem.wooz.org> <3D299E42.70200@lemburg.com> <20020709012056.GA2526@cthulhu.gerg.ca> <200207181549.g6IFniw21368@odiug.zope.com>
Message-ID: <3D3714B9.1060807@lemburg.com>

Guido van Rossum wrote:
>>> Perhaps we could have some kind of category for distutils
>>> packages which marks them as system add-ons vs. site add-ons.
>>
>> +1 -- this should definitely be up to the package author/packager,
>> not the local admin.  I once tried to convince Guido that the ability
>> to occasionally upgrade standard library modules/packages would be a
>> good thing, but he wasn't having it.  Any change of heart, O Mighty
>> BDFL?
>
> Before I answer that, here's a question.  Why do we think it's a good
> idea to distribute upgrades as separate add-ons while we don't think
> it's okay to distribute such upgrades with bugfix releases?

The idea is to provide bugfixes for Python versions which are no longer
being maintained.  Of course, the effect would only show a few years
ahead.

> Doesn't
> this just increase the variability of site configurations, and hence
> version interaction hell?

I don't think that core packages are any different than other third
party packages: they are usually independent enough from the rest of
the code that upgrades don't affect the workings of the other code
using it.  The internals are free to change, though, e.g. to
accommodate bug fixes, etc.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                http://www.egenix.com/files/python/

From xscottg@yahoo.com Thu Jul 18 20:24:50 2002
From: xscottg@yahoo.com (Scott Gilbert)
Date: Thu, 18 Jul 2002 12:24:50 -0700 (PDT)
Subject: [Python-Dev] Fw: Behavior of buffer()
In-Reply-To: <200207181842.g6IIgUo22271@odiug.zope.com>
Message-ID: <20020718192450.15024.qmail@web40105.mail.yahoo.com>

--- Guido van Rossum wrote:
> > Yes.  The PyBufferObject grabs the pointer from the PyBufferProcs
> > supporting object when the PyBufferObject is created.  If the
> > PyBufferProcs supporting object reallocates the memory (possibly
> > from a resize) the PyBufferObject can be left with a bad pointer.
> > This is easily possible if you try to use the array module arrays > > as a buffer. > > > > I've submitted a patch to fix this particular problem (among others), > > but there are still enough things that the buffer object can't do that > > something new is needed. > > Can you remind me of the patch#? (I'm curious how you plan to fix > this...) > Patch number 552438. Instead of caching the pointer, it grabs it from the other object every time it is needed. Might be a little slower, but I think it's correct. > Barry (the PEP czar) forwarded me your PEP. I'll try to do some > triage on it so I can tell Barry whether to check it in (that doesn't > mean it's accepted :-). I'm bad at patience, but I'm not terribly naive. I fully expect everyone and their dog will find something to dislike before it gets approved/rejected. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Autos - Get free new car price quotes http://autos.yahoo.com From haering_python@gmx.de Thu Jul 18 20:28:51 2002 From: haering_python@gmx.de (Gerhard =?iso-8859-1?Q?H=E4ring?=) Date: Thu, 18 Jul 2002 21:28:51 +0200 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: <200207181003.g6IA3l127038@oma.cosc.canterbury.ac.nz> References: <200207181003.g6IA3l127038@oma.cosc.canterbury.ac.nz> Message-ID: <20020718192851.GA2759@lilith.my-fqdn.de> * Greg Ewing [2002-07-18 22:03 +1200]: > Someone told me that Pyrex should be generating > __declspec(dllexport) for the module init func. That's wrong. You should be using DL_EXPORT instead, which will do the right thing no matter which platform you're on: on Windows, it will expand to __declspec(dllexport), iff you're compiling an extension module (in contrast to compiling the Python core). 
I believe that on Unix, it will expand to an empty string :-) You also don't need any #ifdefs for win32 for setting ob_type, just set them _only_ in your init function and leave them as NULL in the declarations. Gerhard -- This sig powered by Python! Außentemperatur in München: 14.3 °C Wind: 1.9 m/s From guido@python.org Thu Jul 18 20:30:41 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 18 Jul 2002 15:30:41 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Your message of "Thu, 18 Jul 2002 08:01:54 +0200." References: <200207172339.g6HNd5j23845@oma.cosc.canterbury.ac.nz> Message-ID: <200207181930.g6IJUfX22643@odiug.zope.com> > I suspect read and write would best be kept on separate > interfaces. Ability to read, write, seek-and-tell, being three > atoms of which it makes sense to have about 6 combos > (R, W, R+W, each with or without S&T). Rewind might > make sense separately from S&T if streaming tapes were still in > fashion and OS's gave natural access to them. 5, because R+W without S&T makes little sense. > But I do think it's all pretty academic. C++ has tried very hard to do this with its istream, ostream and iostream classes; I believe I heard C++ people say once that it's not considered a success. I believe Java has tried to address this too. What do you think of Java's solution? --Guido van Rossum (home page: http://www.python.org/~guido/) From ping@zesty.ca Thu Jul 18 20:31:45 2002 From: ping@zesty.ca (Ka-Ping Yee) Date: Thu, 18 Jul 2002 12:31:45 -0700 (PDT) Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability In-Reply-To: <20020718100631.A3468@doublegemini.com> Message-ID: On Thu, 18 Jul 2002, Clark C . Evans wrote: > On Wed, Jul 17, 2002 at 02:58:55PM -0500, Ka-Ping Yee wrote: > | But i think this is more than a minor problem. This is a > | namespace collision problem, and that's significant. 
Naming > | the method "next" means that any object with a "next" method > | cannot be adapted to support the iterator protocol. Unfortunately > | "next" is a pretty common word and it's quite possible that such > | a method name is already in use. > > Ping, > > Do you have any suggestions for re-wording the Iterator questionnaire > at http://yaml.org/wk/survey?id=pyiter to reflect this paragraph above? I might add something like: One motivation for this change is that the name "next()" might collide with the name of an existing "next()" method. This could cause a problem if someone wants to implement the iterator protocol for an object that already happens to have a method called "next()". So far no one has reported encountering this situation. It seems plausible that there will be some objects where it would be nice to support the iterator protocol, and we have heard of some objects with methods named "next()", but we don't know how likely or unlikely it is that there's an object where both are true. Does that seem fair? 
-- ?!ng From jeremy@alum.mit.edu Thu Jul 18 20:32:14 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jul 2002 15:32:14 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects dictobject.c,2.127,2.128 floatobject.c,2.113,2.114 intobject.c,2.84,2.85 listobject.c,2.120,2.121 longobject.c,1.119,1.120 rangeobject.c,2.42,2.43 stringobject.c,2.169,2.170 tupleobject.c,2.69,2.70 typeobject.c,2.160,2.161 unicodeobject.c,2.155,2.156 xxobject.c,2.20,2.21 In-Reply-To: <3D371361.7050908@lemburg.com> References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> Message-ID: <15671.6078.577033.943393@slothrop.zope.com> >>>>> "MAL" == mal writes: MAL> The configure script tests whether static forwards work or MAL> not. If you'd rip out the test as well, then I'd have to add MAL> those platforms which still have problems manually. MAL> The problem is: I don't know which platforms these are (because MAL> configure found these itself). If you think the configure test works, why do you have platform specific ifdefs in your header file? Jeremy From mal@lemburg.com Thu Jul 18 20:35:01 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 18 Jul 2002 21:35:01 +0200 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? References: <200207181003.g6IA3l127038@oma.cosc.canterbury.ac.nz> Message-ID: <3D371865.5070908@lemburg.com> Greg Ewing wrote: > Someone told me that Pyrex should be generating > __declspec(dllexport) for the module init func. > But someone else says this is only needed if > you're importing a dll as a library, and that > it's not needed for Python extensions. > > Can anyone who knows what they're doing on > Windows give me a definitive answer about > whether it's really needed or not? 
You need to export at least the init () API and that is usually done using the dllexport flag. Note that this is only needed for shared modules (DLLs), not modules which are linked statically. This is what I use for this:

/* Macro to "mark" a symbol for DLL export */
#if (defined(_MSC_VER) && _MSC_VER > 850 \
     || defined(__MINGW32__) || defined(__CYGWIN__) || defined(__BEOS__))
# ifdef __cplusplus
# define MX_EXPORT(type) extern "C" type __declspec(dllexport)
# else
# define MX_EXPORT(type) extern type __declspec(dllexport)
# endif
#elif defined(__WATCOMC__)
# define MX_EXPORT(type) extern type __export
#elif defined(__IBMC__)
# define MX_EXPORT(type) extern type _Export
#else
# define MX_EXPORT(type) extern type
#endif

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From tim@zope.com Thu Jul 18 20:34:58 2002 From: tim@zope.com (Tim Peters) Date: Thu, 18 Jul 2002 15:34:58 -0400 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: <200207181003.g6IA3l127038@oma.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > Someone told me that Pyrex should be generating > __declspec(dllexport) for the module init func. > But someone else says this is only needed if > you're importing a dll as a library, 1. What else could one do with a DLL? That is, in your view is the "importing ... as a library" part not redundant? 2. Does Pyrex compile to DLLs (or PYDs) on Windows? I simply don't know. > and that it's not needed for Python extensions. If an extension is compiled into a DLL/PYD, it must tell the linker which symbols are to be exported. __declspec(dllexport) in the source is one way to do that. 
Other possibilities include creating a .def file, or specifying exported names on the linker's command line (like "/export:init_sre"). The best thing to do for Windows is ask that Windows users supply patches. Or you could upgrade to Windows yourself . From fredrik@pythonware.com Thu Jul 18 20:37:09 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 18 Jul 2002 21:37:09 +0200 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? References: <200207181003.g6IA3l127038@oma.cosc.canterbury.ac.nz> Message-ID: <034701c22e92$9473dfc0$ced241d5@hagrid> greg wrote: > Someone told me that Pyrex should be generating > __declspec(dllexport) for the module init func. almost; for portability, it's better to use the DL_EXPORT provided by Python.h: DL_EXPORT(void) init_module(void) { ... } > But someone else says this is only needed if > you're importing a dll as a library, and that > it's not needed for Python extensions. that someone is confused; the dllexport declaration makes sure that the init function is exported from the DLL. if not, Python's PYD loader won't find the init function. From aleax@aleax.it Thu Jul 18 20:38:15 2002 From: aleax@aleax.it (Alex Martelli) Date: Thu, 18 Jul 2002 21:38:15 +0200 Subject: [Python-Dev] Re: Sets In-Reply-To: <200207181845.g6IIjfw22307@odiug.zope.com> References: <200207172325.g6HNPQM23808@oma.cosc.canterbury.ac.nz> <200207181845.g6IIjfw22307@odiug.zope.com> Message-ID: <02071821381500.04480@arthur> On Thursday 18 July 2002 20:45, Guido van Rossum wrote: ... > I believe I recommended to Greg to make sets "have" a dict instead of > "being" dicts, and I think he agreed. But I guess he never got to > implementing that change. Right. OK, guess I'll make a new patch using delegation instead of inheritance, then. 
Alex From guido@python.org Thu Jul 18 20:50:39 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 18 Jul 2002 15:50:39 -0400 Subject: [Python-Dev] Re: Sets In-Reply-To: Your message of "Thu, 18 Jul 2002 21:38:15 +0200." <02071821381500.04480@arthur> References: <200207172325.g6HNPQM23808@oma.cosc.canterbury.ac.nz> <200207181845.g6IIjfw22307@odiug.zope.com> <02071821381500.04480@arthur> Message-ID: <200207181950.g6IJodg22778@odiug.zope.com> > > I believe I recommended to Greg to make sets "have" a dict instead of > > "being" dicts, and I think he agreed. But I guess he never got to > > implementing that change. > > Right. OK, guess I'll make a new patch using delegation instead > of inheritance, then. Maybe benchmark the performance too. If the "has" version is much slower, perhaps we could remove unwanted interfaces from the public API by overriding them with something that raises an exception (and rename the internal versions to some internal name if they are needed). --Guido van Rossum (home page: http://www.python.org/~guido/) From ping@zesty.ca Thu Jul 18 20:59:01 2002 From: ping@zesty.ca (Ka-Ping Yee) Date: Thu, 18 Jul 2002 12:59:01 -0700 (PDT) Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability In-Reply-To: <200207172121.g6HLLQH13946@odiug.zope.com> Message-ID: I wrote: > __iter__ is a red herring. [...blah blah blah...] > Iterators are currently asked to support both protocols. The > semantics of iteration come only from protocol 2; protocol 1 is > an effort to make iterators look sorta like sequences. But the > analogy is very weak -- these are "sequences" that destroy > themselves while you look at them -- not like any typical > sequence i've ever seen! > > The short of it is that whenever any Python programmer says > "for x in y", he or she had better be darned sure of whether > this is going to destroy y. Whatever we can do to make this > clear would be a good idea. 
On Wed, 17 Jul 2002, Guido van Rossum wrote: > This is a very good summary of the two iterator protocols. Ping, > would you mind adding this to PEP 234? I have now done so. I didn't add the whole thing verbatim, because the tone doesn't fit: it was written with the intent of motivating a change to the protocol, rather than describing what the protocol is. Presumably we don't want the PEP to say "__iter__ is a red herring". There's a bunch of issues flying around here, which i'll try to explain better in a separate posting. But i wanted to take care of Guido's request first. I have toned down and abridged my text somewhat, and strengthened the requirement for __iter__(). Here is what the "API specification" section now says: Classes can define how they are iterated over by defining an __iter__() method; this should take no additional arguments and return a valid iterator object. A class that wants to be an iterator should implement two methods: a next() method that behaves as described above, and an __iter__() method that returns self. The two methods correspond to two distinct protocols: 1. An object can be iterated over with "for" if it implements __iter__() or __getitem__(). 2. An object can function as an iterator if it implements next(). Container-like objects usually support protocol 1. Iterators are currently required to support both protocols. The semantics of iteration come only from protocol 2; protocol 1 is present to make iterators behave like sequences. But the analogy is weak -- unlike ordinary sequences, iterators are "sequences" that are destroyed by the act of looking at their elements. Consequently, whenever any Python programmer says "for x in y", he or she must be sure of whether this is going to destroy y. 
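The two protocols can be illustrated with a minimal sketch (Countdown is a hypothetical class invented for the example; the method is spelled next() in the 2.x-era protocol described above — the alias at the end merely lets the sketch also run on later Pythons):

```python
class Countdown:
    """Both an iterable (protocol 1) and an iterator (protocol 2)."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):          # protocol 1: lets "for x in y" work
        return self

    def next(self):              # protocol 2: produce the next element
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

    __next__ = next              # Python 3 spelling of protocol 2

c = Countdown(3)
print(list(c))   # [3, 2, 1]
print(list(c))   # [] -- the "sequence" was destroyed by iterating over it
```

The second call returning an empty list is precisely the destroyed-by-looking behavior the PEP text warns about.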
-- ?!ng From guido@python.org Thu Jul 18 20:58:50 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 18 Jul 2002 15:58:50 -0400 Subject: [Python-Dev] Re: Patch level versions and new features (Was: Some dull gc stats) In-Reply-To: Your message of "Thu, 18 Jul 2002 21:19:21 +0200." <3D3714B9.1060807@lemburg.com> References: <3D220A86.5070003@lemburg.com> <3D22ADD9.1030901@lemburg.com> <15650.64375.162977.160780@anthem.wooz.org> <3D2433B9.9080102@lemburg.com> <15657.39558.325764.651122@anthem.wooz.org> <3D299E42.70200@lemburg.com> <20020709012056.GA2526@cthulhu.gerg.ca> <200207181549.g6IFniw21368@odiug.zope.com> <3D3714B9.1060807@lemburg.com> Message-ID: <200207181958.g6IJwoY22816@odiug.zope.com> > Guido van Rossum wrote: > >>>Perhaps we could have some kind of category for distutils > >>>packages which marks them as system add-ons vs. site add-ons. > >> > >>+1 -- this should definitely be up to the package author/packager, not > >>the local admin. I once tried to convince Guido that the ability to > >>occasionally upgrade standard library modules/packages would be a good > >>thing, but he wasn't having it. Any change of heart, O Mighty BDFL? > > > > > > Before I answer that, here's a question. Why do we think it's a good > > idea to distribute upgrades as separate add-ons while we don't think > > it's okay to distribute such upgrades with bugfix releases? [MAL] > The idea is to provide bugfixes for Python versions which are > no longer being maintained. Of course, the effect would only > show a few years ahead. Hm, if you really are fixing bugs in old versions, why not patch the Python installation in-place rather than trying to play nice? > > Doesn't > > this just increase the variability of site configurations, and hence > > version interaction hell? 
> > I don't think that core packages are any different than > other third party packages: they are usually independent > enough from the rest of the code that upgrades don't affect > the workings of the other code using it. The internals are > free to change, though, e.g. to accomodate bug fixes, etc. Well, I don't expect that we'll do independent upgrades for core packages, so I propose to end this thread. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Jul 18 21:08:54 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 18 Jul 2002 16:08:54 -0400 Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability In-Reply-To: Your message of "Thu, 18 Jul 2002 12:59:01 PDT." References: Message-ID: <200207182008.g6IK8tb22853@odiug.zope.com> > I didn't add the whole thing verbatim, because the tone doesn't fit: > it was written with the intent of motivating a change to the > protocol, rather than describing what the protocol is. Presumably > we don't want the PEP to say "__iter__ is a red herring". > > There's a bunch of issues flying around here, which i'll try to > explain better in a separate posting. But i wanted to take care > of Guido's request first. I have toned down and abridged my text > somewhat, and strengthened the requirement for __iter__(). Here > is what the "API specification" section now says: > > Classes can define how they are iterated over by defining an > __iter__() method; this should take no additional arguments and > return a valid iterator object. A class that wants to be an > iterator should implement two methods: a next() method that behaves > as described above, and an __iter__() method that returns self. > > The two methods correspond to two distinct protocols: > > 1. An object can be iterated over with "for" if it implements > __iter__() or __getitem__(). > > 2. An object can function as an iterator if it implements next(). > > Container-like objects usually support protocol 1. 
Iterators are > currently required to support both protocols. The semantics of > iteration come only from protocol 2; protocol 1 is present to make > iterators behave like sequences. But the analogy is weak -- unlike > ordinary sequences, iterators are "sequences" that are destroyed > by the act of looking at their elements. Fine up to here. > Consequently, whenever any Python programmer says "for x in y", > he or she must be sure of whether this is going to destroy y. I don't understand why this is here. *Why* is it important to know whether this is going to destroy y? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Jul 18 21:42:02 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 18 Jul 2002 16:42:02 -0400 Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability In-Reply-To: Your message of "Thu, 18 Jul 2002 07:23:16 EDT." References: <200207180043.g6I0hKB25427@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200207182042.g6IKg2n22947@odiug.zope.com> > Maybe the reasons behind having __iter__() returning itself should be > clearly expressed in the PEP, too. On this list, Tim gave one recently, > Guido gives another here, but unless I missed it, the PEP gives none. > Usually, PEPs explain the reasons behind the choices. Ping added this to the PEP: The two methods correspond to two distinct protocols: 1. An object can be iterated over with "for" if it implements __iter__() or __getitem__(). 2. An object can function as an iterator if it implements next(). Container-like objects usually support protocol 1. Iterators are currently required to support both protocols. The semantics of iteration come only from protocol 2; protocol 1 is present to make iterators behave like sequences. But the analogy is weak -- unlike ordinary sequences, iterators are "sequences" that are destroyed by the act of looking at their elements. 
(I could do without the last sentence, since this expresses a value judgement rather than fact -- not a good thing to have in a PEP's "specification" section.) --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Thu Jul 18 21:50:31 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 18 Jul 2002 22:50:31 +0200 Subject: [Python-Dev] Re: Patch level versions and new features (Was: Some dull gc stats) References: <3D220A86.5070003@lemburg.com> <3D22ADD9.1030901@lemburg.com> <15650.64375.162977.160780@anthem.wooz.org> <3D2433B9.9080102@lemburg.com> <15657.39558.325764.651122@anthem.wooz.org> <3D299E42.70200@lemburg.com> <20020709012056.GA2526@cthulhu.gerg.ca> <200207181549.g6IFniw21368@odiug.zope.com> <3D3714B9.1060807@lemburg.com> <200207181958.g6IJwoY22816@odiug.zope.com> Message-ID: <3D372A17.50509@lemburg.com> Guido van Rossum wrote: >>Guido van Rossum wrote: >> >>>>>Perhaps we could have some kind of category for distutils >>>>>packages which marks them as system add-ons vs. site add-ons. >>>> >>>>+1 -- this should definitely be up to the package author/packager, not >>>>the local admin. I once tried to convince Guido that the ability to >>>>occasionally upgrade standard library modules/packages would be a good >>>>thing, but he wasn't having it. Any change of heart, O Mighty BDFL? >>> >>> >>>Before I answer that, here's a question. Why do we think it's a good >>>idea to distribute upgrades as separate add-ons while we don't think >>>it's okay to distribute such upgrades with bugfix releases? >> > > [MAL] > >>The idea is to provide bugfixes for Python versions which are >>no longer being maintained. Of course, the effect would only >>show a few years ahead. > > > Hm, if you really are fixing bugs in old versions, why not patch the > Python installation in-place rather than trying to play nice? 
We don't have an easy way of doing this, unless of course we trick python setup.py install to install directly into .../lib/pythonX.X rather than a sub directory on the path. >>>Doesn't >>>this just increase the variability of site configurations, and hence >>>version interaction hell? >> >>I don't think that core packages are any different than >>other third party packages: they are usually independent >>enough from the rest of the code that upgrades don't affect >>the workings of the other code using it. The internals are >>free to change, though, e.g. to accomodate bug fixes, etc. > > Well, I don't expect that we'll do independent upgrades for core > packages, so I propose to end this thread. Barry is already doing this with the email package and I would expect more such packages to make their way into the core. The PyXML package also has a life of its own outside the core distribution and could benefit from this. I think it's too early to end the thread. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From guido@python.org Thu Jul 18 20:21:59 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 18 Jul 2002 15:21:59 -0400 Subject: [Python-Dev] configure problems porting to Tru64 In-Reply-To: Your message of "Thu, 18 Jul 2002 15:08:16 EDT." <15671.4640.361811.434411@slothrop.zope.com> References: <15671.4640.361811.434411@slothrop.zope.com> Message-ID: <200207181922.g6IJM0O22574@odiug.zope.com> > (I'll also note that configure.in has a rather complex test for this, > when it appears that autoconf has a builtin AC_FUNC_SETPGRP. Anyone > know why we don't use this?) I'll bet AC_FUNC_SETPGRP didn't exist in the autoconf version we were using when we wrote that test. Feel free to fix it. 
BTW, the snake farm build for AIX-2-000000042E00-hal now fails like this: ../python/dist/src/Modules/posixmodule.c: In function `posix_fdatasync': ../python/dist/src/Modules/posixmodule.c:902: `fdatasync' undeclared (first use this function) ../python/dist/src/Modules/posixmodule.c:902: (Each undeclared identifier is reported only once ../python/dist/src/Modules/posixmodule.c:902: for each function it appears in.) --Guido van Rossum (home page: http://www.python.org/~guido/) From aleax@aleax.it Thu Jul 18 21:52:31 2002 From: aleax@aleax.it (Alex Martelli) Date: Thu, 18 Jul 2002 22:52:31 +0200 Subject: [Python-Dev] Re: Sets In-Reply-To: <200207181950.g6IJodg22778@odiug.zope.com> References: <200207172325.g6HNPQM23808@oma.cosc.canterbury.ac.nz> <02071821381500.04480@arthur> <200207181950.g6IJodg22778@odiug.zope.com> Message-ID: On Thursday 18 July 2002 09:50 pm, Guido van Rossum wrote: > > > I believe I recommended to Greg to make sets "have" a dict instead of > > > "being" dicts, and I think he agreed. But I guess he never got to > > > implementing that change. > > > > Right. OK, guess I'll make a new patch using delegation instead > > of inheritance, then. > > Maybe benchmark the performance too. If the "has" version is much > slower, perhaps we could remove unwanted interfaces from the public > API by overriding them with something that raises an exception (and > rename the internal versions to some internal name if they are > needed). I've just updated patch 580995 with the has-A rather than is-A version. OK, I'll now run some simple benchmarks... Looks good, offhand. 
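The has-a arrangement being benchmarked here can be sketched in a few lines (an illustrative toy, not the actual patch-580995 code, and only a handful of operations are shown):

```python
class Set:
    """Toy has-a set: it owns a dict rather than subclassing dict, so
    dict-only methods (values(), setdefault(), ...) never leak into
    the public API."""
    def __init__(self, iterable=()):
        self._data = {}
        for item in iterable:
            self._data[item] = True

    def __contains__(self, item):
        return item in self._data

    def __len__(self):
        return len(self._data)

    def __or__(self, other):          # union
        result = Set(self._data)      # iterating a dict yields its keys
        result._data.update(other._data)
        return result

    def __and__(self, other):         # intersection
        return Set(k for k in self._data if k in other._data)

s1, s2 = Set(range(1000)), Set(range(500, 1500))
print(len(s1 | s2), len(s1 & s2))   # 1500 500
```

Each delegated operation costs one extra attribute lookup compared with the is-a version, which is what the benchmark is trying to measure.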
Here's the simple benchmark script:

import time
import set
import sys

clock = time.clock
raw = range(10000)
times = [None]*20

print "Timing Set %s (Python %s)" % (set.__version__, sys.version)

print "Make 20 10k-items sets (no reps)...",
start = clock()
for i in times: s10k = set.Set(raw)
stend = clock()
print stend-start

witre = range(1000)*10
print "Make 20 1k-items sets (x10 reps)...",
for i in times: s1k1 = set.Set(witre)
stend = clock()
print stend-start

raw1 = range(500, 1500)
print "Make 20 more 1k-items sets (no reps)...",
for i in times: s1k2 = set.Set(raw1)
stend = clock()
print stend-start

print "20 unions of 1k-items sets 50% overlap...",
for i in times: result = s1k1 | s1k2
stend = clock()
print stend-start

print "20 inters of 1k-items sets 50% overlap...",
for i in times: result = s1k1 & s1k2
stend = clock()
print stend-start

print "20 diffes of 1k-items sets 50% overlap...",
for i in times: result = s1k1 - s1k2
stend = clock()
print stend-start

print "20 simdif of 1k-items sets 50% overlap...",
for i in times: result = s1k1 ^ s1k2
stend = clock()
print stend-start

And here's a few runs (with -O of course) on my PC:

[alex@lancelot has]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.36
Make 20 more 1k-items sets (no reps)... 0.38
20 unions of 1k-items sets 50% overlap... 0.43
20 inters of 1k-items sets 50% overlap... 0.92
20 diffes of 1k-items sets 50% overlap... 1.41
20 simdif of 1k-items sets 50% overlap... 2.38
[alex@lancelot has]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
Make 20 10k-items sets (no reps)... 0.22
Make 20 1k-items sets (x10 reps)... 0.37
Make 20 more 1k-items sets (no reps)... 0.39
20 unions of 1k-items sets 50% overlap... 0.44
20 inters of 1k-items sets 50% overlap... 0.93
20 diffes of 1k-items sets 50% overlap... 1.42
20 simdif of 1k-items sets 50% overlap... 2.39
[alex@lancelot has]$ cd ../is
[alex@lancelot is]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.37
Make 20 more 1k-items sets (no reps)... 0.39
20 unions of 1k-items sets 50% overlap... 0.44
20 inters of 1k-items sets 50% overlap... 0.93
20 diffes of 1k-items sets 50% overlap... 1.42
20 simdif of 1k-items sets 50% overlap... 2.38
[alex@lancelot is]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
Make 20 10k-items sets (no reps)... 0.22
Make 20 1k-items sets (x10 reps)... 0.38
Make 20 more 1k-items sets (no reps)... 0.4
20 unions of 1k-items sets 50% overlap... 0.44
20 inters of 1k-items sets 50% overlap... 0.93
20 diffes of 1k-items sets 50% overlap... 1.42
20 simdif of 1k-items sets 50% overlap... 2.41
[alex@lancelot is]$

They look much of a muchness to me. Sorry about the version stuck at 1.5 -- forgot to update that, but you can tell the difference by the directory name, 'is' and 'has' resp.:-). Python 2.3 (built from CVS 22 hours ago) is substantially faster at some tasks (intersections and differences):

[alex@lancelot has]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.36
Make 20 more 1k-items sets (no reps)... 0.37
20 unions of 1k-items sets 50% overlap... 0.42
20 inters of 1k-items sets 50% overlap... 0.75
20 diffes of 1k-items sets 50% overlap... 1.08
20 simdif of 1k-items sets 50% overlap... 1.73
[alex@lancelot has]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.36
Make 20 more 1k-items sets (no reps)... 0.37
20 unions of 1k-items sets 50% overlap... 0.42
20 inters of 1k-items sets 50% overlap... 0.75
20 diffes of 1k-items sets 50% overlap... 1.08
20 simdif of 1k-items sets 50% overlap... 1.74
[alex@lancelot has]$
[alex@lancelot is]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.35
Make 20 more 1k-items sets (no reps)... 0.37
20 unions of 1k-items sets 50% overlap... 0.41
20 inters of 1k-items sets 50% overlap... 0.74
20 diffes of 1k-items sets 50% overlap... 1.07
20 simdif of 1k-items sets 50% overlap... 1.72
[alex@lancelot is]$ python -O ../bench_set.py
Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
Make 20 10k-items sets (no reps)... 0.21
Make 20 1k-items sets (x10 reps)... 0.36
Make 20 more 1k-items sets (no reps)... 0.38
20 unions of 1k-items sets 50% overlap... 0.42
20 inters of 1k-items sets 50% overlap... 0.75
20 diffes of 1k-items sets 50% overlap... 1.08
20 simdif of 1k-items sets 50% overlap... 1.73
[alex@lancelot is]$

but as you can see, again it's uniformly faster on both 'is' and 'has' versions of sets. The 'has' version thus seems preferable here. 
Alex From jeremy@alum.mit.edu Thu Jul 18 20:10:22 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jul 2002 15:10:22 -0400 Subject: [Python-Dev] staticforward In-Reply-To: <15670.62365.517118.775364@slothrop.zope.com> References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <15670.62365.517118.775364@slothrop.zope.com> Message-ID: <15671.4766.961501.277589@slothrop.zope.com> FWIW I confirm today that staticforward is not needed on Tru64 5.1. Jeremy From guido@python.org Thu Jul 18 20:18:38 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 18 Jul 2002 15:18:38 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Your message of "Thu, 18 Jul 2002 07:57:37 +0200." References: <200207172332.g6HNWMp23835@oma.cosc.canterbury.ac.nz> Message-ID: <200207181918.g6IJIcW22539@odiug.zope.com> > > I've just had a thought. Maybe it would be less of a mess > > if what we are calling "iterators" had been called "streams" > > Possibly -- I did use the "streams" name often in the tutorial > on iterators and generators, it's a very natural term. OTOH in C++ and Java, "stream" refers to an open file object (to emphasize the iteratorish feeling of a file opened for sequential reading or writing, as opposed to the concept of a file as a random-access array of bytes on disk). > Seekable files can be multi-pass, but in the strict sense > that you can rewind them -- it's still impractical to have > them produce multiple *independent* iterators (needing > some sort of in-memory caching). It would be trivial if you had an object representing the notion of a file on disk rather than an open file. Each iterator would be implemented as a separate open file referring to the same filename. 
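That last idea can be sketched directly: an object that merely names a file on disk and hands out an independent open file per __iter__() call (DiskFile is a hypothetical name invented for this example, not a real Python type):

```python
import os
import tempfile

class DiskFile:
    """Represents 'the notion of a file on disk', not an open file.
    Each __iter__() opens a fresh handle, so iteration is multi-pass."""
    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # a separate open file per iterator; in CPython the handle is
        # closed as soon as the iterator is garbage-collected
        return iter(open(self.path))

# demo: two fully independent passes over the same data
fd, path = tempfile.mkstemp()
os.write(fd, b"spam\neggs\n")
os.close(fd)
f = DiskFile(path)
first, second = list(f), list(f)
os.remove(path)
print(first == second == ["spam\n", "eggs\n"])   # True
```

Neither pass disturbs the other, which is exactly the multi-pass behavior a rewindable-but-shared file handle cannot give you.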
--Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Thu Jul 18 22:00:05 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jul 2002 17:00:05 -0400 Subject: [Python-Dev] configure problems porting to Tru64 In-Reply-To: <200207181922.g6IJM0O22574@odiug.zope.com> References: <15671.4640.361811.434411@slothrop.zope.com> <200207181922.g6IJM0O22574@odiug.zope.com> Message-ID: <15671.11349.924113.246257@slothrop.zope.com> >>>>> "GvR" == Guido van Rossum writes: GvR> BTW, the snake farm build for AIX-2-000000042E00-hal now fails GvR> like this: GvR> ../python/dist/src/Modules/posixmodule.c: In function GvR> `posix_fdatasync': GvR> ../python/dist/src/Modules/posixmodule.c:902: `fdatasync' GvR> undeclared (first use this function) GvR> ../python/dist/src/Modules/posixmodule.c:902: (Each undeclared GvR> identifier is reported only once GvR> ../python/dist/src/Modules/posixmodule.c:902: for each function GvR> it appears in.) (I already mentioned this to Guido, but) This problem has been occurring on AIX for a while. It's unrelated to staticforward. So we've now confirmed that staticforward is unneeded on AIX and Tru64. Perhaps MAL would like to find an SCO ODT compiler to try it out with. Jeremy From mal@lemburg.com Thu Jul 18 22:07:41 2002 From: mal@lemburg.com (M.-A.
Lemburg) Date: Thu, 18 Jul 2002 23:07:41 +0200 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects dictobject.c,2.127,2.128 floatobject.c,2.113,2.114 intobject.c,2.84,2.85 listobject.c,2.120,2.121 longobject.c,1.119,1.120 rangeobject.c,2.42,2.43 stringobject.c,2.169,2.170 tupleobject.c,2.69,2.70 typeobject.c,2.160,2.161 unicodeobject.c,2.155,2.156 xxobject.c,2.20,2.21 References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> Message-ID: <3D372E1D.50009@lemburg.com> Jeremy Hylton wrote: >>>>>>"MAL" == mal writes: >>>>> > > MAL> The configure script tests whether static forwards work or > MAL> not. If you'd rip out the test as well, then I'd have to add > MAL> those platforms which still have problems manually. > > MAL> The problem is: I don't know which platforms these are (because > MAL> configure found these itself). > > If you think the configure test works, why do you have platform > specific ifdefs in your header file? Because it doesn't always work :-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From martin@v.loewis.de Thu Jul 18 22:09:34 2002 From: martin@v.loewis.de (Martin v. 
Loewis) Date: 18 Jul 2002 23:09:34 +0200 Subject: [Python-Dev] Re: configure problems porting to Tru64 In-Reply-To: <15671.4640.361811.434411@slothrop.zope.com> References: <15671.4640.361811.434411@slothrop.zope.com> Message-ID: Jeremy Hylton writes: > (I'll also note that configure.in has a rather complex test for this, > when it appears that autoconf has a builtin AC_FUNC_SETPGRP. Anyone > know why we don't use this?) That test was introduced in configure.in 1.9, on 1994/11/03. It might well be that autoconf did not support that test at that time. > How should we actually fix this problem? It seems to me that the > right solution is to define _XOPEN_SOURCE in Tru64 and somehow > guarantee that configure runs its tests with that defined, too. How > would we achieve that? I think it is generally the right thing to define _XOPEN_SOURCE on Unix, providing a negative list of systems that cannot support this setting (or preferably solving whatever problems remain). I'd put an (unconditional) AC_DEFINE into configure.in early on; it *should* go into confdefs.h as configure proceeds, and thus be active when other tests are performed. Regards, Martin From aleax@aleax.it Thu Jul 18 22:12:11 2002 From: aleax@aleax.it (Alex Martelli) Date: Thu, 18 Jul 2002 23:12:11 +0200 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: <200207181930.g6IJUfX22643@odiug.zope.com> References: <200207172339.g6HNd5j23845@oma.cosc.canterbury.ac.nz> <200207181930.g6IJUfX22643@odiug.zope.com> Message-ID: On Thursday 18 July 2002 09:30 pm, Guido van Rossum wrote: > > I suspect read and write would best be kept on separate > > interfaces. Ability to read, write, seek-and-tell, being three > > atoms of which it makes sense to have about 6 combos > > (R, W, R+W, each with or without S&T). Rewind might > > make sense separately from S&T if streaming tapes were still in > > fashion and OS's gave natural access to them. > > 5, because R+W without S&T makes little sense. 
Reasonably little, yes -- hard to make up a non-contrived example ('preserve data up to the first occurrence of "bzz" and then overwrite the rest of the file with "spam"'...?-). > > But I do think it's all pretty academic. > > C++ has tried very hard to do this with its istream, ostream and > iostream classes; I believe I heard C++ people say once that it's not > considered a success. As a C++ person I agree. It's better by far than C, mind you -- for text I/O, at least -- but it's complex and intricate. > I believe Java has tried to address this too. > What do you think of Java's solution? In the only time in my life when I was using Java in earnest (in code intended for production purposes, though think3 later dropped the idea), Java hit me with a deprecation to the solar plexus exactly in this area, forcing me to do much unproductive rewriting -- so I find it hard to be unbiased. But even striving to be fair, I don't see the advantage compared e.g. to C++'s streams. Alex From jeremy@alum.mit.edu Thu Jul 18 22:16:09 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jul 2002 17:16:09 -0400 Subject: [Python-Dev] staticforward In-Reply-To: <3D372E1D.50009@lemburg.com> References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> Message-ID: <15671.12313.725886.680036@slothrop.zope.com> >>>>> "MAL" == mal writes: MAL> The configure script tests whether static forwards work or MAL> not. If you'd rip out the test as well, then I'd have to add MAL> those platforms which still have problems manually. MAL> The problem is: I don't know which platforms these are (because MAL> configure found these itself). 
>> >> If you think the configure test works, why do you have platform >> specific ifdefs in your header file? MAL> Because it doesn't always work :-) Let's make sure I've got this straight: You believe there are platforms on which staticforward is necessary, because you can not have a tentative definition of a static followed by a definition with an initializer. Yet the configure test of exactly this behavior succeeds. Further, you don't believe the configure test works but you want us to leave it in anyway? Jeremy From aleax@aleax.it Thu Jul 18 22:23:50 2002 From: aleax@aleax.it (Alex Martelli) Date: Thu, 18 Jul 2002 23:23:50 +0200 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: <200207181918.g6IJIcW22539@odiug.zope.com> References: <200207172332.g6HNWMp23835@oma.cosc.canterbury.ac.nz> <200207181918.g6IJIcW22539@odiug.zope.com> Message-ID: On Thursday 18 July 2002 09:18 pm, Guido van Rossum wrote: > > > I've just had a thought. Maybe it would be less of a mess > > > if what we are calling "iterators" had been called "streams" > > > > Possibly -- I did use the "streams" name often in the tutorial > > on iterators and generators, it's a very natural term. > > OTOH in C++ and Java, "stream" refers to an open file object (to > emphasize the iteratorish feeling of a file opened for sequential > reading or writing, as opposed to the concept of a file as a > random-access array of bytes on disk). ...and in Unix Sys/V, if I recall correctly, it referred to an allegedly superior way to do things BSD did with sockets (and more). Any nice-looking term will be complicatedly overloaded by now. I think "seborrhea" is still free, though (according to some old Dilbert strips, at least). > > Seekable files can be multi-pass, but in the strict sense > > that you can rewind them -- it's still impractical to have > > them produce multiple *independent* iterators (needing > > some sort of in-memory caching).
> > It would be trivial if you had an object representing the notion of a > file on disk rather than an open file. Each iterator would be > implemented as a separate open file referring to the same filename. For a *read-only* disk file, yes -- at least on Unix-ish systems, you could also get the same effect with dup2 without even needing any filename around (e.g. on an already-unlinked file). Hmmm, I do think win32 has something like dup2 -- my copy of Richter remained with think3 (it was actually theirs:-), and I do little Windows these days so I haven't bought another, but I'm pretty sure half an hour on MSDN would let me find it. Maybe something can be built around this -- the underlying disk file as the container, dup2 or equivalent to make independent iterators/ streams (as long as nobody's writing the file... but that's not too different from iterating on e.g. a list, where an insert or del would mess things up...). But surely not by sticking with stdio. Which leads us back to my "this is rather academic" statement: don't we need to stick with stdio to support existing extensions which use FILE*'s, anyway? Alex From guido@python.org Thu Jul 18 22:28:03 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 18 Jul 2002 17:28:03 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Your message of "Thu, 18 Jul 2002 23:23:50 +0200." References: <200207172332.g6HNWMp23835@oma.cosc.canterbury.ac.nz> <200207181918.g6IJIcW22539@odiug.zope.com> Message-ID: <200207182128.g6ILS3u04720@odiug.zope.com> > > > > I've just had a thought. Maybe it would be less of a mess > > > > if what we are calling "iterators" had been called "streams" > > > > > > Possibly -- I did use the "streams" name often in the tutorial > > > on iterators and generators, it's a very natural term. 
> > > > OTOH in C++ and Java, "stream" refers to an open file object (to > > emphasize the iteratorish feeling of a file opened for sequential > > reading or writing, as opposed to the concept of a file as a > > random-access array of bytes on disk). > > ...and in Unix Sys/V, if I recall correctly, it referred to an allegedly > superior way to do things BSD did with sockets (and more). Any > nice-looking term will be complicatedly overloaded by now. I > think "seborrhea" is still free, though (according to some old Dilbert > strips, at least). Bah. I rather like the idea of using "stream" to denote the future rewritten I/O object, so I don't want to use it for iterators. > Which leads us back to my "this is rather academic" statement: > don't we need to stick with stdio to support existing extensions > which use FILE*'s, anyway? We'll need to support the old style files for a long time. But that doesn't mean we can't invent something new that doesn't use stdio (or perhaps it uses stdio, just doesn't rely on stdio for various features). --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Thu Jul 18 22:38:59 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 18 Jul 2002 23:38:59 +0200 Subject: [Python-Dev] staticforward References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> <15671.12313.725886.680036@slothrop.zope.com> Message-ID: <3D373573.8070001@lemburg.com> Jeremy Hylton wrote: >>>>>>"MAL" == mal writes: >>>>> > > MAL> The configure script tests whether static forwards work or > MAL> not. If you'd rip out the test as well, then I'd have to add > MAL> those platforms which still have problems manually.
> > MAL> The problem is: I don't know which platforms these are (because > MAL> configure found these itself). > >> > >> If you think the configure test works, why do you have platform > >> specific ifdefs in your header file? > > MAL> Because it doesn't always work :-) > > Let's make sure I've got this straight: > > You believe there are platforms on which staticforward is necessary, > because you can not have a tentative definition of a static followed > by a definition with an initializer. Yet the configure test of > exactly this behavior succeeds. Yes. The test doesn't seem to catch the case of having arrays being declared as static forward. If you look in configure.in you'll find that the test code only checks whether struct behave well. > Further, you don't believe the > configure test works but you want us to leave it in anyway? I believe that it works in most cases, but not all of them. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From jeremy@alum.mit.edu Thu Jul 18 23:02:41 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jul 2002 18:02:41 -0400 Subject: [Python-Dev] staticforward In-Reply-To: <3D373573.8070001@lemburg.com> References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> <15671.12313.725886.680036@slothrop.zope.com> <3D373573.8070001@lemburg.com> Message-ID: <15671.15105.563068.700997@slothrop.zope.com> >>>>> "MAL" == mal writes: MAL> Yes. 
The test doesn't seem to catch the case of having arrays MAL> being declared as static forward. If you look in configure.in MAL> you'll find that the test code only checks whether struct MAL> behave well. Then you'll be no better off if we leave the test in. I expect you don't actually have a problem. On the off chance that you do, you've already got all the ifdef trickery you need in your own .h file. Jeremy From barry@zope.com Thu Jul 18 23:05:31 2002 From: barry@zope.com (Barry A. Warsaw) Date: Thu, 18 Jul 2002 18:05:31 -0400 Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability References: <200207180043.g6I0hKB25427@pcp02138704pcs.reston01.va.comcast.net> <200207182042.g6IKg2n22947@odiug.zope.com> Message-ID: <15671.15275.429784.303580@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: >> Container-like objects usually support protocol 1. Iterators are >> currently required to support both protocols. The semantics of >> iteration come only from protocol 2; protocol 1 is present to make >> iterators behave like sequences. But the analogy is weak -- unlike >> ordinary sequences, iterators are "sequences" that are destroyed by >> the act of looking at their elements. GvR> (I could do without the last sentence, since this expresses a GvR> value judgement rather than fact -- not a good thing to have GvR> in a PEP's "specification" section.) What about: "...sequences. Note that the act of looking at an iterator's elements mutates the iterator." -Barry From tim@zope.com Thu Jul 18 23:26:47 2002 From: tim@zope.com (Tim Peters) Date: Thu, 18 Jul 2002 18:26:47 -0400 Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability In-Reply-To: <15671.15275.429784.303580@anthem.wooz.org> Message-ID: > What about: > > "...sequences. Note that the act of looking at an iterator's > elements mutates the iterator." That doesn't belong in the spec either -- nothing requires an iterator to have mutable state, let alone to mutate it when next() is called. 
From mal@lemburg.com Thu Jul 18 23:31:48 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 19 Jul 2002 00:31:48 +0200 Subject: [Python-Dev] staticforward References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> <15671.12313.725886.680036@slothrop.zope.com> <3D373573.8070001@lemburg.com> <15671.15105.563068.700997@slothrop.zope.com> Message-ID: <3D3741D4.8020408@lemburg.com> Jeremy Hylton wrote: >>>>>>"MAL" == mal writes: >>>>> > > MAL> Yes. The test doesn't seem to catch the case of having arrays > MAL> being declared as static forward. If you look in configure.in > MAL> you'll find that the test code only checks whether struct > MAL> behave well. > > Then you'll be no better off if we leave the test in. I expect you > don't actually have a problem. On the off chance that you do, you've > already got all the ifdef trickery you need in your own .h file. Except that I don't know on which other platforms I'd have to enable it... and no, I don't want to go through another two years of user feedback to find out ! What are you after here ? Remove the configure.in test as well ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... 
Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From jeremy@alum.mit.edu Thu Jul 18 23:32:30 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jul 2002 18:32:30 -0400 Subject: [Python-Dev] staticforward In-Reply-To: <3D3741D4.8020408@lemburg.com> References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> <15671.12313.725886.680036@slothrop.zope.com> <3D373573.8070001@lemburg.com> <15671.15105.563068.700997@slothrop.zope.com> <3D3741D4.8020408@lemburg.com> Message-ID: <15671.16894.185299.672286@slothrop.zope.com> >>>>> "MAL" == mal writes: MAL> What are you after here ? Remove the configure.in test as well MAL> ? It is already gone. And earlier in this thread, we established that it did you no good, right? You only care about compilers that choke on static array decls with later initialization, and the test doesn't catch that. Jeremy From jeremy@alum.mit.edu Thu Jul 18 23:36:46 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jul 2002 18:36:46 -0400 Subject: [Python-Dev] Re: configure problems porting to Tru64 In-Reply-To: References: <15671.4640.361811.434411@slothrop.zope.com> Message-ID: <15671.17150.922349.270282@slothrop.zope.com> Thanks. This suggestion gets the compile to succeed on Tru64 and does no harm on Linux. I'll check it in and see what happens on the snake farm tonight. There's one more problem with Tru64: cc -o python Modules/python.o libpython2.3.a -lrt -lpthread -lm -threads ld: Unresolved: makedev It looks like Tru64 doesn't have a makedev(). You added the patch that included this a while back. Do you have any idea what we should do on Tru64?
Jeremy From skip@pobox.com Thu Jul 18 23:51:09 2002 From: skip@pobox.com (Skip Montanaro) Date: Thu, 18 Jul 2002 17:51:09 -0500 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects dictobject.c,2.127,2.128 floatobject.c,2.113,2.114 intobject.c,2.84,2.85 listobject.c,2.120,2.121 longobject.c,1.119,1.120 rangeobject.c,2.42,2.43 stringobject.c,2.169,2.170 tupleobject.c,2.69,2.70 typeobject.c,2.160,2.161 unicodeobject.c,2.155,2.156 xxobject.c,2.20,2.21 In-Reply-To: <3D372E1D.50009@lemburg.com> References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> Message-ID: <15671.18013.841675.41967@localhost.localdomain> >> If you think the configure test works, why do you have platform >> specific ifdefs in your header file? mal> Because it doesn't always work :-) Why not just add the necessary goo to configure so it does work for the various reported cases? Skip From mhammond@skippinet.com.au Fri Jul 19 00:03:38 2002 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 19 Jul 2002 09:03:38 +1000 Subject: [Python-Dev] Review of build system patch requested In-Reply-To: <200207171418.g6HEIZo00747@odiug.zope.com> Message-ID: > > * Makefile.pre.in has been changed to pass "-DPy_BUILD_CORE" to > the compiler > > when building Python itself and any builtin modules. This flag is > > not passed to extension modules. > > My only concern would be that tools which parse the Makefile (I > believe distutils does this?) should not accidentally pick up the > "-DPy_BUILD_CORE" flag. > > Apart from that I trust your judgement and Neal's test drive. Thanks Guido. I mailed the distutils sig, and Andrew Kuchling replied that my change should be safe. 
Now I need some help checking this baby in! My change touches Makefile.pre.in and configure.in, and require that both "autoheader" and "autoconf" be run to correctly regenerate output files. How should I do this checkin? Is it necessary for me to perform any additional steps, or is there some magic that allows me to simply check these 2 files in and have everything else work? Thanks, Mark. From jeremy@alum.mit.edu Fri Jul 19 00:05:27 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Jul 2002 19:05:27 -0400 Subject: [Python-Dev] Re: staticforward In-Reply-To: <15671.18013.841675.41967@localhost.localdomain> References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> <15671.18013.841675.41967@localhost.localdomain> Message-ID: <15671.18871.846980.217653@slothrop.zope.com> >>>>> "SM" == Skip Montanaro writes: SM> Why not just add the necessary goo to configure so it does work SM> for the various reported cases? Because there are not first-hand reported cases. The only case that MAL has mentioned is an unnecessary use of staticforward with an array declaration and later initialization in a third-party extension module. There's nothing in the core that needs help from configure. Jeremy From mhammond@skippinet.com.au Fri Jul 19 00:15:46 2002 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 19 Jul 2002 09:15:46 +1000 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: <034701c22e92$9473dfc0$ced241d5@hagrid> Message-ID: Fredrik: > greg wrote: > > > Someone told me that Pyrex should be generating > > __declspec(dllexport) for the module init func. 
> > almost; for portability, it's better to use the DL_EXPORT > provided by Python.h: > > DL_EXPORT(void) > init_module(void) > { > ... > } > > > But someone else says this is only needed if > > you're importing a dll as a library, and that > > it's not needed for Python extensions. FWIW, www.python.org/sf/566100 deprecates DL_IMPORT/DL_EXPORT as it is broken! Once this patch is checked in, the new blessed way to declare your function will be: PyMODINIT_FUNC init_module(void) { ... } This macro will do the right thing in all situations and for all platforms. It even provides the 'extern "C"' if your extension is in a C++ module. The-patch-even-updates-the-doc ly, Mark. From neal@metaslash.com Fri Jul 19 01:49:38 2002 From: neal@metaslash.com (Neal Norwitz) Date: Thu, 18 Jul 2002 20:49:38 -0400 Subject: [Python-Dev] Re: configure problems porting to Tru64 References: <15671.4640.361811.434411@slothrop.zope.com> <15671.17150.922349.270282@slothrop.zope.com> Message-ID: <3D376222.B0ED0D63@metaslash.com> Jeremy Hylton wrote: > > There's one more problem with Tru64: > > cc -o python Modules/python.o libpython2.3.a -lrt -lpthread -lm -threads > ld: > Unresolved: > makedev > > It looks like Tru64 doesn't have a makedev(). You added the patch > that included this a while back. Do you have any idea what we should > do on Tru64? >From a distant memory, makedev is a macro (or may be depending on #define's) and needs the proper header file. I hope my memory is correct, but I don't even trust it. ...maybe I should, there is a makedev macro in sys/types.h on a Compaq Tru64 UNIX V5.1 (Rev. 732) (192.233.54.155) (compaq testdrive box). It looks like _OSF_SOURCE must be defined, possibly other macros. Neal From greg@cosc.canterbury.ac.nz Fri Jul 19 01:48:47 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 19 Jul 2002 12:48:47 +1200 (NZST) Subject: [Python-Dev] Single- vs. 
Multi-pass iterability In-Reply-To: Message-ID: <200207190048.g6J0ml904071@oma.cosc.canterbury.ac.nz> Alex Martelli : > Me: > > Then the term "iterator" could have been reserved > > for the special case of an object that provides stream > > access to a random-access collection. > > > Nice touch, except that I keep quibbling on the "random > > access" need -- see my previous msg about sets. Well, substitute the term "non-destructively readable" or "multi-pass capable" or something like that if you prefer. > Seekable files can be multi-pass, but in the strict sense > that you can rewind them -- it's still impractical to have > them produce multiple *independent* iterators (needing > some sort of in-memory caching). Yes, that's the key idea I had in mind. So make it "independent multi-pass capable". :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Jul 19 01:52:20 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 19 Jul 2002 12:52:20 +1200 (NZST) Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Message-ID: <200207190052.g6J0qKS04080@oma.cosc.canterbury.ac.nz> Alex Martelli : > I suspect read and write would best be kept on separate > interfaces. Yes, obviously you would be allowed to have streams that implemented one or the other or both. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Jul 19 01:55:22 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 19 Jul 2002 12:55:22 +1200 (NZST) Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: <200207181930.g6IJUfX22643@odiug.zope.com> Message-ID: <200207190055.g6J0tLk04092@oma.cosc.canterbury.ac.nz> > C++ has tried very hard to do this with its istream, ostream and > iostream classes; I believe I heard C++ people say once that it's not > considered a success. Well, everything in C++ seems to end up being way more complicated than it ought to. The Python version would be much simpler, since you wouldn't have to formally spell out all the interface conventions. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From neal@metaslash.com Fri Jul 19 02:04:13 2002 From: neal@metaslash.com (Neal Norwitz) Date: Thu, 18 Jul 2002 21:04:13 -0400 Subject: [Python-Dev] Review of build system patch requested References: Message-ID: <3D37658D.E41060C4@metaslash.com> Mark Hammond wrote: > > > > * Makefile.pre.in has been changed to pass "-DPy_BUILD_CORE" to > > the compiler > > > when building Python itself and any builtin modules. This flag is > > > not passed to extension modules. > > > > My only concern would be that tools which parse the Makefile (I > > believe distutils does this?) should not accidentally pick up the > > "-DPy_BUILD_CORE" flag. > > Thanks Guido. I mailed the distutils sig, and Andrew Kuchling replied that > my change should be safe. > > Now I need some help checking this baby in! My change touches > Makefile.pre.in and configure.in, and require that both "autoheader" and > "autoconf" be run to correctly regenerate output files. 
> > How should I do this checkin? Is it necessary for me to perform any > additional steps, or is there some magic that allows me to simply check > these 2 files in and have everything else work? I regenerated configure and Makefile.pre.in and attached it to the patch. While regenerating I got a warning: autoheader: missing template: _XOPEN_SOURCE It would be good to have someone look over/test the new configure, etc. Neal From greg@cosc.canterbury.ac.nz Fri Jul 19 02:25:53 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 19 Jul 2002 13:25:53 +1200 (NZST) Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: Message-ID: <200207190125.g6J1PrG04203@oma.cosc.canterbury.ac.nz> Tim Peters : > The best thing to do for Windows is ask that Windows users supply > patches. It was using a patch supplied by a Windows user that got me into this mess. He said that the DL_EXPORT macro didn't work for him. But it sounds like using DL_EXPORT is the officially correct thing to do, so I'll do that. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Jul 19 02:40:06 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 19 Jul 2002 13:40:06 +1200 (NZST) Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: <01KK9VLD2I56A296UI@it.canterbury.ac.nz> Message-ID: <200207190140.g6J1e6U04243@oma.cosc.canterbury.ac.nz> > at least on Unix-ish systems, you > could also get the same effect with dup2 without even needing any > filename around No, you couldn't. dup() or dup2() will give you another file descriptor sharing the same file-position pointer. To get a completely independent access path I think you have to open the file again starting from the pathname. 
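Greg's correction is easy to check from Python's os module. This present-day sketch (not anything proposed in the thread) shows that a dup()'d descriptor shares the file-position pointer, while re-opening the pathname gives an independent one:

```python
import os
import tempfile

# Create a small throwaway file with known contents.
fd0, path = tempfile.mkstemp()
os.close(fd0)
with open(path, "wb") as f:
    f.write(b"abcdef")

fd1 = os.open(path, os.O_RDONLY)
fd2 = os.dup(fd1)                       # shares fd1's file-position pointer
os.read(fd2, 3)                         # advancing via fd2...
shared_pos = os.lseek(fd1, 0, os.SEEK_CUR)       # ...moved fd1 too (now 3)

fd3 = os.open(path, os.O_RDONLY)        # independent open by pathname
independent_pos = os.lseek(fd3, 0, os.SEEK_CUR)  # still at 0

for fd in (fd1, fd2, fd3):
    os.close(fd)
os.unlink(path)
```

So, as Greg says, dup()/dup2() cannot yield independent streams over one file; only a second open of the same pathname can.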
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Fri Jul 19 04:52:24 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 18 Jul 2002 23:52:24 -0400 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: <200207190125.g6J1PrG04203@oma.cosc.canterbury.ac.nz> Message-ID: [Tim] > The best thing to do for Windows is ask that Windows users supply > patches. [Greg Ewing] > It was using a patch supplied by a Windows user that got > me into this mess. He said that the DL_EXPORT macro > didn't work for him. Sucker . > But it sounds like using DL_EXPORT is the officially > correct thing to do, so I'll do that. Until Mark's patch, yes (see his post in this thread). From tim.one@comcast.net Fri Jul 19 04:54:16 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 18 Jul 2002 23:54:16 -0400 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: Message-ID: [Mark Hammond] > FWIW, www.python.org/sf/566100 deprecates DL_IMPORT/DL_EXPORT as it is > broken! Once this patch is checked in, the new blessed way to > declare your function will be: > > PyMODINIT_FUNC init_module(void) > { > ... > } > > This macro will do the right thing in all situations and for all > platforms. > It even provides the 'extern "C"' if your extension is in a C++ module. > > The-patch-even-updates-the-doc ly, This patch is a Good Thing, and I demand that everyone show you more appreciation for it. 
for-my-next-act-i'll-command-the-tide-to-retreat-ly y'rs - tim From guido@python.org Fri Jul 19 05:24:13 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 00:24:13 -0400 Subject: [Python-Dev] Review of build system patch requested In-Reply-To: Your message of "Fri, 19 Jul 2002 09:03:38 +1000." References: Message-ID: <200207190424.g6J4ODA08239@pcp02138704pcs.reston01.va.comcast.net> > Now I need some help checking this baby in! My change touches > Makefile.pre.in and configure.in, and require that both "autoheader" and > "autoconf" be run to correctly regenerate output files. > > How should I do this checkin? Is it necessary for me to perform any > additional steps, or is there some magic that allows me to simply check > these 2 files in and have everything else work? You need to check in the files that result from running these two; I believe that's configure and pyconfig.h.in. Note that we require just about the latest and greatest autoconf. If you screw up MvL will correct you. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Fri Jul 19 05:50:03 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 19 Jul 2002 16:50:03 +1200 (NZST) Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: Message-ID: <200207190450.g6J4o3w05817@oma.cosc.canterbury.ac.nz> > > But it sounds like using DL_EXPORT is the officially > > correct thing to do, so I'll do that. > > Until Mark's patch, yes (see his post in this thread). Yeah, but I'm not going to worry about that until it becomes part of a regular release. Thanks, Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+

From aleax@aleax.it Fri Jul 19 07:16:34 2002
From: aleax@aleax.it (Alex Martelli)
Date: Fri, 19 Jul 2002 08:16:34 +0200
Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability
In-Reply-To: 
References: 
Message-ID: 

On Friday 19 July 2002 12:26 am, Tim Peters wrote:
> > What about:
> >
> > "...sequences. Note that the act of looking at an iterator's
> > elements mutates the iterator."
>
> That doesn't belong in the spec either -- nothing requires an iterator to
> have mutable state, let alone to mutate it when next() is called.

Right, for unbounded iterators returning constant values, such as:

    class Ones:
        def __iter__(self):
            return self
        def next(self):
            return 1

However, such "exceptions that prove the rule" are rare enough that I
wouldn't consider their existence as forbidding to say _anything_ about
state mutation.  I _would_ similarly say that x[y]=z normally mutates x,
even though "def __setitem__(self, key): pass" is quite legal.  Inserting
an adverb such as "generally" or "usually" should suffice to make even
the most grizzled sea lawyer happy while keeping the information in.

Alex

From mal@lemburg.com Fri Jul 19 09:31:50 2002
From: mal@lemburg.com (M.-A.
Lemburg)
Date: Fri, 19 Jul 2002 10:31:50 +0200
Subject: [Python-Dev] staticforward
References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> <15671.12313.725886.680036@slothrop.zope.com> <3D373573.8070001@lemburg.com> <15671.15105.563068.700997@slothrop.zope.com> <3D3741D4.8020408@lemburg.com> <15671.16894.185299.672286@slothrop.zope.com>
Message-ID: <3D37CE76.4020803@lemburg.com>

Jeremy Hylton wrote:
>>>>> "MAL" == mal  writes:
>
>   MAL> What are you after here ? Remove the configure.in test as well
>   MAL> ?
>
> It is already gone.  And earlier in this thread, we established that
> it did you no good, right?

No, and I think I was clear about the fact that I don't want this to
be removed.

> You only care about compilers that choke
> on static array decls with later initialization, and the test doesn't
> catch that.

The test tries to catch a general problem in some compilers: that
static forward declarations cause compile-time errors.  However, it
only tests this for structs, not arrays and functions, so not all
problems related to static forward declarations are caught.  That's
why I had to add support for this to the header file I'm using.

As a result, the test should be extended to also check for the array
case and the function case, so that all relevant static forward
declaration bugs in the compiler trigger the #define of
BAD_STATIC_FORWARD, since that's what the symbol is all about.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                  http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/

From mal@lemburg.com Fri Jul 19 09:44:17 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 19 Jul 2002 10:44:17 +0200
Subject: [Python-Dev] Incompatible changes to xmlrpclib
References: <3D240FF2.3060708@lemburg.com> <3D2F3F06.1060800@lemburg.com>
Message-ID: <3D37D161.5@lemburg.com>

> Any news on this one ?

If no one objects, I'd like to restore the old interface.

>> I noticed yesterday that the xmlrpclib.py version in CVS
>> is incompatible with the version in Python 2.2: all the
>> .dump_XXX() interfaces changed and now include a third
>> argument.
>>
>> Since the Marshaller can be subclassed, this breaks all
>> existing application space subclasses extending or changing
>> the default xmlrpclib behaviour.
>>
>> I'd opt for moving back to the previous style of calling the
>> write method via self.write.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                  http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/

From martin@v.loewis.de Fri Jul 19 08:40:22 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 19 Jul 2002 09:40:22 +0200
Subject: [Python-Dev] Re: configure problems porting to Tru64
In-Reply-To: <15671.17150.922349.270282@slothrop.zope.com>
References: <15671.4640.361811.434411@slothrop.zope.com>
 <15671.17150.922349.270282@slothrop.zope.com>
Message-ID: 

jeremy@alum.mit.edu (Jeremy Hylton) writes:

> It looks like Tru64 doesn't have a makedev().  You added the patch
> that included this a while back.  Do you have any idea what we should
> do on Tru64?

Neal says you need to define _OSF_SOURCE, but it would be better if we
could do without.  If not, we should both define _OSF_SOURCE (perhaps
only on OSF), and add an autoconf test for makedev.
Regards,
Martin

From fredrik@pythonware.com Fri Jul 19 10:31:56 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Fri, 19 Jul 2002 11:31:56 +0200
Subject: [Python-Dev] Incompatible changes to xmlrpclib
References: <3D240FF2.3060708@lemburg.com> <3D2F3F06.1060800@lemburg.com>
 <3D37D161.5@lemburg.com>
Message-ID: <003701c22f07$21945140$0900a8c0@spiff>

mal wrote:

> > Any news on this one ?
>
> If no one objects, I'd like to restore the old interface.

the dump methods are an internal implementation detail, and are
only accessed through an internal dispatcher table.  even if you
override them, the marshaller won't use your new methods.

so what exactly is your use case?

(and whatever you did to make that use case work, how do I stop
you from doing the same thing with some other internal part of the
standard library? ;-)

From mal@lemburg.com Fri Jul 19 10:46:18 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 19 Jul 2002 11:46:18 +0200
Subject: [Python-Dev] PEP: Support for System Upgrades
Message-ID: <3D37DFEA.9070506@lemburg.com>

PEP: 0???
Title: Support for System Upgrades
Version: $Revision: 0.0 $
Author: mal@lemburg.com (Marc-André Lemburg)
Status: Draft
Type: Standards Track
Python-Version: 2.3
Created: 19-Jul-2001
Post-History: 

Abstract

    This PEP proposes strategies to allow the Python standard library
    to be upgraded in parts without having to reinstall the complete
    distribution or having to wait for a new patch level release.

Problem

    Python currently does not allow overriding modules or packages in
    the standard library by default.  Even though this is possible by
    defining a PYTHONPATH environment variable (the paths defined in
    this variable are prepended to the Python standard library path),
    there is no standard way of achieving this without changing the
    configuration.

    Since Python's standard library is starting to host packages which
    are also available separately, e.g.
    the distutils, email and PyXML packages, which can also be
    installed independently of the Python distribution, it is
    desirable to have an option to upgrade these packages without
    having to wait for a new patch level release of the Python
    interpreter to bring along the changes.

Proposed Solutions

    This PEP proposes two different but not necessarily conflicting
    solutions:

    1. Adding a new standard search path to sys.path:
       $stdlibpath/system-packages just before the $stdlibpath entry.
       This complements the already existing entry for site add-ons
       $stdlibpath/site-packages which is appended to the sys.path at
       interpreter startup time.

       To make use of this new standard location, distutils will need
       to grow support for installing certain packages in
       $stdlibpath/system-packages rather than the standard location
       for third-party packages $stdlibpath/site-packages.

    2. Tweaking distutils to install directly into $stdlibpath for the
       system upgrades rather than into $stdlibpath/site-packages.

    The first solution has a few advantages over the second:

    * upgrades can be easily identified (just look in
      $stdlibpath/system-packages)

    * upgrades can be deinstalled without affecting the rest of the
      interpreter installation

    * modules can be virtually removed from packages; this is due to
      the way Python imports packages: once it finds the top-level
      package directory it stays in this directory for all subsequent
      package submodule imports

    * the approach has an overall much cleaner design than the hackish
      install on top of an existing installation approach

    The only advantages of the second approach are that the Python
    interpreter does not have to be changed and that it works with
    older Python versions.

    Both solutions require changes to distutils.  These changes can
    also be implemented by package authors, but it would be better to
    define a standard way of switching on the proposed behaviour.
Scope

    Solution 1: Python 2.3 and up
    Solution 2: all Python versions supported by distutils

Credits

    None

References

    None

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                  http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/

From mal@lemburg.com Fri Jul 19 11:00:42 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 19 Jul 2002 12:00:42 +0200
Subject: [Python-Dev] Incompatible changes to xmlrpclib
References: <3D240FF2.3060708@lemburg.com> <3D2F3F06.1060800@lemburg.com>
 <3D37D161.5@lemburg.com> <003701c22f07$21945140$0900a8c0@spiff>
Message-ID: <3D37E34A.9050207@lemburg.com>

Fredrik Lundh wrote:
> mal wrote:
>
>>> Any news on this one ?
>>
>> If no one objects, I'd like to restore the old interface.
>
> the dump methods are an internal implementation detail, and are
> only accessed through an internal dispatcher table.  even if you
> override them, the marshaller won't use your new methods.

If I subclass the Marshaller and Unmarshaller and then use the
subclasses, it would :-)

> so what exactly is your use case?

I needed to adapt the type mapping in xmlrpclib a bit to better fit
our needs.
This is done by adding a few more methods to the Marshaller and
Unmarshaller (it's a hack, but the module doesn't allow any other
method, AFAIK):

    def install_xmlrpclib_addons(xmlrpclib):
        m = xmlrpclib.Marshaller
        m.dump_datetime = _dump_datetime
        m.dispatch[DateTime.DateTimeType] = m.dump_datetime
        m.dump_buffer = _dump_buffer
        m.dispatch[types.BufferType] = m.dump_buffer
        m.dump_int = _dump_int
        m.dispatch[types.IntType] = m.dump_int
        u = xmlrpclib.Unmarshaller
        u.end_dateTime = _load_datetime
        u.dispatch['dateTime.iso8601'] = u.end_dateTime
        u.end_base64 = _load_buffer
        u.dispatch['base64'] = u.end_base64
        u.end_boolean = _load_boolean
        u.dispatch['boolean'] = u.end_boolean

> (and whatever you did to make that use case work, how do I stop
> you from doing the same thing with some other internal part of the
> standard library? ;-)

It would be nice to open up the module a little more so that hacks
like the one above are not necessary, e.g. by making the used classes
parameters to the loads/dumps functions.  Then you'd run into the same
problem, though, since now subclasses would need to access the
dump/load methods.

PS: Standard support for None would be nice to have in xmlrpclib...
at least for the Marshalling side, since this is a very common problem
with xmlrpc.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                  http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/

From jmiller@stsci.edu Fri Jul 19 12:29:37 2002
From: jmiller@stsci.edu (Todd Miller)
Date: Fri, 19 Jul 2002 07:29:37 -0400
Subject: [Python-Dev] Fw: Behavior of buffer()
Message-ID: <3D37F821.8010908@stsci.edu>

This is a re-post in plain text of a message I sent yesterday in HTML.
Anyone not "consumed with interest" in the buffer object should
probably skip it.
Scott Gilbert wrote: >--- Todd Miller wrote: > >>>I don't understand what you say, but I believe you. >>> >>I meant we call PyBuffer_FromReadWriteObject and the resulting buffer >>lives longer than the extension function call that created it. I have >>heard that it is possible for the original object to "move" leaving the >>buffer object pointer to it dangling. >> > >Yes. The PyBufferObject grabs the pointer from the PyBufferProcs >supporting object when the PyBufferObject is created. If the PyBufferProcs >supporting object reallocates the memory (possibly from a resize) the > Thanks for the example. > >PyBufferObject can be left with a bad pointer. This is easily possible if >you try to use the array module arrays as a buffer. > This is good to know. > > >I've submitted a patch to fix this particular problem (among others), but >there are still enough things that the buffer object can't do that >something new is needed. > I understand. I saw your patches and they sounded good to me. > >>> >>>>>Maybe instead of the buffer() function/type, there should be a way to >>>>>allocate raw memory? >>>>> >>>>Yes. It would also be nice to be able to: >>>> >>>>1. Know (at the python level) that a type supports the buffer C-API. >>>> >>>Good idea. (I guess right now you can see if calling buffer() with an >>>instance as argument works. :-) >>> >>>>2. Copy bytes from one buffer to another (writeable buffer). >>>> > >And the copy operations shouldn't create any large temporaries: > I agree with this completely. I could summarize my opinion by saying that while I regard the current buffering system as pretty complete, the buffer object places emphasis on the wrong behavior. In terms of modelling memory regions, strings are the wrong way to go. > > > buf1 = memory(50000) > buf2 = memory(50000) > # no 10K temporary should be created in the next line > buf1[10000:20000] = buf2[30000:40000] > >The current buffer object could be used like this, but it would create a >temporary string. 
Looking at buffering most of this week, the fact that mmap slicing
also returns strings is one justification I've found for having a
buffer object, i.e., mmap slicing is not a substitute for the buffer
object.  The buffer object makes it possible to partition a mmap or
any bufferable object into pseudo-independent, possibly writable,
pieces.

One justification to have a new buffer object is pickling (one of
Scott's posts alerted me to this).  I think the behavior we want for
numarray is to be able to pickle a view of a bufferable object more or
less like a string containing the buffer image, and to unpickle it as
a memory object.  The prospect of adding pickling support makes me
wonder if separating the allocator and view aspects of the buffer
object is a good idea; I thought it was, but now I wonder.

> So getting an efficient copy operation seems to require that slices
> just create new "views" to the same memory.

Other justifications for a new buffer object might be:

1. The ability to partition any bufferable object into regions which
   can be passed around.  These regions would themselves be buffers.

2. The ability to efficiently pickle a view of any bufferable object.

>>> Maybe you would like to work on a requirements gathering for a
>>> memory object
>>
>> Sure.  I'd be willing to poll comp.lang.python (python-list?) and
>> collate the results of any discussion that ensues.  Is that what
>> you had in mind?
>
> In the PEP that I'm drafting, I've been calling the new object
> "bytes" (since it is just a simple array of bytes).  Now that you
> guys are referring to it as the "memory object", should I change the
> name?  Doesn't really matter, but it might avoid confusion to know
> we're all talking about the same thing.

Calling this a memory type sounds the best to me.  The question I have
not resolved for myself is whether there should be one type which
"does it all" or two types, a memory allocator and a bufferable object
manipulator.
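Python did eventually grow a type with the view semantics discussed here: memoryview. A sketch of the no-copy behaviour Scott's buf1/buf2 example asks for, using that later type (not the object being designed in this thread):

```python
# Slicing a memoryview yields a new view of the same bytes, not a copy.
buf1 = memoryview(bytearray(50000))            # 50000 zero bytes
buf2 = memoryview(bytearray(b"\x01" * 50000))  # 50000 one bytes

# No 10K temporary string is created here; the bytes are copied
# directly between the two underlying buffers.
buf1[10000:20000] = buf2[30000:40000]
assert buf1[15000] == 1 and buf1[9999] == 0

# A slice is a writable window: mutating the view mutates the base.
view = buf1[0:5]
view[0] = 255
assert buf1[0] == 255
```

This is essentially the "partition a bufferable object into pseudo-independent, possibly writable, pieces" behaviour Todd describes, without the dangling-pointer hazard of the 2.2-era buffer object.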
> > > >__________________________________________________ >Do You Yahoo!? >Yahoo! Autos - Get free new car price quotes >http://autos.yahoo.com > From ping@zesty.ca Fri Jul 19 12:44:09 2002 From: ping@zesty.ca (Ka-Ping Yee) Date: Fri, 19 Jul 2002 04:44:09 -0700 (PDT) Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: <200207181422.g6IEMBr14526@odiug.zope.com> Message-ID: On Thu, 18 Jul 2002, Guido van Rossum wrote: > First of all, I'm not sure what exactly the issue is with destructive > for-loops. It's just not the way i expect for-loops to work. Perhaps we would need to survey people for objective data, but i feel that most people would be surprised if for x in y: print x for x in y: print x did not print the same thing twice, or if if x in y: print 'got it' if x in y: print 'got it' did not do the same thing twice. I realize this is my own opinion, but it's a fairly strong impression i have. Even if it's okay for for-loops to destroy their arguments, i still think it sets up a bad situation: we may end up with functions manipulating sequence-like things all over, but it becomes unclear whether they destroy their arguments or not. It becomes possible to write a function which sometimes destroys its argument and sometimes doesn't. Bugs get deeper and harder to find. I believe this is where the biggest debate lies: whether "for" should be non-destructive. I realize we are currently on the other side of the fence, but i foresee enough potential pain that i would like you to consider the value of keeping "for" loops non-destructive. > Maybe the for-loop is a red herring? Calling next() on an > iterator may or may not be destructive on the underlying "sequence" -- > if it is a generator, for example, I would call it destructive. Well, for a generator, there is no underlying sequence. while 1: print next(gen) makes it clear that there is no sequence, but for x in gen: print x seems to give me the impression that there is. 
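The asymmetry Ping is describing is easy to demonstrate in today's Python, which kept the behaviour under debate here (`gen` is just an illustrative name):

```python
def gen():
    yield 1
    yield 2

# Looping over an iterator consumes it; a second loop sees nothing.
y = gen()
first = [x for x in y]
second = [x for x in y]
assert (first, second) == ([1, 2], [])

# Looping over a real container is repeatable.
z = [1, 2]
assert [x for x in z] == [x for x in z]

# "in" is destructive too: the membership test eats elements as it
# searches, so repeating the very same test changes its answer.
g = gen()
assert 1 in g
assert 1 not in g
```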
> Perhaps you're trying to assign properties to the iterator abstraction > that aren't really there? I'm assigning properties to "for" that you aren't. I think they are useful properties, though, and worth considering. I don't think i'm assigning properties to the iterator abstraction; i expect iterators to destroy themselves. But the introduction of iterators, in the way they are now, breaks this property of "for" loops that i think used to hold almost all the time in Python, and that i think holds all the time in almost all other languages. > Next, I'm not sure how renaming next() to __next__() would affect the > situation w.r.t. the destructivity of for-loops. Or were you talking > about some other migration? The connection is indirect. The renaming is related to: (a) making __next__() a real, honest-to-goodness protocol independent of __iter__; and (b) getting rid of __iter__ on iterators. It's the presence of __iter__ on iterators that breaks the non-destructive-for property. I think the renaming of next() to __next__() is a good idea in any case. It is distant enough from the other issues that it can be done independently of any decisions about __iter__. -- ?!ng From ping@zesty.ca Fri Jul 19 12:28:32 2002 From: ping@zesty.ca (Ka-Ping Yee) Date: Fri, 19 Jul 2002 04:28:32 -0700 (PDT) Subject: [Python-Dev] The iterator story Message-ID: Here is a summary of the whole iterator picture as i currently see it. This is necessarily subjective, but i will try to be precise so that it's clear where i'm making a value judgement and where i'm trying to state fact, and so we can pinpoint areas where we agree and disagree. In the subjective sections, i have marked with [@] the places where i solicit agreement or disagreement. I would like to know your opinions on the issues listed below, and on the places marked [@]. Definitions (objective) ----------------------- Container: a thing that provides non-destructive access to a varying number of other things. 
Why "non-destructive"? Because i don't expect that merely looking at the contents will cause a container to be altered. For example, i expect to be able to look inside a container, see that there are five elements; leave it alone for a while, come back to it later and observe once again that there are five elements. Consequently, a file object is not a container in general. Given a file object, you cannot look at it to see if it contains an "A", and then later look at it once again to see if it contains an "A" and get the same result. If you could seek, then you could do this, but not all files support seeking. Even if you could seek, the act of reading the file would still alter the file object. The file object provides no way of getting at the contents without mutating itself. According to my definition, it's fine for a container to have ways of mutating itself; but there has to be *some* way of getting the contents without mutating the container, or it just ain't a container to me. A file object is better described as a stream. Hypothetically one could create an interface to seekable files that offered some non-mutating read operations; this would cause the file to look more like an array of bytes, and i would find it appropriate to call that interface a container. Iterator: a thing that you can poke (i.e. send a no-argument message), where each time you poke it, it either yields something or announces that it is exhausted. For an iterator to mutate itself every time you poke it is not part of my definition. But the only non-mutating iterator would be an iterator that returns the same thing forever, or an iterator that is always exhausted. So most iterators usually mutate. Some iterators are associated with a container, but not all. There can be many kinds of iterators associated with a container. 
The most natural kind is one that yields the elements of the container, one by one, mutating itself each time it is poked, until it has yielded all of the elements of the container and announces exhaustion. A Container's Natural Iterator: an iterator that yields the elements of the container, one by one, in the order that makes the most sense for the container. If the container has a finite size n, then the iterator can be poked exactly n times, and thereafter it is exhausted. Issues (objective) ------------------ I alluded to a set of issues in an earlier message, and i'll begin there, by defining what i meant more precisely. The Destructive-For Issue: In most languages i can think of, and in Python for the most part, a statement such as "for x in y: print x" is a non-destructive operation on y. Repeating "for x in y: print x" will produce exactly the same results once more. For pre-iterator versions of Python, this fails to be true only if y's __getitem__ method mutates y. The introduction of iterators has caused this to now be untrue when y is any iterator. The issue is, should "for" be non-destructive? The Destructive-In Issue: Notice that the iteration that takes place for the "in" operator is implemented in the same way as "for". So if "for" destroys its second operand, so will "in". The issue is, should "in" be non-destructive? (Similar issues exist for built-ins that iterate, like list().) The __iter__-On-Iterators Issue: Some people have mentioned that the presence of an __iter__() method is a way of signifying that an object supports the iterator protocol. It has been said that this is necessary because the presence of a "next()" method is not sufficiently distinguishing. Some have said that __iter__() is a completely distinct protocol from the iterator protocol. The issue is, what is __iter__() really for? And secondarily, if it is not part of the iterator protocol, then should we require __iter__() on iterators, and why? 
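The container/iterator split defined above can be made concrete. A minimal sketch in later-Python spelling (`__next__`, which this thread is still debating under the name `next()`); `Bag` and `BagIterator` are invented names:

```python
class Bag:
    """A container: __iter__ hands out a fresh iterator each time."""
    def __init__(self, *items):
        self.items = list(items)
    def __iter__(self):
        return BagIterator(self.items)

class BagIterator:
    """An iterator: it mutates itself each time it is poked."""
    def __init__(self, items):
        self.items = items
        self.pos = 0
    def __iter__(self):          # lets 'for' accept the iterator itself
        return self
    def __next__(self):
        if self.pos >= len(self.items):
            raise StopIteration
        item = self.items[self.pos]
        self.pos += 1
        return item

bag = Bag("a", "b")
assert list(bag) == ["a", "b"]
assert list(bag) == ["a", "b"]   # container: iterating is repeatable

it = iter(bag)
assert list(it) == ["a", "b"]
assert list(it) == []            # iterator: a second pass finds it exhausted
```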
The __next__-Naming Issue: The iteration method is currently called "next()". Previous candidates for the name of this method were "next", "__next__", and "__call__". After some previous debate, it was pronounced to be "next()". There are concerns that "next()" might collide with existing methods named "next()". There is also a concern that "next()" is inconsistent because it is the only type-slot-method that does not have a __special__ name. The issue is, should it be called "next" or "__next__"? My Positions (subjective) ------------------------- I believe that "for" and "in" and list() should be non-destructive. I believe that __iter__() should not be required on iterators. I believe that __next__() is a better name than next(). Destructive-For, Destructive-In: I think "for" should be non-destructive because that's the way it has almost always behaved, and that's the way it behaves in any other language [@] i can think of. For a container's __getitem__ method to mutate the container is, in my opinion, bad behaviour. In pre-iterator Python, we needed some way to allow the convenience of "for" on user-implemented containers. So "for" supported a special protocol where it would call __getitem__ with increasing integers starting from 0 until it hit an IndexError. This protocol works great for sequence-like containers that were indexable by integers. But other containers had to be hacked somewhat to make them fit. For example, there was no good way to do "for" over a dictionary-like container. If you attempted "for" over a user-implemented dictionary, you got a really weird "KeyError: 0", which only made sense if you understood that the "for" loop was attempting __getitem__(0). (Hey! I just noticed that from UserDict import UserDict for k in UserDict(): print k still produces "KeyError: 0"! This oughta be fixed...) If you wanted to support "for" on something else, sometimes you would have to make __getitem__ mutate the object, like it does in the fileinput module. 
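The legacy __getitem__ protocol and the puzzling "KeyError: 0" described above can both be reproduced directly (`Squares` and `Mapping` are illustrative names):

```python
class Squares:
    """Pre-iterator 'for' support: __getitem__ is called with 0, 1, 2,
    ... until it raises IndexError."""
    def __getitem__(self, i):
        if i >= 4:
            raise IndexError
        return i * i

assert [x for x in Squares()] == [0, 1, 4, 9]

class Mapping:
    """A dictionary-like container indexed by string keys only."""
    def __getitem__(self, key):
        return {"a": 1}[key]

try:
    for k in Mapping():     # 'for' blindly tries __getitem__(0)
        pass
except KeyError as e:
    assert e.args[0] == 0   # the weird "KeyError: 0"
```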
But then the user has to know that this object is a special case: "for" only works the first time. When iterators were introduced, i believed they were supposed to solve this problem. Currently, they don't. Currently, "in" can even be destructive. This is more serious. While one could argue that it's not so strange for for x in y: ... to alter y (even though i do think it is strange), i believe just about anyone would find it very counterintuitive for if x in y: to alter y. [@] __iter__-On-Iterators: I believe __iter__ is not a type flag. As i argued previously, i think that looking for the presence of methods that don't actually implement a protocol is a poor way to check for protocol support. And as things stand, the presence of __iter__ doesn't even work [@] as a type flag. There are objects with __iter__ that are not iterators (like most containers). And there are objects without __iter__ that work as iterators. I know you can legislate the latter away, but i think such legislation would amount to fighting the programmers -- and it is infeasible [@] to enforce the presence of __iter__ in practice. Based on Guido's positive response, in which he asked me to make an addition to the PEP, i believe Guido agrees with me that __iter__ is distinct from the protocol of an iterator. This surprised me because it runs counter to the philosophy previously expressed in the PEP. Now suppose we agree that __iter__ and next are distinct protocols. Then why require iterators to support both? The only reason we would want __iter__ on iterators is so that we can use "for" [@] with an iterator as the second operand. I have just argued, above, that it's *not* a good idea for "for" and "in" to be destructive. Since most iterators self-mutate, it follows that it's not advisable to use an iterator directly as the second operand of a "for" or "in". I realize this seems radical! This may be the most controversial point i have made. 
But if you accept that "in" should not destroy its second argument,
the conclusion is unavoidable.

The __next__-Naming issue:

    I think the potential for collision, though small, is significant,
    and this makes "__next__" a better choice than "next".  A built-in
    function next() should be introduced; this function would call the
    tp_iternext slot, and for instance objects tp_iternext would call
    the __next__ method implemented in Python.

    The connection between this issue and the __iter__ issue is that,
    if next() were renamed to __next__(), the argument that __iter__
    is needed as a flag would also go away.

The Current PEP (objective)
---------------------------

The current PEP takes the position that "for" and "in" can be
destructive; that __iter__() and next() represent two distinct
protocols, yet iterators are required to support both; and that the
method on iterators is called "next()".

My Ideal Protocol (subjective)
------------------------------

So by now the biggest question/objection you probably have is "if i
can't use an iterator with 'for', then how can i use it?"  The answer
is that "for" is a great way to iterate over things; it's just that it
iterates over containers and i want to preserve that.  We need a
different way to iterate over iterators.

In my ideal world, we would allow a new form of "for", such as

    for line from file:
        print line

The use of "from" instead of "in" would imply that we were
(destructively) pulling things out of the iterator, and would remove
any possible parallel to the test "x in y", which should rightly
remain non-destructive.

Here's the whole deal:

- Iterators provide just one method, __next__().

- The built-in next() calls tp_iternext.  For instances, tp_iternext
  calls __next__.

- Objects wanting to be iterated over provide just one method,
  __iter__().  Some of these are containers, but not all.

- The built-in iter(foo) calls tp_iter.  For instances, tp_iter calls
  __iter__.

- "for x in y" gets iter(y) and uses it as an iterator.
- "for x from y" just uses y as the iterator.

That's it.  Benefits:

- We have a nice clean division between containers and iterators.

- When you see "for x in y" you know that y is a container.

- When you see "for x from y" you know that y is an iterator.

- "for x in y" never destroys y.

- "if x in y" never destroys y.

- If you have an object that is container-like, you can add an
  __iter__ method that gives its natural iterator.  If you want, you
  can supply more iterators that do different things; no problem.
  No one using your object is confused about whether it mutates.

- If you have an object that is cursor-like or stream-like, you can
  safely make it into an iterator by adding __next__.  No one using
  your object is confused about whether it mutates.

Other notes:

- Iterator algebra still works fine, and is still easy to write:

      def alternate(it):
          while 1:
              yield next(it)
              next(it)

- The file problem has a consistent solution.  Instead of writing
  "for line in file" you write

      for line from file:
          print line

  Being forced to write "from" signals to you that the file is eaten
  up.  There is no expectation that "for line from file" will work
  again.  The best would be a convenience function "readlines", to
  make this even clearer:

      for line in readlines("foo.txt"):
          print line

  Now you can do this as many times as you want, and there is no
  possibility of confusion; there is no file object on which to call
  methods that might mess up the reading of lines.

My Not-So-Ideal Protocol
------------------------

All right.  So new syntax may be hard to swallow.  An alternative is
to introduce an adapter that turns an iterator into something that
"for" will accept -- that is, the opposite of iter().

- The built-in seq(it) returns x such that iter(x) yields it.

Then instead of writing

    for x from it:

you would write

    for x in seq(it):

and the rest would be the same.  The use of "seq" here is what would
flag the fact that "it" will be destroyed.
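The proposed seq() adapter is essentially a one-liner. A sketch in today's Python, where the iterator method is spelled __next__ (`seq` is Ping's hypothetical built-in, written here as an ordinary class):

```python
class seq:
    """Hypothetical adapter: wraps an iterator so that
    'for x in seq(it)' visibly flags that 'it' will be consumed."""
    def __init__(self, it):
        self.it = it
    def __iter__(self):
        # iter(seq(it)) yields the wrapped iterator itself.
        return self.it

it = iter([1, 2, 3])
assert [x for x in seq(it)] == [1, 2, 3]
assert [x for x in seq(it)] == []   # the wrapped iterator is used up
```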
-- ?!ng

From jeremy@alum.mit.edu Fri Jul 19 13:20:20 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Fri, 19 Jul 2002 08:20:20 -0400
Subject: [Python-Dev] staticforward
In-Reply-To: <3D37CE76.4020803@lemburg.com>
References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> <15671.12313.725886.680036@slothrop.zope.com> <3D373573.8070001@lemburg.com> <15671.15105.563068.700997@slothrop.zope.com> <3D3741D4.8020408@lemburg.com> <15671.16894.185299.672286@slothrop.zope.com> <3D37CE76.4020803@lemburg.com>
Message-ID: <15672.1028.161004.894848@slothrop.zope.com>

>>>>> "MAL" == mal  writes:

MAL> What are you after here ? Remove the configure.in test as well
MAL> ?
>> 
>> It is already gone. And earlier in this thread, we established
>> that it did you no good, right?

MAL> No and I think I was clear about the fact that I don't want
MAL> this to be removed.

It's clear you don't want it to be removed, but not entirely clear
why. We've got a whole alpha and beta cycle to see if anyone finds an
actual compiler problem with the Python core. During that time, you
can see if the problem occurs for the header file you mentioned. (The
one where you use it for an array even though you could rearrange the
code to eliminate it.)

>> You only care about compilers that choke on static array decls
>> with later initialization, and the test doesn't catch that.

MAL> The test tries to catch a general problem in some compilers:

No one has produced any evidence that there are still compilers that
have this problem.

MAL> that static forward declarations cause compile time
MAL> errors. However, it only tests this for structs, not arrays and
MAL> functions.
So not all problems related to static forward
MAL> declarations are caught. That's why I had to add support for
MAL> this to the header file I'm using.

The Python core has no need for tests on arrays or functions.
(Indeed, staticforward was not intended for function prototypes.)

Jeremy

From neal@metaslash.com Fri Jul 19 13:42:58 2002
From: neal@metaslash.com (Neal Norwitz)
Date: Fri, 19 Jul 2002 08:42:58 -0400
Subject: [Python-Dev] Re: configure problems porting to Tru64
References: <15671.4640.361811.434411@slothrop.zope.com> <15671.17150.922349.270282@slothrop.zope.com>
Message-ID: <3D380952.CF927B10@metaslash.com>

"Martin v. Loewis" wrote:
> 
> jeremy@alum.mit.edu (Jeremy Hylton) writes:
> 
> > It looks like Tru64 doesn't have a makedev(). You added the patch
> > that included this a while back. Do you have any idea what we should
> > do on Tru64?
> 
> Neal says you need to define _OSF_SOURCE, but it would be better if we
> could do without. If not, we should both define _OSF_SOURCE (perhaps
> only on OSF), and add an autoconf test for makedev.

I agree with Martin. It would be best to only define _OSF_SOURCE if
absolutely necessary and use autoconf.

Neal

From guido@python.org Fri Jul 19 13:59:15 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 19 Jul 2002 08:59:15 -0400
Subject: [Python-Dev] staticforward
In-Reply-To: Your message of "Fri, 19 Jul 2002 10:31:50 +0200."
<3D37CE76.4020803@lemburg.com>
References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> <15671.12313.725886.680036@slothrop.zope.com> <3D373573.8070001@lemburg.com> <15671.15105.563068.700997@slothrop.zope.com> <3D3741D4.8020408@lemburg.com> <15671.16894.185299.672286@slothrop.zope.com> <3D37CE76.4020803@lemburg.com>
Message-ID: <200207191259.g6JCxGp24808@pcp02138704pcs.reston01.va.comcast.net>

> The test tries to catch a general problem in some compilers: that
> static forward declarations cause compile time errors. However,
> it only tests this for structs, not arrays and functions.
> So not all problems related to static forward declarations are
> caught. That's why I had to add support for this to the
> header file I'm using.
>
> As a result, the test should be extended to also check for the
> array case and the function case, so that all relevant static
> forward declaration bugs in the compiler trigger the
> #define of BAD_STATIC_FORWARD since that's what the symbol
> is all about.

Sorry, Marc-Andre, this has lasted long enough.

Compilers that don't support this are clearly broken according to the
ANSI C std. When Python was first released, such broken compilers
perhaps had the excuse that it was a tricky issue in the std and that
K&R didn't do it that way. That was many years ago. Platforms whose
compiler is still broken in this way ought to be extinct, and I have
every reason to believe that they are. It's just not worth our while
to try to cater for every possible way that compilers used to be
broken in the distant past.
When we spot a
real live broken compiler, and there's no better work-around (like
rewriting the code), and we care about that platform, and there's no
alternative compiler available, we may add some cruft to the code.
But there's no point in gathering cruft forever without every once in
a while cleaning some things up.

I'll gladly put this back in as soon as you have a paying customer who
wants to run Python 2.3 on a platform where the compiler is still
broken in this way. Until then, it's a non-issue.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Fri Jul 19 13:59:37 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 19 Jul 2002 08:59:37 -0400
Subject: [Python-Dev] Incompatible changes to xmlrpclib
In-Reply-To: Your message of "Fri, 19 Jul 2002 10:44:17 +0200." <3D37D161.5@lemburg.com>
References: <3D240FF2.3060708@lemburg.com> <3D2F3F06.1060800@lemburg.com> <3D37D161.5@lemburg.com>
Message-ID: <200207191259.g6JCxbW24819@pcp02138704pcs.reston01.va.comcast.net>

> If no one objects, I'd like to restore the old interface.

That's between you & Fredrik Lundh.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From oren-py-d@hishome.net Fri Jul 19 14:23:51 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Fri, 19 Jul 2002 09:23:51 -0400
Subject: [Python-Dev] The iterator story
In-Reply-To: 
References: 
Message-ID: <20020719132351.GA40829@hishome.net>

> The Destructive-For Issue:
>
> In most languages i can think of, and in Python for the most
> part, a statement such as "for x in y: print x" is a
> non-destructive operation on y. Repeating "for x in y: print x"
> will produce exactly the same results once more.
>
> For pre-iterator versions of Python, this fails to be true only
> if y's __getitem__ method mutates y. The introduction of
> iterators has caused this to now be untrue when y is any iterator.
The most significant example of an object that mutates on __getitem__ in
pre-iterator Python is the xreadlines object. Its __getitem__ method
increments an internal counter and raises an exception if accessed out of
order. This hack may be the 'original sin' - the first widely used
destructive for.

I just wish the time machine could have picked up your posting when the
iteration protocols were designed. Good work.

Your questions will require some serious meditation on the relative
importance of semantic purity and backward compatibility.

Oren

From mal@lemburg.com Fri Jul 19 14:41:40 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 19 Jul 2002 15:41:40 +0200
Subject: [Python-Dev] staticforward
References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> <15671.12313.725886.680036@slothrop.zope.com> <3D373573.8070001@lemburg.com> <15671.15105.563068.700997@slothrop.zope.com> <3D3741D4.8020408@lemburg.com> <15671.16894.185299.672286@slothrop.zope.com> <3D37CE76.4020803@lemburg.com> <200207191259.g6JCxGp24808@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D381714.7040606@lemburg.com>

Guido van Rossum wrote:
>>The test tries to catch a general problem in some compilers: that
>>static forward declarations cause compile time errors. However,
>>it only tests this for structs, not arrays and functions.
>>So not all problems related to static forward declarations are
>>caught. That's why I had to add support for this to the
>>header file I'm using.
>>
>>As a result, the test should be extended to also check for the
>>array case and the function case, so that all relevant static
>>forward declaration bugs in the compiler trigger the
>>#define of BAD_STATIC_FORWARD since that's what the symbol
>>is all about.
>
>
> Sorry, Marc-Andre, this has lasted long enough.
>
> Compilers that don't support this are clearly broken according to the
> ANSI C std. When Python was first released, such broken compilers
> perhaps had the excuse that it was a tricky issue in the std and that
> K&R didn't do it that way. That was many years ago. Platforms whose
> compiler is still broken in this way ought to be extinct, and I have
> every reason to believe that they are.

"""
Albert Chin-A-Young wrote on 2002-05-04:
> >
> > The AIX xlc ANSI compiler does not allow forward declaration of
> > variables. This leads to a lot of problems with .c files that use
> > staticforward (e.g. mxDateTime.c, mxProxy.c, etc.). Any chance of
> > fixing these?
"""

I'm not making this up.

> It's just not worth our while to try to cater for every possible way
> that compilers used to be broken in the distant past. When we spot a
> real live broken compiler, and there's no better work-around (like
> rewriting the code), and we care about that platform, and there's no
> alternative compiler available, we may add some cruft to the code.

This sounds too much like "we == PythonLabs". Is that
intended ?

> But there's no point in gathering cruft forever without every once in
> a while cleaning some things up.
>
> I'll gladly put this back in as soon as you have a paying customer who
> wants to run Python 2.3 on a platform where the compiler is still
> broken in this way. Until then, it's a non-issue.

Hmm, a few messages ago you confirmed that my usage of
staticforward and statichere was correct; later on, you say
that it's not necessary anymore in the core so it's OK
to rip it out.
I am telling you that there are compilers
around which don't get it right for arrays and propose
to add a check for those as well -- if only to help extension
writers like myself.

Nevermind, I'll add code to my stuff to emulate the
configure.in check using distutils. Still, I find
it frustrating that PythonLabs is giving me such a
hard time because of 15 lines of code in configure.in.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/

From guido@python.org Fri Jul 19 15:10:19 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 19 Jul 2002 10:10:19 -0400
Subject: [Python-Dev] staticforward
In-Reply-To: Your message of "Fri, 19 Jul 2002 15:41:40 +0200." <3D381714.7040606@lemburg.com>
References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> <15671.12313.725886.680036@slothrop.zope.com> <3D373573.8070001@lemburg.com> <15671.15105.563068.700997@slothrop.zope.com> <3D3741D4.8020408@lemburg.com> <15671.16894.185299.672286@slothrop.zope.com> <3D37CE76.4020803@lemburg.com> <200207191259.g6JCxGp24808@pcp02138704pcs.reston01.va.comcast.net> <3D381714.7040606@lemburg.com>
Message-ID: <200207191410.g6JEAKf25935@pcp02138704pcs.reston01.va.comcast.net>

> """
> Albert Chin-A-Young wrote on 2002-05-04:
> > >
> > > The AIX xlc ANSI compiler does not allow forward declaration of
> > > variables. This leads to a lot of problems with .c files that use
> > > staticforward (e.g. mxDateTime.c, mxProxy.c, etc.).
Any chance of
> > > fixing these?
> """
>
> I'm not making this up.

He doesn't complain about the core.

> > It's just not worth our while to try to cater for every possible way
> > that compilers used to be broken in the distant past. When we spot a
> > real live broken compiler, and there's no better work-around (like
> > rewriting the code), and we care about that platform, and there's no
> > alternative compiler available, we may add some cruft to the code.
>
> This sounds too much like "we == PythonLabs". Is that
> intended ?

I hope this is in general the attitude of most core Python developers.
Adding cruft should be frowned upon! Else the code will become
unmaintainable over time, and everybody loses.

> Hmm, a few messages ago you confirmed that my usage of
> staticforward and statichere was correct; later on, you say
> that it's not necessary anymore in the core so it's OK
> to rip it out. I am telling you that there are compilers
> around which don't get it right for arrays and propose
> to add a check for those as well -- if only to help extension
> writers like myself.

You're the only person who seems to be suffering from this.

> Nevermind, I'll add code to my stuff to emulate the
> configure.in check using distutils. Still, I find
> it frustrating that PythonLabs is giving me such a
> hard time because of 15 lines of code in configure.in.

I find it frustrating that you're not seeing our side.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From David Abrahams" <20020719132351.GA40829@hishome.net>
Message-ID: <0d3001c22f2f$5e2d2320$6501a8c0@boostconsulting.com>

From: "Oren Tirosh" 
> > The Destructive-For Issue:
> >
> > In most languages i can think of, and in Python for the most
> > part, a statement such as "for x in y: print x" is a
> > non-destructive operation on y. Repeating "for x in y: print x"
> > will produce exactly the same results once more.
> >
> > For pre-iterator versions of Python, this fails to be true only
> > if y's __getitem__ method mutates y. The introduction of
> > iterators has caused this to now be untrue when y is any iterator.
>
> The most significant example of an object that mutates on __getitem__ in
> pre-iterator Python is the xreadlines object. Its __getitem__ method
> increments an internal counter and raises an exception if accessed out of
> order. This hack may be the 'original sin' - the first widely used
> destructive for.
>
> I just wish the time machine could have picked up your posting when the
> iteration protocols were designed. Good work.

Yeah, Ping's article sure went "thunk" when I read it.

At the risk of boring everyone, I think I should explain why I started
the multipass iterator thread.

One of the most important jobs of Boost.Python is the conversion between
C++ and Python types (and if you don't give a fig for C++, hang on,
because I hope this will be relevant to pure Python also). In order to
support wrapping of overloaded C++ functions and member functions, it's
important to be able to do this in two steps:

1. Discover whether a Python object is convertible to a given C++ type
2. Perform the conversion

The overload resolution mechanism is currently pretty simple-minded: it
looks through the overloaded function objects until it can find one for
which all the arguments are convertible to the corresponding C++ type,
then it converts them and calls the wrapped C++ function.

My users really want to be able to define converters which, given any
Python iterable/sequence type, can extract a particular C++ container
type. In order to do that, we might commonly need to inspect each
element of the source object to see that it's convertible to the C++
container's value type.
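The trap in this two-step scheme can be sketched in plain Python (all names below are hypothetical illustrations, not Boost.Python's API):

```python
# A destructive convertibility check breaks overload resolution:
# probing an argument for overload #1 consumes the iterator, so
# later overloads (or even the matching one) see a spent object.

def all_ints(obj):
    """Step 1: inspect each element -- destructively, if obj is an iterator."""
    return all(isinstance(x, int) for x in obj)

def dispatch(arg, overloads):
    """Try each (check, func) pair in turn, like naive overload resolution."""
    for check, func in overloads:
        if check(arg):
            return func(arg)
    raise TypeError("no matching overload")

overloads = [
    (all_ints, lambda a: ("ints", list(a))),
    (lambda a: True, lambda a: ("fallback", list(a))),
]

g = (x for x in [1, "two", 3])     # a one-shot iterator
print(dispatch(g, overloads))
# The all_ints probe consumed 1 and "two" before failing, so the
# fallback sees only what is left: ('fallback', [3])

g2 = (x for x in [1, 2, 3])
print(dispatch(g2, overloads))
# Even on a match, the probe has eaten everything: ('ints', [])
```

A non-destructive "is this re-iterable?" test would avoid both failure modes, which is exactly what Dave was looking for.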
It's pretty easy to see that if step 1 destroys the state of an
argument, it can foul the whole scheme: even if we store the result
somewhere so that step 2 can re-use it, overload resolution might fail
for arguments later in the function signature. Then the other
overloads will be looking at a different argument object.

What we were looking for was a way to quickly reject an overload if
the source object was not re-iterable, without modifying it.

It sure seems to me that we'd benefit from being able to do the same
sort of thing in Pure Python. It's not clear to me that anyone else
cares about this, but I hope one day we'll get built-in overloading or
multimethod dispatch in Python beyond what's currently offered by the
numeric operators.

Incidentally, I'm not sure whether PEP 246 provides much help here. If
the adaptation protocol only gives us a way to say "is this, or can
this be adapted to be a re-iterable sequence", something could easily
answer:

    [ x for x in y ]

Which would produce a re-iterable sequence, but might also destroy the
source. Of course, I'll say up front I've only skimmed the PEP and
might've missed something crucial.

-Dave

From aahz@pythoncraft.com Fri Jul 19 15:16:58 2002
From: aahz@pythoncraft.com (Aahz)
Date: Fri, 19 Jul 2002 10:16:58 -0400
Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows?
In-Reply-To: 
References: 
Message-ID: <20020719141658.GA7919@panix.com>

[Mark Hammond's patch -- with docs!]

On Thu, Jul 18, 2002, Tim Peters wrote:
> 
> This patch is a Good Thing, and I demand that everyone show you more
> appreciation for it.

If I still used Windoze for anything, I would.
--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/

From aleax@aleax.it Fri Jul 19 15:30:41 2002
From: aleax@aleax.it (Alex Martelli)
Date: Fri, 19 Jul 2002 16:30:41 +0200
Subject: [Python-Dev] The iterator story
In-Reply-To: <0d3001c22f2f$5e2d2320$6501a8c0@boostconsulting.com>
References: <20020719132351.GA40829@hishome.net> <0d3001c22f2f$5e2d2320$6501a8c0@boostconsulting.com>
Message-ID: 

On Friday 19 July 2002 04:15 pm, David Abrahams wrote:
...
> Incidentally, I'm not sure whether PEP 246 provides much help here. If the
> adaptation protocol only gives us a way to say "is this, or can this be
> adapted to be a re-iterable sequence", something could easily answer:

Yes: that's all PEP 246 provides -- a unified way to express a request
for adaptation of an object to a protocol, with the ability for the
object's type, the protocol, AND a registry of installable adapters, to
have a say about it (the registry is not well explained in the PEP as
it stands, it's part of what I have to clarify when I rewrite it -- but
my rewrite won't change what's being discussed in your quoted paragraph
and the start of this one).

> [ x for x in y ]

or more concisely and speedily list(y).

> Which would produce a re-iterable sequence, but might also destroy the
> source. Of course, I'll say up front I've only skimmed the PEP and might've
> missed something crucial.

PEP 246 cannot in any way impede "something" (or more likely "somebody")
from writing inappropriate or totally incorrect code, nor will it even
try. Maybe I'm missing your point...?

Alex

From aahz@pythoncraft.com Fri Jul 19 15:23:49 2002
From: aahz@pythoncraft.com (Aahz)
Date: Fri, 19 Jul 2002 10:23:49 -0400
Subject: [Python-Dev] Single- vs.
Multi-pass iterability
In-Reply-To: 
References: <200207181422.g6IEMBr14526@odiug.zope.com>
Message-ID: <20020719142349.GA9051@panix.com>

On Fri, Jul 19, 2002, Ka-Ping Yee wrote:
> 
> I believe this is where the biggest debate lies: whether "for" should be
> non-destructive. I realize we are currently on the other side of the
> fence, but i foresee enough potential pain that i would like you to
> consider the value of keeping "for" loops non-destructive.

Consider

    for line in f.readlines():

in any version of Python. Adding iterators made this more convenient
and efficient, but I just can't see your POV in the general case.
--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/

From aleax@aleax.it Fri Jul 19 15:39:11 2002
From: aleax@aleax.it (Alex Martelli)
Date: Fri, 19 Jul 2002 16:39:11 +0200
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: <20020719142349.GA9051@panix.com>
References: <200207181422.g6IEMBr14526@odiug.zope.com> <20020719142349.GA9051@panix.com>
Message-ID: 

On Friday 19 July 2002 04:23 pm, Aahz wrote:
> On Fri, Jul 19, 2002, Ka-Ping Yee wrote:
> > I believe this is where the biggest debate lies: whether "for" should be
> > non-destructive. I realize we are currently on the other side of the
> > fence, but i foresee enough potential pain that i would like you to
> > consider the value of keeping "for" loops non-destructive.
>
> Consider
>
> for line in f.readlines():
>
> in any version of Python. Adding iterators made this more convenient
> and efficient, but I just can't see your POV in the general case.

The 'for', per se, is destroying nothing here -- the object returned by
f.readlines() is destroyed by its reference count falling to 0 after
the for, just as, say:

    for c in raw_input():

or

    x = raw_input()+raw_input()

and so forth. I.e., any object gets destroyed if there are no more
references to it -- that's a completely different issue.
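Alex's distinction is easy to check directly; in this sketch io.StringIO (the modern in-memory file type) stands in for an open file:

```python
import io

f = io.StringIO("a\nb\nc\n")   # stand-in for an open file

lines = f.readlines()          # the method call mutates f, not the list
for line in lines:
    pass
print(lines == ["a\n", "b\n", "c\n"])   # True: the list survives the for

# Iterating over the file object itself is a different story -- that
# consumes the file's own state:
f.seek(0)
print(list(f))   # ['a\n', 'b\n', 'c\n']
print(list(f))   # []  (the read position is now at end of file)
```

The for loop over `lines` can be repeated as often as the list is kept alive; only the readlines() call itself advanced the file.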
In all of these cases, you can, if you want, just bind a name to the
object as you call the function, then use that object over and over
again at will.

_Method calls_ mutating the object on which they're called is indeed
quite common, of course. f.readlines() does mutate f's state. But the
object it returns, as long as there are references to it, remains.

Alex

From fredrik@pythonware.com Fri Jul 19 15:42:34 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Fri, 19 Jul 2002 16:42:34 +0200
Subject: [Python-Dev] Single- vs. Multi-pass iterability
References: <200207181422.g6IEMBr14526@odiug.zope.com> <20020719142349.GA9051@panix.com>
Message-ID: <017a01c22f32$865123d0$0900a8c0@spiff>

aahz wrote:
> > I believe this is where the biggest debate lies: whether "for" should be
> > non-destructive. I realize we are currently on the other side of the
> > fence, but i foresee enough potential pain that i would like you to
> > consider the value of keeping "for" loops non-destructive.
>
> Consider
>
> for line in f.readlines():
>
> in any version of Python.

and? for-in doesn't modify the object returned
by f.readlines(), and never has.

From David Abrahams" <20020719132351.GA40829@hishome.net> <0d3001c22f2f$5e2d2320$6501a8c0@boostconsulting.com>
Message-ID: <0d6701c22f32$a135c0c0$6501a8c0@boostconsulting.com>

From: "Alex Martelli" 
> PEP 246 cannot in any way impede "something" (or more likely "somebody") from
> writing inappropriate or totally incorrect code, nor will it even try. Maybe
> I'm missing your point...?

Maybe, or maybe not. I guess if the reiterable sequence adapter says
"list(x)", nobody should be using it to find out whether a thing is
reiterable. Or maybe the reiterable sequence adapter shouldn't say
"list(x)" because that's destructive -- though that begs the question
of finding out whether x is reiterable. Maybe the PEP is just a red
herring as far as the iterator problem is concerned.
As long as the language has built-in facilities like 'for' and 'in'
which use iteration protocols at the core of the language,
re-iterability ought to be expressible likewise, in core language
terms, regardless of the more-extensible mechanisms of PEP 246.

whole-pile-of-maybes-ly y'rs,
dave

From barry@zope.com Fri Jul 19 15:59:33 2002
From: barry@zope.com (Barry A. Warsaw)
Date: Fri, 19 Jul 2002 10:59:33 -0400
Subject: [Python-Dev] Single- vs. Multi-pass iterability
References: <200207181422.g6IEMBr14526@odiug.zope.com>
Message-ID: <15672.10581.693016.553036@anthem.wooz.org>

>>>>> "KY" == Ka-Ping Yee  writes:

KY> It's just not the way i expect for-loops to work. Perhaps we
KY> would need to survey people for objective data, but i feel
KY> that most people would be surprised if

| for x in y: print x
| for x in y: print x

KY> did not print the same thing twice, or if

As with many things Pythonic, it all depends. Specifically, I think it
depends on the type of y. Certainly in a pre-iterator world there was
little preventing (or encouraging?) you to write y's __getitem__()
non-destructively, so I don't see much difference if y is an iterator.

KY> Even if it's okay for for-loops to destroy their arguments, i
KY> still think it sets up a bad situation: we may end up with
KY> functions manipulating sequence-like things all over, but it
KY> becomes unclear whether they destroy their arguments or not.
KY> It becomes possible to write a function which sometimes
KY> destroys its argument and sometimes doesn't. Bugs get deeper
KY> and harder to find.

How is that different than pre-iterators with __getitem__()?

KY> I'm assigning properties to "for" that you aren't. I think
KY> they are useful properties, though, and worth considering.

These aren't properties of for-loops, they are properties of the
things you're iterating (little-i) over.
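Barry's point can be made concrete: the very same loop is repeatable or not depending on the type of y, and that was already so under the old __getitem__ protocol. Eater below is a hypothetical class whose __getitem__ mutates, in the style of xreadlines:

```python
class Eater:
    """Hypothetical pre-iterator-style 'sequence' whose __getitem__
    mutates on every access, much like xreadlines did."""
    def __init__(self, items):
        self.items = list(items)
    def __getitem__(self, i):
        if not self.items:
            raise IndexError       # old protocol: this ends the for loop
        return self.items.pop(0)   # destroys state as it is read

y = [1, 2]
print([x for x in y], [x for x in y])   # [1, 2] [1, 2] -- repeatable

y = Eater([1, 2])
print([x for x in y], [x for x in y])   # [1, 2] []     -- destructive
```

Nothing about the loop itself changed between the two runs; only the object being iterated over did.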
-Barry

From aahz@pythoncraft.com Fri Jul 19 16:20:29 2002
From: aahz@pythoncraft.com (Aahz)
Date: Fri, 19 Jul 2002 11:20:29 -0400
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: <017a01c22f32$865123d0$0900a8c0@spiff>
References: <200207181422.g6IEMBr14526@odiug.zope.com> <20020719142349.GA9051@panix.com> <017a01c22f32$865123d0$0900a8c0@spiff>
Message-ID: <20020719152029.GA18810@panix.com>

On Fri, Jul 19, 2002, Fredrik Lundh wrote:
> aahz wrote:
>>Ping:
>>>
>>> I believe this is where the biggest debate lies: whether "for" should be
>>> non-destructive. I realize we are currently on the other side of the
>>> fence, but i foresee enough potential pain that i would like you to
>>> consider the value of keeping "for" loops non-destructive.
>>
>> Consider
>>
>> for line in f.readlines():
>>
>> in any version of Python.
>
> and? for-in doesn't modify the object returned
> by f.readlines(), and never has.

While technically true, that seems to be sidestepping the point from my
POV. I think that few people see for loops as inherently non-destructive
due to the use case I presented above. Beyond that, the for loop is
itself inherently mutating in Python older than 2.2, which I see as
functionally equivalent to "destructive"; the primary intention of
iterators (from my recollections of the tenor of the discussions) was
to package that mutating state in a way that could capture the
iterability of objects other than sequences.
--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/

From Paul.Moore@atosorigin.com Fri Jul 19 16:28:11 2002
From: Paul.Moore@atosorigin.com (Moore, Paul)
Date: Fri, 19 Jul 2002 16:28:11 +0100
Subject: [Python-Dev] Single- vs. Multi-pass iterability
Message-ID: <714DFA46B9BBD0119CD000805FC1F53B01B5B462@UKRUX002.rundc.uk.origin-it.com>

Ka-Ping Yee writes:
> It's just not the way i expect for-loops to work.
Perhaps we
> would need to survey people for objective data, but i feel
> that most people would be surprised if
>
> for x in y: print x
> for x in y: print x
>
> did not print the same thing twice, or if

Overall, I think I would say "it depends". Barry pointed out that it
depends on the type of y. That's what I mean, although my intuition
isn't quite that specific by itself.

By the way, not all languages that I am aware of even have "for ... in"
constructs. Perl does, and Visual Basic does. C and C++ don't. In Perl,
"for $x (<>)" or whatever magic line noise Perl uses, does the same as
Python's "for line in f", so the same non-repeatable for issue exists
there (at least for files, and I *bet* you can do nasty things with
tied variables to have it happen elsewhere, too). Even in Visual Basic,
"for each x in obj" can in theory do anything (depending on the type of
obj), much like Python. So I think that existing practice goes against
your expectation.

There *is* an issue of some sort with being able to find out whether a
given object offers reproducible for behaviour in the way you describe
above. The problem is determining real-world cases where knowing is
useful. There are a lot of theoretical issues here, but few simple,
comprehensible, practical use cases.

FWIW,

- I'm +1 for renaming next() to __next__().
- I'm +0 on dropping the requirements that iterators *must* implement
  __iter__() (as per your description of the 2 orthogonal proposals).
  I'd like to see iterators strongly advised to implement __iter__()
  as returning self (and all built in ones doing so), but not have it
  mandated.
- I'm -1 on your for...from syntax.

Hope this helps,
Paul.

From barry@zope.com Fri Jul 19 16:36:45 2002
From: barry@zope.com (Barry A. Warsaw)
Date: Fri, 19 Jul 2002 11:36:45 -0400
Subject: [Python-Dev] The iterator story
References: 
Message-ID: <15672.12813.512623.968270@anthem.wooz.org>

Nice write-up Ka-Ping.
Maybe you need to transform this into a PEP called Iterators.next() 1/2 :)

-Barry

From jafo-python-dev@tummy.com Fri Jul 19 16:43:03 2002
From: jafo-python-dev@tummy.com (Sean Reifschneider)
Date: Fri, 19 Jul 2002 09:43:03 -0600
Subject: [Python-Dev] Judy for replacing internal dictionaries?
Message-ID: <20020719094303.B24220@tummy.com>

Recently at a Hacking Society meeting someone was working on packaging
Judy for Debian. Apparently, Judy is a data-structure designed by some
researchers at Hewlett-Packard. Its goal is to be a very fast
implementation of an associative array or (possibly sparse) integer
indexed array. Judy has recently been released under the LGPL.

After reading the FAQ and 10 minute introduction, I started wondering
about whether it could improve the overall performance of Python by
replacing dictionaries used for namespaces, classes, etc... Since
then, I've realized that I probably won't have time to do the
implementation any time soon, and Evelyn urged me to bring it up here.

I realize that Python's dictionaries are fairly well optimized. It
sounds like Judy may be even faster though. It apparently works fairly
hard at reducing L2 cache misses, for example.

Some URLs:

Judy FAQ:
http://atwnt909.external.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,1949,00.html

Judy 10 minute introduction:
http://atwnt909.external.hp.com/dspp/ddl/ddl_Download_File_TRX/1,1249,702,00.pdf

SourceForge Project Page:
http://sourceforge.net/projects/judy/

Sean
--
 YOU ARE WITNESSING A FRONT THREE-QUARTER VIEW OF TWO ADULTS SHARING A
 TENDER MOMENT. -- Gordon Cole, _Twin_Peaks_
Sean Reifschneider, Inimitably Superfluous 
tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python

From fredrik@pythonware.com Fri Jul 19 17:07:21 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Fri, 19 Jul 2002 18:07:21 +0200
Subject: [Python-Dev] Single- vs.
Multi-pass iterability
References: <200207181422.g6IEMBr14526@odiug.zope.com> <20020719142349.GA9051@panix.com> <017a01c22f32$865123d0$0900a8c0@spiff> <20020719152029.GA18810@panix.com>
Message-ID: <001b01c22f3e$5e25ab40$0900a8c0@spiff>

aahz wrote:
> While technically true, that seems to be sidestepping the point from my
> POV.

really? are you arguing that when Ping says that for-in shouldn't
destroy the target, he's really saying that python shouldn't allow
methods to have side effects if they can be called from an expression
used in a for-in statement?

why would he say that?

> I think that few people see for loops as inherently non-destructive
> due to the use case I presented above.

I think most people can tell the difference between an object and
a method with side-effects. I doubt they would be able to get much
done in Python if they couldn't.

> Beyond that, the for loop is itself inherently mutating in Python
> older than 2.2

in what sense? it calls the object's __getitem__ method with an
integer index value, until it gets an IndexError. in what way is
that "inherently mutating"?

From martin@v.loewis.de Fri Jul 19 17:09:40 2002
From: martin@v.loewis.de (Martin v.
Loewis) Date: 19 Jul 2002 18:09:40 +0200 Subject: [Python-Dev] staticforward In-Reply-To: <3D381714.7040606@lemburg.com> References: <3D35A188.20407@lemburg.com> <15669.47553.15097.651868@slothrop.zope.com> <3D35D466.5090903@lemburg.com> <200207172045.g6HKjBg13729@odiug.zope.com> <3D35DA67.8060206@lemburg.com> <3D35DBB9.9000103@lemburg.com> <15670.62611.943840.954629@slothrop.zope.com> <3D371361.7050908@lemburg.com> <15671.6078.577033.943393@slothrop.zope.com> <3D372E1D.50009@lemburg.com> <15671.12313.725886.680036@slothrop.zope.com> <3D373573.8070001@lemburg.com> <15671.15105.563068.700997@slothrop.zope.com> <3D3741D4.8020408@lemburg.com> <15671.16894.185299.672286@slothrop.zope.com> <3D37CE76.4020803@lemburg.com> <200207191259.g6JCxGp24808@pcp02138704pcs.reston01.va.comcast.net> <3D381714.7040606@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > """ > Albert Chin-A-Young wrote on 2002-05-04: > > > > > > The AIX xlc ANSI compiler does not allow forward declaration of > > > variables. This leads to a lot of problems with .c files that use > > > staticforward (e.g. mxDateTime.c, mxProxy.c, etc.). Any chance of > > > fixing these? > """ > > I'm not making this up. Yes, but the user might be. I don't believe this statement is factually correct - the compiler most certainly does allow forward declaration of variables. Also, such a statement is of little value unless associated with an operating system release number (or better a compiler release number). This conversation snippet indicates that the problem has not been fully understood (at least by Albert Chin-A-Young); solving an incompletely-understood problem is a recipe for disasters, when it comes to portability. Regards, Martin From pinard@iro.umontreal.ca Fri Jul 19 17:02:10 2002 From: pinard@iro.umontreal.ca (François Pinard) Date: 19 Jul 2002 12:02:10 -0400 Subject: [Python-Dev] Re: Single- vs.
Multi-pass iterability In-Reply-To: <714DFA46B9BBD0119CD000805FC1F53B01B5B462@UKRUX002.rundc.uk.origin-it.com> References: <714DFA46B9BBD0119CD000805FC1F53B01B5B462@UKRUX002.rundc.uk.origin-it.com> Message-ID: [Moore, Paul] > - I'm +0 on dropping the requirements that iterators *must* > implement __iter__() (as per your description of the 2 > orthogonal proposals). In Ka-Ping's letter, I did not read that the proposals were orthogonal. __iter__ would not be required anymore to identify an iterator as such, because __next__ would be sufficient, alone, for this purpose. That would have the effect of cleaning up the iterator protocol from the double constraint it currently has, and probably makes things clearer as well. > I'd like to see iterators strongly advised to implement __iter__() as > returning self Strong advice should not be merely given "ex cathedra", there should be some kind of (convincing) justification behind it. It makes sense for generators at least, so they could be used in a few places where Python expects containers to provide their iterator. The justification is more fuzzy outside generators, especially when programmers do not see the need of obtaining an iterator from itself; the usual and only case I see right now is resuming an iterator which has not been fully consumed. Ka-Ping also stresses, indirectly, that `element in iterator' (resuming an iterator instead of obtaining a new one from a container) could have a strange meaning, and might even represent a user error. I even wonder if it would not be wise to have iterators _not_ defining an __iter__ method! -- François Pinard http://www.iro.umontreal.ca/~pinard From guido@python.org Fri Jul 19 17:30:43 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 12:30:43 -0400 Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability In-Reply-To: Your message of "Fri, 19 Jul 2002 12:02:10 EDT."
References: <714DFA46B9BBD0119CD000805FC1F53B01B5B462@UKRUX002.rundc.uk.origin-it.com> Message-ID: <200207191630.g6JGUh626683@pcp02138704pcs.reston01.va.comcast.net> > In Ka-Ping's letter, I did not read that the proposals were orthogonal. > __iter__ would not be required anymore to identify an iterator as such, > because __next__ would be sufficient, alone, for this purpose. That would > have the effect of cleaning up the iterator protocol from the double > constraint it currently has, and probably makes things clearer as well. I think there's been some confusion. I never intended the test for "is this an iterator" to be "does it have a next() and an __iter__() method". I *do* strongly advise iterators to define __iter__(), but only because I expect that "for x in iterator:" is useful in iterator algebra functions and the like. In fact, I don't really think that Python currently has foolproof ways to test for *any* kind of abstract protocol. Questions like "Is x a mapping" or "is x a sequence" are equally impossible to answer. The recommended approach is simply to go ahead and use something; if it doesn't obey the protocol, it will fail. Of course, you should *document* the requirements (e.g., "argument x should be a sequence), but I've always considered it a case of LBYL syndrome if code wants to check first. Note that you can't write code that does something different for a sequence than for a mapping; for example, the following class could be either: class C: def __getitem__(self, i): return i I realize that this won't make David Abrahams and his Boost users happy, but that's how Python has approached this issue since its inception. I'm fine with suggestions that we should really fix this; I expect that some way to assert interfaces or protocols will eventually find its way into the language. But I *don't* think that the current inability to test for iterator-ness (or iterable-ness, or multi-iteratable-ness, etc.) 
should be used as an argument that there's anything wrong with the iterator protocol. (And I've *still* not read Ping's original message...) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Fri Jul 19 17:50:47 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 19 Jul 2002 12:50:47 -0400 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: <20020719141658.GA7919@panix.com> Message-ID: [Tim] > This patch is a Good Thing, and I demand that everyone show [MarkH] more > appreciation for it. [Aahz] > If I still used Windoze for anything, I would. Then you missed the point of the patch. My demand stands unabated. relentlessly y'rs - tim From aahz@pythoncraft.com Fri Jul 19 17:58:37 2002 From: aahz@pythoncraft.com (Aahz) Date: Fri, 19 Jul 2002 12:58:37 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: <001b01c22f3e$5e25ab40$0900a8c0@spiff> References: <200207181422.g6IEMBr14526@odiug.zope.com> <20020719142349.GA9051@panix.com> <017a01c22f32$865123d0$0900a8c0@spiff> <20020719152029.GA18810@panix.com> <001b01c22f3e$5e25ab40$0900a8c0@spiff> Message-ID: <20020719165836.GA14402@panix.com> On Fri, Jul 19, 2002, Fredrik Lundh wrote: > aahz wrote: >> >> While technically true, that seems to be sidestepping the point from my >> POV. > > really? are you arguing that when Ping says that for-in shouldn't > destroy the target, he's really saying that python shouldn't allow > methods to have side effects if they can be called from an > expression used in a for-in statement? why would he say that? I'm saying that I think Ping is overstating the case in terms of the way people look at things. Whatever the technicalities of an implicit method versus an explicit method, people have long used for loops in destructive ways. >> I think that few people see for loops as inherently non-destructive >> due to the use case I presented above. 
> > I think most people can tell the difference between an object and > a method with side-effects. I doubt they would be able to get much > done in Python if they couldn't. To be sure. But I don't think there's much difference in the way for loops are actually used. Continuing my point above, I see the current usage of for loops as calling an implicit method with side-effects as opposed to an explicit method with side-effects. Lo and behold! That's actually the case. >> Beyond that, the for loop is itself inherently mutating in Python >> older than 2.2 > > in what sense? it calls the object's __getitem__ method with an > integer index value, until it gets an IndexError. in what way is that > "inherently mutating"? And how does that integer index change? The for loop in Python <2.2 has an internal state object. Iterators are the external manifestation of that state object, generalized to objects other than sequences. I'm surprised that anyone is surprised that the state object gets mutated/destroyed. I'm also surprised that people are surprised about what happens when that state object is coupled to an inherently mutating object such as file objects. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From barry@zope.com Fri Jul 19 18:07:29 2002 From: barry@zope.com (Barry A. Warsaw) Date: Fri, 19 Jul 2002 13:07:29 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability References: <200207181422.g6IEMBr14526@odiug.zope.com> <20020719142349.GA9051@panix.com> <017a01c22f32$865123d0$0900a8c0@spiff> <20020719152029.GA18810@panix.com> <001b01c22f3e$5e25ab40$0900a8c0@spiff> <20020719165836.GA14402@panix.com> Message-ID: <15672.18257.829735.736033@anthem.wooz.org> >>>>> "A" == Aahz writes: A> The for loop in Python <2.2 has an internal state object. A> Iterators are the external manifestation of that state object, A> generalized to objects other than sequences. 
I'm surprised A> that anyone is surprised that the state object gets A> mutated/destroyed. I'm also surprised that people are A> surprised about what happens when that state object is coupled A> to an inherently mutating object such as file objects. Well said. -Barry From aahz@pythoncraft.com Fri Jul 19 18:02:20 2002 From: aahz@pythoncraft.com (Aahz) Date: Fri, 19 Jul 2002 13:02:20 -0400 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: References: <20020719141658.GA7919@panix.com> Message-ID: <20020719170220.GB14402@panix.com> On Fri, Jul 19, 2002, Tim Peters wrote: > > [Tim] > > This patch is a Good Thing, and I demand that everyone show [MarkH] more > > appreciation for it. > > [Aahz] > > If I still used Windoze for anything, I would. > > Then you missed the point of the patch. My demand stands unabated. All right, then, I hereby show MarkH ill understood appreciation. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From David Abrahams" <200207191630.g6JGUh626683@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <0e2201c22f48$05ff1870$6501a8c0@boostconsulting.com> From: "Guido van Rossum" > > In Ka-Ping's letter, I did not read that the proposals were orthogonal. > > __iter__ would not be required anymore to identify an iterator as such, > > because __next__ would be sufficient, alone, for this purpose. That would > > have the effect of cleaning up the iterator protocol from the double > > constraint it currently has, and probably makes things clearer as well. > > I think there's been some confusion. I never intended the test for > "is this an iterator" to be "does it have a next() and an __iter__() > method". Do you intend to have a test for "is this an iterator" at all? > I *do* strongly advise iterators to define __iter__(), but > only because I expect that "for x in iterator:" is useful in iterator > algebra functions and the like. Makes sense. 
> In fact, I don't really think that Python currently has foolproof ways > to test for *any* kind of abstract protocol. Questions like "Is x a > mapping" or "is x a sequence" are equally impossible to answer. True. > The recommended approach is simply to go ahead and use something; if > it doesn't obey the protocol, it will fail. Of course, you should > *document* the requirements (e.g., "argument x should be a sequence), > but I've always considered it a case of LBYL syndrome if code wants to > check first. If LBYL is bad, what is introspection good for? > Note that you can't write code that does something > different for a sequence than for a mapping; for example, the > following class could be either: > > class C: > def __getitem__(self, i): return i > > I realize that this won't make David Abrahams and his Boost users > happy, but that's how Python has approached this issue since its > inception. I understand that that's always been "the Python way". However, isn't there also some implication that some of the special functions are more than just a way to provide implementations of Python's syntax? Notes in the docs like those on __getitem__ tend to argue for that, at least by convention. Unless I'm misinterpreting things, "the Python way" isn't quite so one-sided where protocols are concerned. > I'm fine with suggestions that we should really fix this; I expect > that some way to assert interfaces or protocols will eventually find > its way into the language. > > But I *don't* think that the current inability to test for > iterator-ness (or iterable-ness, or multi-iteratable-ness, etc.) > should be used as an argument that there's anything wrong with the > iterator protocol. Just for the record, I never meant to imply that it was broken, only that I'd like to get a little more from it than I currently can. 
-Dave From paul-python@svensson.org Fri Jul 19 18:21:26 2002 From: paul-python@svensson.org (Paul Svensson) Date: Fri, 19 Jul 2002 13:21:26 -0400 (EDT) Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: <20020719165836.GA14402@panix.com> Message-ID: On Fri, 19 Jul 2002, Aahz wrote: >And how does that integer index change? The for loop in Python <2.2 has >an internal state object. Iterators are the external manifestation of >that state object, generalized to objects other than sequences. I'm >surprised that anyone is surprised that the state object gets >mutated/destroyed. I'm also surprised that people are surprised about >what happens when that state object is coupled to an inherently mutating >object such as file objects. All the surprises I see stem from confusion between what is the object being iterated over, and what is the object holding the state of the iteration. Iterators returning self for __iter__() is the major cause of this confusion. I agree that in the general case, the boundary may not always be clear, but Ping's proposal cleans up what's seen 99.9% of the time. Pending the pain of the yet unseen migration plan, I'm +1 on removing __iter__ from all core iterators +1 on renaming next() to __next__() +1 on presenting file objects as iterators rather than iterables +0 on the new 'for x from y' syntax /Paul From guido@python.org Fri Jul 19 18:23:19 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 13:23:19 -0400 Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability In-Reply-To: Your message of "Fri, 19 Jul 2002 13:16:27 EDT." 
<0e2201c22f48$05ff1870$6501a8c0@boostconsulting.com> References: <714DFA46B9BBD0119CD000805FC1F53B01B5B462@UKRUX002.rundc.uk.origin-it.com> <200207191630.g6JGUh626683@pcp02138704pcs.reston01.va.comcast.net> <0e2201c22f48$05ff1870$6501a8c0@boostconsulting.com> Message-ID: <200207191723.g6JHNJf27635@pcp02138704pcs.reston01.va.comcast.net> > Do you intend to have a test for "is this an iterator" at all? Not right now, see the rest of my email. The best you can do is check for a next method and hope for the best. > If LBYL is bad, what is introspection good for? Ask Alex. > I understand that that's always been "the Python way". However, > isn't there also some implication that some of the special functions > are more than just a way to provide implementations of Python's > syntax? Like what? > Notes in the docs like those on __getitem__ tend to argue > for that, at least by convention. Unless I'm misinterpreting > things, "the Python way" isn't quite so one-sided where protocols > are concerned. Can you quote specific places in the docs you read this way? I don't see it, but I've only scanned chapter 3 of the Language Reference Manual. > Just for the record, I never meant to imply that it was broken, only > that I'd like to get a little more from it than I currently can. Maybe I should read Ping's email. From the discussion I figured he was arguing this way. I think you have to settle with what I proposed at the top. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Jul 19 18:32:19 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 13:32:19 -0400 Subject: [Python-Dev] Where's time.daylight??? In-Reply-To: Your message of "Fri, 19 Jul 2002 13:13:40 EDT." <15672.18628.831787.897474@anthem.wooz.org> References: <15672.18628.831787.897474@anthem.wooz.org> Message-ID: <200207191732.g6JHWJD28040@pcp02138704pcs.reston01.va.comcast.net> [Barry, in python-checkins] > I've noticed one breakage already I believe. 
On my systems (RH6.1 and RH7.3) time.daylight has disappeared. > > I don't think test_time.py actually tests this parameter, but > test_email.py which is what's failing for me: [...] Yup, time.daylight has disappeared. But the bizarre thing is that if I roll back to rev. 1.129, it's *still* gone! Even rev 1.128 still doesn't fix this. I wonder if something in configure changed??? --Guido van Rossum (home page: http://www.python.org/~guido/) From aleax@aleax.it Fri Jul 19 18:35:53 2002 From: aleax@aleax.it (Alex Martelli) Date: Fri, 19 Jul 2002 19:35:53 +0200 Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability In-Reply-To: <200207191723.g6JHNJf27635@pcp02138704pcs.reston01.va.comcast.net> References: <714DFA46B9BBD0119CD000805FC1F53B01B5B462@UKRUX002.rundc.uk.origin-it.com> <0e2201c22f48$05ff1870$6501a8c0@boostconsulting.com> <200207191723.g6JHNJf27635@pcp02138704pcs.reston01.va.comcast.net> Message-ID: On Friday 19 July 2002 07:23 pm, Guido van Rossum wrote: > > Do you intend to have a test for "is this an iterator" at all? > > Not right now, see the rest of my email. The best you can do is check > for a next method and hope for the best. > > > If LBYL is bad, what is introspection good for? > > Ask Alex. Introspection is good when you need to dispatch in a way that is not supported by the language you're using. In Python (and most other languages), this mostly means multiple dispatch -- you don't get it from the language, therefore, on the non-frequent occasions when you NEED it, you have to kludge it up. Very similar to multiple inheritance in languages that don't support THAT, really. (Particularly in how people who've never used multiple X don't really understand that it buys you anything -- try interesting a dyed-in-the-wool Smalltalker in multiple inheritance, or anybody *but* a CLOS-head or Dylan-head in multiple dispatch...:-). Other aspects of introspection help you implement other primitives lacking in the language. E.g.
"make another like myself but not initialized" can be self.__class__.__new__(self.__class__) -- not the most elegant expression, but, hey, I've seen worse (such as NOT being able to express it at all, in languages lacking the needed ability to introspect:-). Looking at *ANOTHER* object this way isn't really INTROspection, btw -- it's EXTRAspection, by the Latin roots of these words:-). Alex From tim.one@comcast.net Fri Jul 19 18:36:47 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 19 Jul 2002 13:36:47 -0400 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: <20020719170220.GB14402@panix.com> Message-ID: [Aahz] > All right, then, I hereby show MarkH ill understood appreciation. Excellent! One down, about two hundred thousand to go. From David Abrahams" <200207191630.g6JGUh626683@pcp02138704pcs.reston01.va.comcast.net> <0e2201c22f48$05ff1870$6501a8c0@boostconsulting.com> <200207191723.g6JHNJf27635@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <0e4201c22f4b$840d44f0$6501a8c0@boostconsulting.com> From: "Guido van Rossum" > > Do you intend to have a test for "is this an iterator" at all? > > Not right now, see the rest of my email. The best you can do is check > for a next method and hope for the best. I only asked because the rest of your email seemed to imply that you didn't believe in such checks at this time, while the sentence above my question seemed to imply there is/should be such a test. Thanks for clarifying. > > If LBYL is bad, what is introspection good for? > > Ask Alex. OK. Alex, what's introspection good for? > > I understand that that's always been "the Python way". However, > > isn't there also some implication that some of the special functions > > are more than just a way to provide implementations of Python's > > syntax? > > Like what? > > > Notes in the docs like those on __getitem__ tend to argue > > for that, at least by convention. 
Unless I'm misinterpreting > > things, "the Python way" isn't quite so one-sided where protocols > > are concerned. > > Can you quote specific places in the docs you read this way? Just for example: __getitem__: "For sequence types, the accepted keys should be integers and slice objects. .... If key is of an inappropriate type, TypeError may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised. Note: for loops expect that an IndexError will be raised for illegal indexes to allow proper detection of the end of the sequence. " __delitem__: "Same note as for __getitem__(). This should only be implemented for mappings if the objects support removal of keys, or for sequences if elements can be removed from the sequence. The same exceptions should be raised for improper key values as for the __getitem__() method." __iter__: "This method should return a new iterator object that can iterate over all the objects in the container. For mappings, it should iterate over the keys of the container, and should also be made available as the method iterkeys()." The way I read these, the behavior of an implementation of these functions isn't really open-ended. It ought to follow certain conventions, if you want your type to behave sensibly. And that's about as strong as any legislation I've seen anywhere in the Python docs. > I don't > see it, but I've only scanned chapter 3 of the Language Reference > Manual. > > > Just for the record, I never meant to imply that it was broken, only > > that I'd like to get a little more from it than I currently can. > > Maybe I should read Ping's email. From the discussion I figured he > was arguing this way. I think you have to settle with what I proposed > at the top. Of course I do; I never expected otherwise. Like most of my other suggestions, this is a case of "OK, whatever you say Guido... 
but as long as people are interested in discussing the issues I'd like them to understand my reasons for bringing it up". -Dave From David Abrahams" <0e2201c22f48$05ff1870$6501a8c0@boostconsulting.com> <200207191723.g6JHNJf27635@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <0e5201c22f4c$37c62e30$6501a8c0@boostconsulting.com> From: "Alex Martelli" > Introspection is good when you need to dispatch in a way that is > not supported by the language you're using. In Python (and most > other languages), this mostly mean multiple dispatch -- you don't > get it from the language, therefore, on the non-frequent occasions > when you NEED it, you have to kludge it up. Very similar to > multiple inheritance in languages that don't support THAT, really. > > (Particularly in how people who've never used multiple X don't > really understand that it buys you anything -- try interesting a > dyed-in-the-wool Smalltalker in multiple inheritance, or anybody > *but* a CLOS-head or Dylan-head in multiple dispatch...:-). Ahem. *I'm* interested in multiple-dispatch (never used CLOS or Dylan). You might not have noticed that I mentioned multimethods in my post about supporting overloading in Boost.Python. > Other aspects of introspection help you implement other primitives > lacking in the language. E.g. "make another like myself but not > initialized" can be self.__class__.__new__(self.__class__) -- not > the most elegant expression, but, hey, I've seen worse (such as > NOT being able to express it at all, in languages lacking the > needed ability to introspect:-). Is that really introspection? It doesn't seem to ask a question. > Looking at *ANOTHER* object this way isn't really INTROspection, > btw -- it's EXTRAspection, by the Latin roots of these words:-). Okay. 
I hope you won't be offended if I continue to use the wrong term so that everyone else can understand me ;-) -Dave From tim.one@comcast.net Fri Jul 19 18:50:19 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 19 Jul 2002 13:50:19 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Message-ID: [Ping] > ... > I believe this is where the biggest debate lies: whether "for" should be > non-destructive. I realize we are currently on the other side of the > fence, but i foresee enough potential pain that i would like you to > consider the value of keeping "for" loops non-destructive. I'm having a hard time getting excited about this. If you had made this argument before the iterator protocol was implemented, it may have been more or less intriguing. But it was implemented and released some time ago, and I just haven't seen any evidence of such problems on c.l.py, the Help list, or the Tutor list (all of which I still pay significant attention to). "for" did and does work in accord with a simple protocol, and whether that's "destructive" depends on how the specific objects involved implement their pieces of the protocol, not on the protocol itself. The same is true of all of Python's hookable protocols. What's so special about "for" that it should pretend to deliver purely functional behavior in a highly non-functional language? State mutates. That's its purpose . From aahz@pythoncraft.com Fri Jul 19 18:54:56 2002 From: aahz@pythoncraft.com (Aahz) Date: Fri, 19 Jul 2002 13:54:56 -0400 (EDT) Subject: [Python-Dev] CANCEL: OSCON Community dinner Weds 7/24 6pm References: Message-ID: <200207191754.g6JHsuV00747@panix1.panix.com> Given the lack of response, I'm hereby canceling any official Python community dinner. I hope to see many of you at the conference, though. I'm including the original message below in case someone else wants to run with the ball. 
In article , Aahz wrote: >[posted to c.l.py with cc to c.l.py.announce and python-dev] > >I'm proposing a Python community dinner at OSCON next week, for Weds >7/24 at 6pm. Is there anyone familiar with the San Diego area who wants >to suggest a location near the Sheraton? If I don't get any >recommendations, we'll probably just have the dinner at the Sheraton. > >If you're interested, please send me an e-mail so I have some idea of >the number of people. Also, please include a way of getting in touch >with you at OSCON in case plans change (phone numbers accepted, but >e-mail addresses preferred). > >(There's a meeting for PSF members at 8pm, so some of us will likely >have to skip out early.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ -- From guido@python.org Fri Jul 19 19:08:57 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 14:08:57 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Your message of "Fri, 19 Jul 2002 13:50:19 EDT." References: Message-ID: <200207191808.g6JI8wE28214@pcp02138704pcs.reston01.va.comcast.net> > I'm having a hard time getting excited about this. If you had made > this argument before the iterator protocol was implemented, it may > have been more or less intriguing. But it was implemented and > released some time ago, and I just haven't seen any evidence of such > problems on c.l.py, the Help list, or the Tutor list (all of which I > still pay significant attention to). This is an important argument IMO that the theorists here seem to be missing somewhat. Releasing a feature and monitoring feedback is a good way of user testing, something that has been ignored too often by language designers. Elegant or minimal abstractions have their place; but in the end, users are more important. Quoting Steven Pemberton's home page (http://www.cwi.nl/~steven/): ABC: Simple but Powerful Interactive Programming Language and Environment. 
We did requirements and task analysis, iterative design, and user testing. You'd almost think programming languages were an interface between people and computers. Now famous because Python was strongly influenced by it. I still favor this approach to language design. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Jul 19 19:15:45 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 14:15:45 -0400 Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability In-Reply-To: Your message of "Fri, 19 Jul 2002 13:41:24 EDT." <0e4201c22f4b$840d44f0$6501a8c0@boostconsulting.com> References: <714DFA46B9BBD0119CD000805FC1F53B01B5B462@UKRUX002.rundc.uk.origin-it.com> <200207191630.g6JGUh626683@pcp02138704pcs.reston01.va.comcast.net> <0e2201c22f48$05ff1870$6501a8c0@boostconsulting.com> <200207191723.g6JHNJf27635@pcp02138704pcs.reston01.va.comcast.net> <0e4201c22f4b$840d44f0$6501a8c0@boostconsulting.com> Message-ID: <200207191815.g6JIFja28258@pcp02138704pcs.reston01.va.comcast.net> > The way I read these, the behavior of an implementation of these > functions isn't really open-ended. It ought to follow certain > conventions, if you want your type to behave sensibly. And that's > about as strong as any legislation I've seen anywhere in the Python > docs. Note the qualification: "if you want your type to behave sensibly". You can interpret the paragraphs you quoted as explaining what makes a good sequence or mapping. IOW they hint at some of the invariants of those protocols. But I wouldn't call this legislation. > Of course I do; I never expected otherwise. Like most of my other > suggestions, this is a case of "OK, whatever you say Guido... but as > long as people are interested in discussing the issues I'd like them > to understand my reasons for bringing it up". Maybe I should just tune out of this discussion if it's only of theoretical importance?
--Guido van Rossum (home page: http://www.python.org/~guido/) From trentm@ActiveState.com Fri Jul 19 19:26:02 2002 From: trentm@ActiveState.com (Trent Mick) Date: Fri, 19 Jul 2002 11:26:02 -0700 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: ; from tim.one@comcast.net on Fri, Jul 19, 2002 at 01:36:47PM -0400 References: <20020719170220.GB14402@panix.com> Message-ID: <20020719112602.A17763@ActiveState.com> [Tim Peters wrote] > Excellent! One down, about two hundred thousand to go. Mark rocks! 1,999,999-ly, Trent -- Trent Mick TrentM@ActiveState.com From aahz@pythoncraft.com Fri Jul 19 19:29:22 2002 From: aahz@pythoncraft.com (Aahz) Date: Fri, 19 Jul 2002 14:29:22 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: References: <20020719165836.GA14402@panix.com> Message-ID: <20020719182922.GA9585@panix.com> On Fri, Jul 19, 2002, Paul Svensson wrote: > > Pending the pain of the yet unseen migration plan, I'm > +1 on removing __iter__ from all core iterators > +1 on renaming next() to __next__() > +1 on presenting file objects as iterators rather than iterables > +0 on the new 'for x from y' syntax I'd vote this way: -0 on removing __iter__ +1 on renaming next() to __next__() +0 on presenting file objects as iterators +1 on finishing up the patch that fixes the xreadlines() mess -1 on for x from y -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From aahz@pythoncraft.com Fri Jul 19 19:30:30 2002 From: aahz@pythoncraft.com (Aahz) Date: Fri, 19 Jul 2002 14:30:30 -0400 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: <20020719112602.A17763@ActiveState.com> References: <20020719170220.GB14402@panix.com> <20020719112602.A17763@ActiveState.com> Message-ID: <20020719183029.GB9585@panix.com> On Fri, Jul 19, 2002, Trent Mick wrote: > [Tim Peters wrote] >> >> Excellent! One down, about two hundred thousand to go. 
> > Mark rocks! > > 1,999,999-ly, Next up: MarkH writes a patch to fix Trent's arithmetic. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From tim.one@comcast.net Fri Jul 19 19:39:38 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 19 Jul 2002 14:39:38 -0400 Subject: [Python-Dev] Judy for replacing internal dictionaries? In-Reply-To: <20020719094303.B24220@tummy.com> Message-ID: [Sean Reifschneider] > Recently at a Hacking Society meeting someone was working on > packaging Judy for Debian. Apparently, Judy is a data structure > designed by some researchers at Hewlett-Packard. Its goal is to > be a very fast implementation of an associative array or > (possibly sparse) integer indexed array. > > Judy has recently been released under the LGPL. > > After reading the FAQ and 10 minute introduction, I started wondering > about whether it could improve the overall performance of Python by > replacing dictionaries used for namespaces, classes, etc... Sorry, almost certainly not. In a typical Python namespace lookup, the pure overheads of calling and returning from the lookup function cost more than doing the lookup. Python dicts are more optimized for this use than you realize. Judy looks like it would be faster than Python dicts for large mappings, though (and given the boggling complexity of Judy's data structures, it damn well better be). As a general replacement for Python dicts, it wouldn't fly because it requires a total ordering on keys, and an ordering explicitly given by bitstrings, not implicitly via calls to an opaque ordering function. Looks like it may be an excellent alternative to in-memory B-Trees keyed by manifest bitstrings (like ints and character strings or even addresses).
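Tim's claim about lookup overhead is easy to illustrate: a namespace lookup in Python is a probe of an ordinary dict keyed by an interned string, and Judy's target case (a possibly sparse, integer-indexed array) is also served by a dict. A small sketch (mine, not Tim's):

```python
x = 42

def from_namespace():
    # A global name reference is a lookup in the module's dict,
    # keyed by the interned string "x".
    return x

# The module namespace *is* a dict; both probes find the same object.
assert globals()["x"] is from_namespace()

# A sparse integer-indexed "array" is just a dict with int keys.
sparse = {0: "a", 10 ** 9: "b"}
assert sparse[10 ** 9] == "b"
```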
From nas@python.ca Fri Jul 19 20:00:43 2002 From: nas@python.ca (Neil Schemenauer) Date: Fri, 19 Jul 2002 12:00:43 -0700 Subject: [Python-Dev] The iterator story In-Reply-To: ; from ping@zesty.ca on Fri, Jul 19, 2002 at 04:28:32AM -0700 References: Message-ID: <20020719120043.A21503@glacier.arctrix.com> Ka-Ping Yee wrote: > I think "for" should be non-destructive because that's the way > it has almost always behaved, and that's the way it behaves in > any other language i can think of. I agree that it can be surprising to have "for" destroy the object it's looping over. I myself was bitten once by it. I'm not yet sure if this is something that will repeatedly bite. I suspect it might. :-( > And as things stand, the presence of __iter__ doesn't even work > as a type flag. __iter__ is not a flag. When you want to loop over an object you call __iter__ to get an iterator. Since you should be able to loop over all iterators they should provide an __iter__ that returns self. > Now suppose we agree that __iter__ and next are distinct protocols. I suppose you can call them distinct but they both pertain to iteration. One gets the iterator, the other uses it. > Then why require iterators to support both? The only reason we > would want __iter__ on iterators is so that we can use "for" > with an iterator as the second operand. Isn't that a good reason? It's not just "for" though. Anytime you have an object that you want to loop over, you should call iter() to get an iterator and then call .next() on that object. > I think the potential for collision, though small, is significant, > and this makes "__next__" a better choice than "next". When this issue originally came up, my position was that double underscores should be used only if there is a risk of namespace collision. The fact that the method was stored on a type slot is irrelevant. If objects implemented iterators as separate, specialized objects, there wouldn't be any namespace collisions.
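The surprise Neil describes is easy to reproduce in today's Python, where generators (and files) are one-shot iterators while real containers can be looped over repeatedly. A sketch:

```python
def gen():
    yield 1
    yield 2

it = gen()
first = [x for x in it]
second = [x for x in it]   # "for" consumed it; nothing is left
assert first == [1, 2]
assert second == []

lst = [1, 2]
assert [x for x in lst] == [1, 2]
assert [x for x in lst] == [1, 2]  # a container iterates as often as you like
```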
Now it looks like people want to have iterators that also do other things. In that case, __next__ would have been a better choice. > The connection between this issue and the __iter__ issue is that, > if next() were renamed to __next__(), the argument that __iter__ > is needed as a flag would also go away. Sorry, I don't see the connection. __iter__ is not a flag. How does renaming next() help? > In my ideal world, we would allow a new form of "for", such as
>
>     for line from file:
>         print line

Nice syntax but I think it creates other problems. Basically, you are saying that iterators should not implement __iter__ and we should have some other way of looping over them (in order to make it clear that they are being mutated). First, people could implement __iter__ such that it returns an iterator that mutates the original object (e.g. a file object __iter__ that returns xreadlines). Second, it will be confusing to have two different ways of looping over things. Imagine a library with this bit of code:

    for item in sequence:
        do something

Now I want to use this library but I have an iterator, not something that implements __iter__. I would need to create a little wrapper with an __iter__ method that returns my object. Should people prefer to write:

    for item from iterator:
        do something

when they only need to loop over something once? Doing so makes the code most generally useful. What about functions like map() and max()? Should they accept iterators or sequences as arguments? It would be confusing if some functions accepted iterators as arguments but not "container" objects (i.e. things that implement __iter__) and vice versa. People will wonder if they should call iter() before passing their sequence as an argument. To summarize, I agree that "for" mutating the object can be surprising. I don't think that removing the __iter__ from iterators is the right solution. Unfortunately I don't have any alternative suggestions.
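The little wrapper Neil mentions can be sketched like this. NextOnly is a hypothetical iterator that, under Ping's proposal, would have __next__ but no __iter__, so a library's plain for-loop rejects it until it is wrapped:

```python
class NextOnly:
    """Hypothetical iterator with no __iter__ (per the proposal)."""
    def __init__(self):
        self.i = 0
    def __next__(self):
        if self.i >= 3:
            raise StopIteration
        self.i += 1
        return self.i

def library_sum(sequence):
    total = 0
    for item in sequence:   # a for-loop requires __iter__
        total += item
    return total

try:
    library_sum(NextOnly())          # rejected: not iterable
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError")

class Wrap:
    """Neil's wrapper: an __iter__ that just returns the iterator."""
    def __init__(self, it):
        self._it = it
    def __iter__(self):
        return self._it

assert library_sum(Wrap(NextOnly())) == 6
```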
Neil From aleax@aleax.it Fri Jul 19 19:55:06 2002 From: aleax@aleax.it (Alex Martelli) Date: Fri, 19 Jul 2002 20:55:06 +0200 Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability In-Reply-To: <0e5201c22f4c$37c62e30$6501a8c0@boostconsulting.com> References: <714DFA46B9BBD0119CD000805FC1F53B01B5B462@UKRUX002.rundc.uk.origin-it.com> <0e5201c22f4c$37c62e30$6501a8c0@boostconsulting.com> Message-ID: On Friday 19 July 2002 07:45 pm, David Abrahams wrote: ... > > dyed-in-the-wool Smalltalker in multiple inheritance, or anybody > > *but* a CLOS-head or Dylan-head in multiple dispatch...:-). > > Ahem. *I'm* interested in multiple-dispatch (never used CLOS or Dylan). You > might not have noticed that I mentioned multimethods in my post about > supporting overloading in Boost.Python. Sorry, I hadn't noticed. I never did production work in CLOS or Dylan, either, so I guess that enough C++ and templates warp one's brain enough to increase one's perceptivity (only way to account for both of us:-). > > Other aspects of introspection help you implement other primitives > > lacking in the language. E.g. "make another like myself but not > > initialized" can be self.__class__.__new__(self.__class__) -- not > > the most elegant expression, but, hey, I've seen worse (such as > > NOT being able to express it at all, in languages lacking the > > needed ability to introspect:-). > > Is that really introspection? It doesn't seem to ask a question. "What is this concrete object's actual runtime class?" is a question, even though it may not look like one since the answer is in a special attribute rather than being obtained from a method call. Feel free to code type(self) instead of self.__class__ if this feels more question-ish, of course. Six of one, half a dozen of the other. The object is "looking inside itself" -> introspection. Specifically, looking at its own metadata.
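Alex's "make another like myself but not initialized" idiom works unchanged in modern Python: __new__ allocates an instance without running __init__. A sketch:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def blank_copy(self):
        # Alex's idiom: allocate a new instance of my own (runtime)
        # class without calling __init__ on it.
        return self.__class__.__new__(self.__class__)

p = Point(1, 2)
q = p.blank_copy()
assert type(q) is Point
assert not hasattr(q, "x")   # __init__ never ran on the copy
```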
> > Looking at *ANOTHER* object this way isn't really INTROspection, > > btw -- it's EXTRAspection, by the Latin roots of these words:-). > > Okay. I hope you won't be offended if I continue to use the wrong term so > that everyone else can understand me ;-) How depressingly pragmatic. Alex From guido@python.org Fri Jul 19 20:10:30 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 15:10:30 -0400 Subject: [Python-Dev] Where's time.daylight??? In-Reply-To: Your message of "Fri, 19 Jul 2002 13:32:19 EDT." <200207191732.g6JHWJD28040@pcp02138704pcs.reston01.va.comcast.net> References: <15672.18628.831787.897474@anthem.wooz.org> <200207191732.g6JHWJD28040@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200207191910.g6JJAUJ32606@pcp02138704pcs.reston01.va.comcast.net> > [Barry, in python-checkins] > > I've noticed one breakage already I believe. On my systems (RH6.1 and > > RH7.3) time.daylight has disappeared. > > > > I don't think test_time.py actually tests this parameter, but > > test_email.py which is what's failing for me: > [...] > > Yup, time.daylight has disappeared. But the bizarre thing is that if > I roll back to rev. 1.129, it's *still* gone! Even rev 1.128 still > doesn't fix this. I wonder if something in configure changed??? Alas, this is the effect of defining _XOPEN_SOURCE in configure.in. This somehow has the effect of not defining these symbols in pyconfig.h:

    HAVE_STRUCT_TM_TM_ZONE
    HAVE_TM_ZONE
    HAVE_TZNAME

I'm going to remove the _XOPEN_SOURCE define; Jeremy and Martin can try to figure out what the right thing is for Tru64. --Guido van Rossum (home page: http://www.python.org/~guido/) From andymac@bullseye.apana.org.au Fri Jul 19 14:32:18 2002 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Sat, 20 Jul 2002 00:32:18 +1100 (edt) Subject: [Python-Dev] test_socket failure on FreeBSD In-Reply-To: <200207181627.g6IGRPE21459@odiug.zope.com> Message-ID:
On Thu, 18 Jul 2002, Guido van Rossum wrote: {...} > > Testing recvfrom() in chunks over TCP. ... > > seg1='Michael Gilfix was he', addr='None' > > seg2='re > > ', addr='None' > > ERROR > > Hm. This looks like recvfrom() on a TCP stream doesn't return an > address; not entirely unreasonable. I wonder if > self.cli_conn.getpeername() returns the expected address; can you > check this? Add this after each recvfrom() call. > > if addr is None: > addr = self.cli_conn.getpeername() This appears to have the effect you desired. See the attached log. {...} > > Testing non-blocking accept. ... > > conn= > > addr=('127.0.0.1', 3144) > > FAIL > > This is different. It seems that the accept() call doesn't time out. > But this could be because the client thread connects too fast. Can > you add a sleep (e.g. time.sleep(5)) to _testAccept() before the > connect() call? Likewise. I took the sleep down to 1ms without failure, though that system has HZ=100 so std resolution I expect would be 10ms. I have also attached for info the log of the same modifications on EMX - situation improved, but still a hiccup there. Also attached is the diff I applied to test_socket.py (as of about 1900 UTC 020719). -- Andrew I MacIntyre "These thoughts are mine alone..." 
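Guido's suggested fallback can be packaged as a small helper. This sketch (mine) uses socketpair() as a local stand-in for the test suite's TCP connection; on platforms where recvfrom() on a connected stream socket reports no address, getpeername() supplies it:

```python
import socket

def recv_with_addr(conn, nbytes):
    # Fallback suggested in the thread: recvfrom() on a connected
    # stream socket may report addr as None on some platforms.
    msg, addr = conn.recvfrom(nbytes)
    if addr is None:
        addr = conn.getpeername()
    return msg, addr

a, b = socket.socketpair()   # stand-in for the TCP test pair
try:
    b.send(b"Michael Gilfix was here")
    msg, addr = recv_with_addr(a, 1024)
    assert msg == b"Michael Gilfix was here"
finally:
    a.close()
    b.close()
```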
E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia

[Attachment: test_socket.log.fbsd44 (base64 text omitted)]
[Attachment: test_socket.log.os2emx (base64 text omitted)]
[Attachment: test_socket.py.diff (base64 text omitted)]

From andymac@bullseye.apana.org.au Fri Jul 19 14:37:12 2002 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Sat, 20 Jul
2002 00:37:12 +1100 (edt) Subject: [Python-Dev] test_socket failure on FreeBSD In-Reply-To: Message-ID: On Sat, 20 Jul 2002, Andrew MacIntyre wrote: {...} > Also attached is the diff I applied to test_socket.py (as of about 1900 > UTC 020719). Oops, that timestamp is still a couple of hours in the future. Should have been 1900 UTC 020718. -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From gsw@agere.com Fri Jul 19 20:41:09 2002 From: gsw@agere.com (Gerald S. Williams) Date: Fri, 19 Jul 2002 15:41:09 -0400 Subject: [Python-Dev] The iterator story (Single- vs. Multi-pass iterability?) In-Reply-To: <20020719185602.21423.41415.Mailman@mail.python.org> Message-ID: I started to type this before looking back at the other threads, so feel free to ignore it if it's entirely superfluous. I'm sorry that I didn't have time to follow the "Single- vs. Multi-pass iterability" thread. Code freeze is today. :-) I'm a little confused about this destructive-for/iterator issue. Sure an iterator that destroys the original object might be unexpected, but wouldn't you expect a non-destructive iterator to be the default for any object unless there's a pretty good reason to use a destructive one? If there's a chance that the object may be destroyed/altered (such as a file stream or an iterator), shouldn't you already have some reason to suspect that? -Jerry Strong typing is for weak minds. Weak typing is for the real troublemakers. ;-) P.S. Leaving off the original subject line can be mildly annoying to those of us subscribing to the digest version of the list. Probably more so to those who read our responses. 
:-) From guido@python.org Fri Jul 19 21:24:04 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 16:24:04 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src configure,1.322,1.323 configure.in,1.333,1.334 pyconfig.h.in,1.43,1.44 In-Reply-To: Your message of "Fri, 19 Jul 2002 16:06:24 EDT." References: Message-ID: <200207192024.g6JKO4c14964@pcp02138704pcs.reston01.va.comcast.net> [Tim, in python-checkins] > I don't understand why this helps. Are you sure it does? Python.h still > contains: > > #ifndef _XOPEN_SOURCE > # define _XOPEN_SOURCE 500 > #endif > > The configure changes were consequences of that change, IIRC. We surely > shouldn't be defining this one way in Python.h and a different way in > config, right? I'm certain that it helps: test_time failed since Jeremy made the change to configure, now it succeeds again. It may not be the right fix, sure, but I recommend that we don't check in a fix that breaks other things. The search is on, and I trust that Jeremy and Martin will figure something out (and that Jeremy will run autoconf, autoheader, configure, *and* the test suite before checking in more changes). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Jul 19 21:29:29 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 16:29:29 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Your message of "Fri, 19 Jul 2002 04:44:09 PDT." References: Message-ID: <200207192029.g6JKTU015005@pcp02138704pcs.reston01.va.comcast.net> > It's just not the way i expect for-loops to work. Perhaps we would > need to survey people for objective data, but i feel that most people > would be surprised if > > for x in y: print x > for x in y: print x > > did not print the same thing twice, or if > > if x in y: print 'got it' > if x in y: print 'got it' > > did not do the same thing twice. I realize this is my own opinion, > but it's a fairly strong impression i have. 
I think it's a naive persuasion that doesn't hold under scrutiny. For a long time people have faked iterators by providing pseudo-sequences that did unspeakable things. In general, I'm pretty sure that if I asked an uninitiated user what "for line in file" would do, if it did anything, they would understand that if you tried that a second time you'd hit EOF right away. > Even if it's okay for for-loops to destroy their arguments, i still > think it sets up a bad situation: we may end up with functions > manipulating sequence-like things all over, but it becomes unclear > whether they destroy their arguments or not. It becomes possible > to write a function which sometimes destroys its argument and sometimes > doesn't. Bugs get deeper and harder to find. This sounds awfully similar to the old argument "functions (as opposed to procedures) should never have side effects". ABC implemented that literally (the environment was saved and restored around function calls, with an exception for the seed for the built-in random generator), with the hope that it would provide fewer surprises. It did the opposite: it drove people crazy because the language was trying to be smarter than them. > I believe this is where the biggest debate lies: whether "for" should be > non-destructive. I realize we are currently on the other side of the > fence, but i foresee enough potential pain that i would like you to > consider the value of keeping "for" loops non-destructive. I don't see any real debate. I only see you chasing windmills. Sorry. For-loops have had the possibility to destroy their arguments since the day __getitem__ was introduced. > > Maybe the for-loop is a red herring? Calling next() on an > > iterator may or may not be destructive on the underlying "sequence" -- > > if it is a generator, for example, I would call it destructive. > > Well, for a generator, there is no underlying sequence. 
> > while 1: print next(gen) > > makes it clear that there is no sequence, but > > for x in gen: print x > > seems to give me the impression that there is. This seems to be a misrepresentation. The idiom for using any iterator (not just generators) *without* using a for-loop would have to be something like:

    while 1:
        try:
            item = it.next()  # or it.__next__() or next(it)
        except StopIteration:
            break
        ...do something with item...

(Similar to the traditional idiom for looping over the lines of a file.) The for-loop over an iterator was invented so you could write this as:

    for item in it:
        ...do something with item...

I'm not giving that up so easily! > > Perhaps you're trying to assign properties to the iterator abstraction > > that aren't really there? > > I'm assigning properties to "for" that you aren't. I think they > are useful properties, though, and worth considering. I'm trying to be open-minded, but I just don't see it. The for loop is more flexible than you seem to want it to be. Alas, it's been like this for years, and I don't think the for-loop needs a face lift. > I don't think i'm assigning properties to the iterator abstraction; > i expect iterators to destroy themselves. But the introduction of > iterators, in the way they are now, breaks this property of "for" > loops that i think used to hold almost all the time in Python, and > that i think holds all the time in almost all other languages. Again, the widespread faking of iterators using destructive __getitem__ methods that were designed to be only used in a for-loop defeats your assertion. > > Next, I'm not sure how renaming next() to __next__() would affect the > > situation w.r.t. the destructivity of for-loops. Or were you talking > > about some other migration? > > The connection is indirect. The renaming is related to: (a) making > __next__() a real, honest-to-goodness protocol independent of __iter__; next() is a real, honest-to-goodness protocol now, and it is independent of __iter__() now.
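One of the destructive __getitem__ fakes Guido refers to can be sketched as follows; even current Python falls back to __getitem__ when there is no __iter__, so this "sequence" still works in a for-loop, but only once:

```python
class FakeLineIter:
    """Pre-iterator idiom: a pseudo-sequence meant to be used only in
    a for-loop.  Each __getitem__ call consumes the underlying data,
    so a second loop finds nothing."""
    def __init__(self, lines):
        self._lines = list(lines)
    def __getitem__(self, i):
        if not self._lines:
            raise IndexError   # tells the for-loop to stop
        return self._lines.pop(0)

f = FakeLineIter(["a\n", "b\n"])
assert [line for line in f] == ["a\n", "b\n"]
assert [line for line in f] == []   # destroyed by the first loop
```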
> and (b) getting rid of __iter__ on iterators. It's the presence of > __iter__ on iterators that breaks the non-destructive-for property. So you prefer the while-loop version above over the for-loop version? Gotta be kidding. > I think the renaming of next() to __next__() is a good idea in any > case. It is distant enough from the other issues that it can be done > independently of any decisions about __iter__. Yeah, it's just a pain that it's been deployed in Python 2.2 since last December, and by the time 2.3 is out it will probably have been at least a full year. Worse, 2.2 is voted to be Python-in-a-Tie, giving that particular idiom a very long lifetime. I simply don't think we can break compatibility that easily. Remember the endless threads we've had about the pace of change and stability. We have to live with warts, alas. And this is a pretty minor one if you ask me. (I realize that you're proposing another way out in a separate message. I'll reply to that next. Since you changed the subject, I can't very well reply to it here.) --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Fri Jul 19 21:57:09 2002 From: nas@python.ca (Neil Schemenauer) Date: Fri, 19 Jul 2002 13:57:09 -0700 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: <200207171503.g6HF3mW01047@odiug.zope.com>; from guido@python.org on Wed, Jul 17, 2002 at 11:03:48AM -0400 References: <200207170129.g6H1Tt116117@pcp02138704pcs.reston01.va.comcast.net> <20020717094504.A85351@doublegemini.com> <200207171409.g6HE9Di00659@odiug.zope.com> <20020717104935.A86293@doublegemini.com> <200207171503.g6HF3mW01047@odiug.zope.com> Message-ID: <20020719135709.A22330@glacier.arctrix.com> Guido van Rossum wrote: > - There really isn't anything "broken" about the current situation; > it's just that "next" is the only method name mapped to a slot in > the type object that doesn't have leading and trailing double > underscores.
Are you saying the _only_ reason to rename it is for consistency with the other type slot method names? That's really weak, IMHO, and not worth any kind of backwards incompatibility (which seems unavoidable). Neil From paul@svensson.org Fri Jul 19 21:49:54 2002 From: paul@svensson.org (Paul Svensson) Date: Fri, 19 Jul 2002 16:49:54 -0400 (EDT) Subject: [Python-Dev] The iterator story In-Reply-To: <20020719120043.A21503@glacier.arctrix.com> Message-ID: On Fri, 19 Jul 2002, Neil Schemenauer wrote: >__iter__ is not a flag. When you want to loop over an object you call >__iter__ to get an iterator. Since you should be able to loop over all >iterators they should provide a __iter__ that returns self. But you don't really loop _over_ the iterator, you loop _thru_ it. To me there's a fundamental difference between providing a new object and providing a reference to an existing object. This difference is mostly noticeable for objects containing state. The raison d'etre for iterators is to contain state. If it's sensible to sometimes return an old object and sometimes a new, then we could have 'list(x) is x' being true when x is already a list. What I'm trying to get to is, __iter__(x) returning an existing object (self in this case) is really something very much different from __iter__() creating new state, and returning that. The problem is that we do want a way to loop _thru_ an iterator, and having __iter__() return self gives us that, at the cost of the above mentioned confusing conflagration. Ping's suggested seq() function solves that quite nicely: class seq: def __init__(self, i): self._iter = i def __iter__(self): return self._iter /Paul From Jack.Jansen@oratrix.com Fri Jul 19 21:58:57 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Fri, 19 Jul 2002 22:58:57 +0200 Subject: [Python-Dev] Added platform-specific directories to sys.path Message-ID: <57BEAF46-9B5A-11D6-9B6B-003065517236@oratrix.com> I've a question that I'd like some feedback on. On MacOSX there's a set of directories that are meant especially for storing extensions to applications, and there's requests on the pythonmac-sig that I add these directories to the Python search path. This could easily be done optionally, with a .pth file in site-python. MacOSX has rationalized where preferences, libraries, licenses, extensions, etc are stored, and for all of these there's a hierarchy of folders. 
In the case of Python extension modules the logical places would be ~/Library/Application Support/Python (for user-installed extension modules), /Library/Application Support/Python (for machine-wide installed extension modules) and /Network/Library/Application Support/Python (for workgroup-wide installed modules). The final location, in /System, is for factory-installed stuff from Apple, not needed just yet for this example:-). I sympathize with the idea of making things more conform to the platform standard, on the other hand I'm a bit reluctant to do things differently again from what other Pythons do. But, one of the things that is sorely missing from Python is a standard place to install per-user extension modules, so this might well be the thing that triggers inclusion of such functionality into the grand scheme of things (including distutils support, etc). -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From guido@python.org Fri Jul 19 22:10:45 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 17:10:45 -0400 Subject: [Python-Dev] The iterator story In-Reply-To: Your message of "Fri, 19 Jul 2002 04:28:32 PDT." References: Message-ID: <200207192110.g6JLAjU15146@pcp02138704pcs.reston01.va.comcast.net> > Here is a summary of the whole iterator picture as i currently see it. > This is necessarily subjective, but i will try to be precise so that > it's clear where i'm making a value judgement and where i'm trying to > state fact, and so we can pinpoint areas where we agree and disagree. > > In the subjective sections, i have marked with [@] the places where > i solicit agreement or disagreement. > > I would like to know your opinions on the issues listed below, > and on the places marked [@]. > > > Definitions (objective) > ----------------------- > > Container: a thing that provides non-destructive access to a varying > number of other things. > > Why "non-destructive"? 
Because i don't expect that merely looking > at the contents will cause a container to be altered. For example, > i expect to be able to look inside a container, see that there are > five elements; leave it alone for a while, come back to it later > and observe once again that there are five elements. > > Consequently, a file object is not a container in general. Given > a file object, you cannot look at it to see if it contains an "A", > and then later look at it once again to see if it contains an "A" > and get the same result. If you could seek, then you could do > this, but not all files support seeking. Even if you could seek, > the act of reading the file would still alter the file object. > > The file object provides no way of getting at the contents without > mutating itself. According to my definition, it's fine for a > container to have ways of mutating itself; but there has to be > *some* way of getting the contents without mutating the container, > or it just ain't a container to me. > > A file object is better described as a stream. Hypothetically > one could create an interface to seekable files that offered some > non-mutating read operations; this would cause the file to look > more like an array of bytes, and i would find it appropriate to > call that interface a container. > > Iterator: a thing that you can poke (i.e. send a no-argument message), > where each time you poke it, it either yields something or announces > that it is exhausted. > > For an iterator to mutate itself every time you poke it is not > part of my definition. But the only non-mutating iterator would > be an iterator that returns the same thing forever, or an iterator > that is always exhausted. So most iterators usually mutate. > > Some iterators are associated with a container, but not all. > > There can be many kinds of iterators associated with a container. 
> The most natural kind is one that yields the elements of the > container, one by one, mutating itself each time it is poked, > until it has yielded all of the elements of the container and > announces exhaustion. > > A Container's Natural Iterator: an iterator that yields the elements > of the container, one by one, in the order that makes the most sense > for the container. If the container has a finite size n, then the > iterator can be poked exactly n times, and thereafter it is exhausted. Sure. But I note that there are hybrids, and I think files (at least seekable files) fall in the hybrid category. Other examples of hybrids: - Some dbm variants (e.g. dbhash and gdbm) provide first() and next() or firstkey() and nextkey() methods that combine iterator state with the container object. These objects simply provide two different interfaces, a containerish interface (__getitem__ in fact), and an iteratorish interface. - Before we invented the concept of iterators, I believe it was common for tree data structures to provide iterators that didn't put the iteration state in a separate object, but simply kept a pointer to the current node of the iteration pass somewhere in the root of the tree. The idea that a container also has some iterator state, and that you have to do something simple (like calling firstkey() or seek(0)) to reset the iterator, is quite common. You may argue that this is poor design that should be fixed, and in general I would agree (the firstkey()/nextkey() protocol in particular is clumsy to use), but it is common nevertheless, and sometimes common usage patterns as well as the relative cost of random access make it a good compromise. For example, while a tape file is a container in the sense that reading the data doesn't destroy it, it's very heavily geared towards sequential access, and you can't realistically have two iterators going over the same tape at once. 
If you're too young to remember, think of files on CD media -- there, random access, while possible, is several orders of magnitude slower than sequential access (better than tape, but a lot worse than regular magnetic hard drives). > Issues (objective) > ------------------ > > I alluded to a set of issues in an earlier message, and i'll begin > there, by defining what i meant more precisely. > > The Destructive-For Issue: > > In most languages i can think of, and in Python for the most > part, a statement such as "for x in y: print x" is a > non-destructive operation on y. Repeating "for x in y: print x" > will produce exactly the same results once more. > > For pre-iterator versions of Python, this fails to be true only > if y's __getitem__ method mutates y. The introduction of > iterators has caused this to now be untrue when y is any iterator. > > The issue is, should "for" be non-destructive? I don't see the benefit. We've done this for years and the only conceptual problem was the abuse of __getitem__, not the destructiveness of the for-loop. > The Destructive-In Issue: > > Notice that the iteration that takes place for the "in" operator > is implemented in the same way as "for". So if "for" destroys > its second operand, so will "in". > > The issue is, should "in" be non-destructive? If it can't be helped otherwise, sure, why not? > (Similar issues exist for built-ins that iterate, like list().) At least list() keeps a copy of all the items, so you can then iterate over them as often as you want. :-) > The __iter__-On-Iterators Issue: > > Some people have mentioned that the presence of an __iter__() > method is a way of signifying that an object supports the > iterator protocol. It has been said that this is necessary > because the presence of a "next()" method is not sufficiently > distinguishing. Not me. > Some have said that __iter__() is a completely distinct protocol > from the iterator protocol. > > The issue is, what is __iter__() really for? 
To support iter() and for-loops. > And secondarily, if it is not part of the iterator protocol, > then should we require __iter__() on iterators, and why? So that you can use an iterator in a for-loop. > The __next__-Naming Issue: > > The iteration method is currently called "next()". > > Previous candidates for the name of this method were "next", > "__next__", and "__call__". After some previous debate, > it was pronounced to be "next()". > > There are concerns that "next()" might collide with existing > methods named "next()". There is also a concern that "next()" > is inconsistent because it is the only type-slot-method that > does not have a __special__ name. > > The issue is, should it be called "next" or "__next__"? That's a separate issue, and cleans up only a small wart that in practice hasn't hurt anybody AFAIK. > My Positions (subjective) > ------------------------- > > I believe that "for" and "in" and list() should be non-destructive. > I believe that __iter__() should not be required on iterators. > I believe that __next__() is a better name than next(). > > Destructive-For, Destructive-In: > > I think "for" should be non-destructive because that's the way > it has almost always behaved, and that's the way it behaves in > any other language [@] i can think of. > > For a container's __getitem__ method to mutate the container is, > in my opinion, bad behaviour. In pre-iterator Python, we needed > some way to allow the convenience of "for" on user-implemented > containers. So "for" supported a special protocol where it would > call __getitem__ with increasing integers starting from 0 until > it hit an IndexError. This protocol works great for sequence-like > containers that were indexable by integers. > > But other containers had to be hacked somewhat to make them fit. > For example, there was no good way to do "for" over a dictionary-like > container. 
If you attempted "for" over a user-implemented dictionary, > you got a really weird "KeyError: 0", which only made sense if you > understood that the "for" loop was attempting __getitem__(0). > > (Hey! I just noticed that > > from UserDict import UserDict > for k in UserDict(): print k > > still produces "KeyError: 0"! This oughta be fixed...) Check the CVS logs. At one point before 2.2 was released, UserDict had a __iter__ method. But then SF bug 448153 was filed, presenting evidence that this broke previously working code. So a separate class, IterableUserDict, was added that has the __iter__ method. I agree that this is less than ideal, but that's life. > If you wanted to support "for" on something else, sometimes you > would have to make __getitem__ mutate the object, like it does > in the fileinput module. But then the user has to know that > this object is a special case: "for" only works the first time. This was and still is widespread. There are a lot of objects that have a way to return an iterator (old style using fake __getitem__, and new ones using __iter__ and next) that are intended to be looped over, once. I have no desire to deprecate this behavior, since (a) it would be a major upheaval for the user community (a lot worse than integer division), and (b) I don't see that "fixing" this prevents a particular category of programming errors. > When iterators were introduced, i believed they were supposed > to solve this problem. Currently, they don't. No, they solve the conceptual ugliness of providing a __getitem__ that can only be called once. The new rule is, if you provide __getitem__, it must support random access; otherwise, you should provide __iter__. > Currently, "in" can even be destructive. This is more serious. > While one could argue that it's not so strange for > > for x in y: ... > > to alter y (even though i do think it is strange), i believe > just about anyone would find it very counterintuitive for > > if x in y: > > to alter y. 
[@] That falls in the category of "then don't do that". > __iter__-On-Iterators: > > I believe __iter__ is not a type flag. As i argued previously, > i think that looking for the presence of methods that don't actually > implement a protocol is a poor way to check for protocol support. > And as things stand, the presence of __iter__ doesn't even work [@] > as a type flag. And I never said it was a type flag. I'm tired of repeating myself, but you keep repeating this broken argument, so I have to keep correcting you. > There are objects with __iter__ that are not iterators (like most > containers). And there are objects without __iter__ that work as > iterators. I know you can legislate the latter away, but i think > such legislation would amount to fighting the programmers -- and > it is infeasible [@] to enforce the presence of __iter__ in practice. I think having next without having __iter__ is like having __getitem__ without having __len__. There are corner cases where you might get away with this because you know it won't be called, but (as I've repeated umpteen times now), a for-loop over an iterator is a common idiom. > Based on Guido's positive response, in which he asked me to make > an addition to the PEP, i believe Guido agrees with me that > __iter__ is distinct from the protocol of an iterator. This > surprised me because it runs counter to the philosophy previously > expressed in the PEP. I recognize that they are separate protocols. But because I like the for-loop as a convenient way to get all of the elements of an iterator, I want iterators to support __iter__. The alternative would be for iter() to see if the object implements next (after finding that it has neither __iter__ nor __getitem__), and return the object itself unchanged. If we had picked __next__ instead of 'next', that would perhaps been my choice (though I might *still* have recommended implementing __iter__ returning self, to avoid two failing getattr calls). 
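The convention Guido recommends here, an iterator whose __iter__ simply returns self so it can appear directly in a for-loop, can be sketched with a hand-written iterator (the CountDown class is invented for illustration; __next__ is the later spelling of the next method being debated):

```python
class CountDown:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # The convention under discussion: an iterator is its own
        # iterator, so iter(x) -- and hence a for-loop -- accepts it
        # without creating a separate object.
        return self

    def __next__(self):  # spelled next() in the 2.2-era protocol
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

it = CountDown(3)
print(iter(it) is it)  # True: __iter__ returns self
print(list(it))        # [3, 2, 1]
print(list(it))        # []: a second pass finds it exhausted
```

The last line is exactly the destructive-for behavior Ping objects to: the same object, looped over twice, yields nothing the second time.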
> Now suppose we agree that __iter__ and next are distinct protocols. > Then why require iterators to support both? The only reason we > would want __iter__ on iterators is so that we can use "for" [@] > with an iterator as the second operand. Right. Finally you got it. > I have just argued, above, that it's *not* a good idea for "for" > and "in" to be destructive. Since most iterators self-mutate, > it follows that it's not advisable to use an iterator directly > as the second operand of a "for" or "in". > > I realize this seems radical! This may be the most controversial > point i have made. But if you accept that "in" should not > destroy its second argument, the conclusion is unavoidable. Since I have little sympathy for your premise, this conclusion is far from unavoidable for me. :-) > __next__-Naming: > > I think the potential for collision, though small, is significant, > and this makes "__next__" a better choice than "next". A built-in > function next() should be introduced; this function would call the > tp_iternext slot, and for instance objects tp_iternext would call > the __next__ method implemented in Python. > > The connection between this issue and the __iter__ issue is that, > if next() were renamed to __next__(), the argument that __iter__ > is needed as a flag would also go away. I really wish we had had this insight 18 months ago. Right now, it's too late. Dragging all the other stuff in doesn't strengthen the argument for fixing it now. > The Current PEP (objective) > --------------------------- > > The current PEP takes the position that "for" and "in" can be > destructive; that __iter__() and next() represent two distinct > protocols, yet iterators are required to support both; and that > the name of the method on iterators is called "next()". > > > My Ideal Protocol (subjective) > ------------------------------ > > So by now the biggest question/objection you probably have is > "if i can't use an iterator with 'for', then how can i use it?" 
> > The answer is that "for" is a great way to iterate over things; > it's just that it iterates over containers and i want to preserve > that. We need a different way to iterate over iterators. > > In my ideal world, we would allow a new form of "for", such as > > for line from file: > print line > > The use of "from" instead of "in" would imply that we were > (destructively) pulling things out of the iterator, and would > remove any possible parallel to the test "x in y", which should > rightly remain non-destructive. Alternative syntaxes for for-loops have been proposed as solutions to all sorts of things (e.g. what's called enumerate() in 2.3, and a simplified syntax for range(), and probably other things). I'm not keen on this. I don't want to user-test it, but I expect that it's too subtle a difference, and that we would see Aha! experiences of the kind "Oh, it's a for-*from* loop! I never noticed that, I always read it as a for-*in* loop! That explains the broken behavior." > Here's the whole deal: > > - Iterators provide just one method, __next__(). > > - The built-in next() calls tp_iternext. For instances, > tp_iternext calls __next__. > > - Objects wanting to be iterated over provide just one method, > __iter__(). Some of these are containers, but not all. > > - The built-in iter(foo) calls tp_iter. For instances, > tp_iter calls __iter__. > > - "for x in y" gets iter(y) and uses it as an iterator. > > - "for x from y" just uses y as the iterator. > > That's it. > > Benefits: > > - We have a nice clean division between containers and iterators. > > - When you see "for x in y" you know that y is a container. > > - When you see "for x from y" you know that y is an iterator. > > - "for x in y" never destroys y. > > - "if x in y" never destroys y. > > - If you have an object that is container-like, you can add > an __iter__ method that gives its natural iterator. If > you want, you can supply more iterators that do different > things; no problem. 
No one using your object is confused > about whether it mutates. > > - If you have an object that is cursor-like or stream-like, > you can safely make it into an iterator by adding __next__. > No one using your object is confused about whether it mutates. > > Other notes: > > - Iterator algebra still works fine, and is still easy to write: > > def alternate(it): > while 1: > yield next(it) > next(it) > > - The file problem has a consistent solution. Instead of writing > "for line in file" you write > > for line from file: > print line > > Being forced to write "from" signals to you that the file is > eaten up. There is no expectation that "for line from file" > will work again. > > The best would be a convenience function "readlines", to > make this even clearer: > > for line in readlines("foo.txt"): > print line > > Now you can do this as many times as you want, and there is > no possibility of confusion; there is no file object on which > to call methods that might mess up the reading of lines. > > > My Not-So-Ideal Protocol > ------------------------ > > All right. So new syntax may be hard to swallow. An alternative > is to introduce an adapter that turns an iterator into something > that "for" will accept -- that is, the opposite of iter(). > > - The built-in seq(it) returns x such that iter(x) yields it. > > Then instead of writing > > for x from it: > > you would write > > for x in seq(it): > > and the rest would be the same. The use of "seq" here is what > would flag the fact that "it" will be destroyed. I don't feel I have to drive it home any further, so I'll leave these last few paragraphs without comments. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Jul 19 22:20:35 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 17:20:35 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Your message of "Fri, 19 Jul 2002 13:57:09 PDT." 
<20020719135709.A22330@glacier.arctrix.com> References: <200207170129.g6H1Tt116117@pcp02138704pcs.reston01.va.comcast.net> <20020717094504.A85351@doublegemini.com> <200207171409.g6HE9Di00659@odiug.zope.com> <20020717104935.A86293@doublegemini.com> <200207171503.g6HF3mW01047@odiug.zope.com> <20020719135709.A22330@glacier.arctrix.com> Message-ID: <200207192120.g6JLKZw15241@pcp02138704pcs.reston01.va.comcast.net> > Guido van Rossum wrote: > > - There really isn't anything "broken" about the current situation; > > it's just that "next" is the only method name mapped to a slot in > > the type object that doesn't have leading and trailing double > > underscores. > > Are you saying the _only_ reason to rename it is for consistency with > the other type slot method names? That's really weak, IMHO, and not > worth any kind of backwards incompatibility (which seems unavoidable). > > Neil Almost. This means that we're retroactively saying that all objects with a next method are iterators, thereby slightly stomping on the user's namespace. But as long as you don't use such an object as an iterator, it's harmless. And if my position wasn't clear already, I agree it's not worth "fixing". :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Jul 19 22:23:07 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 17:23:07 -0400 Subject: [Python-Dev] Added platform-specific directories to sys.path In-Reply-To: Your message of "Fri, 19 Jul 2002 22:58:57 +0200." <57BEAF46-9B5A-11D6-9B6B-003065517236@oratrix.com> References: <57BEAF46-9B5A-11D6-9B6B-003065517236@oratrix.com> Message-ID: <200207192123.g6JLN7s15263@pcp02138704pcs.reston01.va.comcast.net> > I've a question that I'd like some feedback on. On MacOSX > there's a set of directories that are meant especially for > storing extensions to applications, and there's requests on the > pythonmac-sig that I add these directories to the Python search > path. 
This could easily be done optionally, with a .pth file in > site-python. > > MacOSX has rationalized where preferences, libraries, licenses, > extensions, etc are stored, and for all of these there's a > hierarchy of folders. In the case of Python extension modules > the logical places would be ~/Library/Application Support/Python > (for user-installed extension modules), /Library/Application > Support/Python (for machine-wide installed extension modules) > and /Network/Library/Application Support/Python (for > workgroup-wide installed modules). The final location, in > /System, is for factory-installed stuff from Apple, not needed > just yet for this example:-). > > I sympathize with the idea of making things more conform to the > platform standard, on the other hand I'm a bit reluctant to do > things differently again from what other Pythons do. But, one of > the things that is sorely missing from Python is a standard > place to install per-user extension modules, so this might well > be the thing that triggers inclusion of such functionality into > the grand scheme of things (including distutils support, etc). Traditionally, on Unix per-user extensions are done by pointing PYTHONPATH to your per-user directory (-ies) in your .profile. On Windows you can do this too, but I bet most people just have a per-user computer. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From Jack.Jansen@oratrix.com Fri Jul 19 22:34:40 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Fri, 19 Jul 2002 23:34:40 +0200 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: <20020719112602.A17763@ActiveState.com> Message-ID: <554A408C-9B5F-11D6-9B6B-003065517236@oratrix.com> On vrijdag, juli 19, 2002, at 08:26 , Trent Mick wrote: > [Tim Peters wrote] >> Excellent! One down, about two hundred thousand to go. > > Mark rocks! Oh, it's MarkH appreciation that's wanted! 
In that case I'll gladly chime in, I was afraid it was __declspec(dllexport) appreciation. Mark is one cool dude who knows where his towel is! 199998 to go. Should we start taking a poll who'll be the next python-devver we start appreciating when the counter hits zero? -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From barry@zope.com Fri Jul 19 22:46:30 2002 From: barry@zope.com (Barry A. Warsaw) Date: Fri, 19 Jul 2002 17:46:30 -0400 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? References: <20020719112602.A17763@ActiveState.com> <554A408C-9B5F-11D6-9B6B-003065517236@oratrix.com> Message-ID: <15672.34998.636509.747342@anthem.wooz.org> >>>>> "JJ" == Jack Jansen writes: JJ> Oh, it's MarkH appreciation that's wanted! In that case I'll JJ> gladly chime in, I was afraid it was __declspec(dllexport) JJ> appreciation. Mark is one cool dude who knows where his towel JJ> is! JJ> 199998 to go. Should we start taking a poll who'll be the next JJ> python-devver we start appreciating when the counter hits JJ> zero? My everlasting appreciation of MarkH was cemented the night, many IPCs ago, that he drank me under the table and called us "purple". 199997-to-go-ly y'rs, -Barry From barry@zope.com Fri Jul 19 22:48:53 2002 From: barry@zope.com (Barry A. Warsaw) Date: Fri, 19 Jul 2002 17:48:53 -0400 Subject: [Python-Dev] Added platform-specific directories to sys.path References: <57BEAF46-9B5A-11D6-9B6B-003065517236@oratrix.com> <200207192123.g6JLN7s15263@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15672.35141.803094.488541@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Traditionally, on Unix per-user extensions are done by GvR> pointing PYTHONPATH to your per-user directory (-ies) in your GvR> .profile. Or adding them to sys.path via your $PYTHONSTARTUP file. 
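The .pth mechanism Jack mentions can be exercised directly through the standard site module; this sketch uses a throwaway temporary directory rather than a real site-python location (the directory names are invented for illustration):

```python
import os
import site
import sys
import tempfile

# A .pth file is a plain-text file in a site directory; each line
# naming an existing directory gets appended to sys.path when the
# site machinery scans that directory.
sitedir = tempfile.mkdtemp()
extra = os.path.join(sitedir, "extensions")
os.mkdir(extra)
with open(os.path.join(sitedir, "example.pth"), "w") as f:
    f.write(extra + "\n")

site.addsitedir(sitedir)  # what site.py does for site-packages at startup
print(extra in sys.path)  # True
```

This is the same machinery that a .pth file dropped into site-python would trigger at interpreter startup.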
OTOH, it might be nice if the distutils `install' command had some switches to make installing in some of these common alternative locations a little easier. That might dovetail nicely if/when we decide to add a site-updates directory to sys.path. -Barry From tommy@ilm.com Fri Jul 19 23:11:07 2002 From: tommy@ilm.com (Hambozo) Date: Fri, 19 Jul 2002 15:11:07 -0700 (PDT) Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: <15672.34998.636509.747342@anthem.wooz.org> References: <20020719112602.A17763@ActiveState.com> <554A408C-9B5F-11D6-9B6B-003065517236@oratrix.com> <15672.34998.636509.747342@anthem.wooz.org> Message-ID: <15672.36408.362000.540999@mace.lucasdigital.com> Barry A. Warsaw writes: | | My everlasting appreciation of MarkH was cemented the night, many IPCs | ago, that he drank me under the table and called us "purple". When anyone asks my opinion of Mark I always say: "F**kin' Ripper!" :) 199996 and counting... -Tommy From barry@zope.com Fri Jul 19 23:10:59 2002 From: barry@zope.com (Barry A. Warsaw) Date: Fri, 19 Jul 2002 18:10:59 -0400 Subject: [Python-Dev] Do we still need Lib/test/data? Message-ID: <15672.36467.645262.622848@anthem.wooz.org> I'm about to check in some changes to the email package, which will include a re-organization of its test suite. Part of this will be so that I can add some huge torture tests to the standalone mimelib project without committing megs of email samples to the Python project. It will also make it easier for me to create the mimelib distro because I'll then be able to put the setup.py file in the email directory instead of having to maintain a fake hierarchy elsewhere just to make distutils happy. Specifically, I'm going to move the bulk of Lib/test_email.py and Lib/test_email_codes.py to Lib/email/test and make email.test a full-fledged subpackage of the email package. I'm also going to move the Lib/test/data directory to Lib/email/test. 
I'll do this by creating a new directory and cvs adding a copy of the files to the new location (the cvs revision history isn't important enough to preserve). I believe this should be entirely transparent to most of you. My question is whether I should cvsrm the files that are currently in Lib/test/data or not? On the one hand, I don't want to maintain duplicates, but OTOH, I'm not sure if any other code or tests depend on those files (I did some attempts at grepping for this and didn't /see/ anything but I'm trying to be conservative). Needless to say I won't be actually removing the Lib/test/data directory, but a "cvs up -P" would hide it from you. Any opinions? -Barry From neal@metaslash.com Fri Jul 19 23:32:09 2002 From: neal@metaslash.com (Neal Norwitz) Date: Fri, 19 Jul 2002 18:32:09 -0400 Subject: [Python-Dev] The iterator story References: <20020719120043.A21503@glacier.arctrix.com> Message-ID: <3D389369.948547E0@metaslash.com> Neil Schemenauer wrote: > > Ka-Ping Yee wrote: > > I think "for" should be non-destructive because that's the way > > it has almost always behaved, and that's the way it behaves in > > any other language [@] i can think of. > > I agree that it can be surprising to have "for" destroy the object it's > looping over. I myself was bitten once by it. I'm not yet sure if this > is something that will repeatedly bite. I suspect it might. :-( In what context? Were you iterating over a file or something else? I'm wondering if this is a problem, perhaps pychecker could generate a warning? Neal From aahz@pythoncraft.com Fri Jul 19 23:29:38 2002 From: aahz@pythoncraft.com (Aahz) Date: Fri, 19 Jul 2002 18:29:38 -0400 Subject: [Python-Dev] Single- vs. 
Multi-pass iterability In-Reply-To: <200207192029.g6JKTU015005@pcp02138704pcs.reston01.va.comcast.net> References: <200207192029.g6JKTU015005@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020719222938.GA23413@panix.com> On Fri, Jul 19, 2002, Guido van Rossum wrote: >Ping: >> >> I think the renaming of next() to __next__() is a good idea in any >> case. It is distant enough from the other issues that it can be done >> independently of any decisions about __iter__. > > Yeah, it's just a pain that it's been deployed in Python 2.2 since > last December, and by the time 2.3 is out it will probably have been > at least a full year. Worse, 2.2 is voted to be Python-in-a-Tie, > giving that particular idiom a very long lifetime. I simply don't > think we can break compatibility that easily. Remember the endless > threads we've had about the pace of change and stability. We have to > live with warts, alas. And this is a pretty minor one if you ask me. Is this a Pronouncement, or are we still waiting on the results of the survey? Note that several people have suggested a multi-release strategy for fixing this problem; does that make any difference? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From neal@metaslash.com Fri Jul 19 23:47:38 2002 From: neal@metaslash.com (Neal Norwitz) Date: Fri, 19 Jul 2002 18:47:38 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability References: <200207192029.g6JKTU015005@pcp02138704pcs.reston01.va.comcast.net> <20020719222938.GA23413@panix.com> Message-ID: <3D38970A.2693833E@metaslash.com> Aahz wrote: > > On Fri, Jul 19, 2002, Guido van Rossum wrote: > >Ping: > >> > >> I think the renaming of next() to __next__() is a good idea in any > >> case. It is distant enough from the other issues that it can be done > >> independently of any decisions about __iter__. 
> >
> > Yeah, it's just a pain that it's been deployed in Python 2.2 since
> > last December, and by the time 2.3 is out it will probably have been
> > at least a full year. Worse, 2.2 is voted to be Python-in-a-Tie,
> > giving that particular idiom a very long lifetime. I simply don't
> > think we can break compatibility that easily. Remember the endless
> > threads we've had about the pace of change and stability. We have to
> > live with warts, alas. And this is a pretty minor one if you ask me.
>
> Is this a Pronouncement, or are we still waiting on the results of the
> survey? Note that several people have suggested a multi-release
> strategy for fixing this problem; does that make any difference?

Would it be good to use __next__() if it exists, else try next()? This doesn't fix the current 'wart'; however, it could allow moving closer to the desired end. It could cause confusion. For compatibility, one would only need to do:

    next = __next__

or vice versa. Not sure this is worth it. But if there is a transition, it could ease the pain.

Neal

From nas@python.ca Sat Jul 20 00:22:26 2002
From: nas@python.ca (Neil Schemenauer)
Date: Fri, 19 Jul 2002 16:22:26 -0700
Subject: [Python-Dev] The iterator story
In-Reply-To: <3D389369.948547E0@metaslash.com>; from neal@metaslash.com on Fri, Jul 19, 2002 at 06:32:09PM -0400
References: <20020719120043.A21503@glacier.arctrix.com> <3D389369.948547E0@metaslash.com>
Message-ID: <20020719162226.A22929@glacier.arctrix.com>

Neal Norwitz wrote:
> In what context? Were you iterating over a file or something else?
> I'm wondering if this is a problem, perhaps pychecker could generate
> a warning?

I was switching between implementing something as a generator and returning a list. I was curious why I was getting different behavior until I realized I was iterating over the result twice. I don't think pychecker could warn about such a bug.

Neil

From martin@v.loewis.de Sat Jul 20 01:02:11 2002
From: martin@v.loewis.de (Martin v.
Loewis) Date: 20 Jul 2002 02:02:11 +0200 Subject: [Python-Dev] Where's time.daylight??? In-Reply-To: <200207191910.g6JJAUJ32606@pcp02138704pcs.reston01.va.comcast.net> References: <15672.18628.831787.897474@anthem.wooz.org> <200207191732.g6JHWJD28040@pcp02138704pcs.reston01.va.comcast.net> <200207191910.g6JJAUJ32606@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > I'm going to remove the _XOPEN_SOURCE define; Jeremy and Martin can > try to figure out what the right thing is for Tru64. This is the wrong solution; instead, you need to define _GNU_SOURCE in addition to _XOPEN_SOURCE. Regards, Martin From martin@v.loewis.de Sat Jul 20 01:06:51 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 20 Jul 2002 02:06:51 +0200 Subject: [Python-Dev] Added platform-specific directories to sys.path In-Reply-To: <57BEAF46-9B5A-11D6-9B6B-003065517236@oratrix.com> References: <57BEAF46-9B5A-11D6-9B6B-003065517236@oratrix.com> Message-ID: Jack Jansen writes: > I sympathize with the idea of making things more conform to the > platform standard, on the other hand I'm a bit reluctant to do things > differently again from what other Pythons do. But, one of the things > that is sorely missing from Python is a standard place to install > per-user extension modules, so this might well be the thing that > triggers inclusion of such functionality into the grand scheme of > things (including distutils support, etc). If that is the platform convention, I see no problem following it. Windows already does things differently from Unix, by using the registry to compute sys.path. Regards, Martin From guido@python.org Sat Jul 20 01:30:04 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 20:30:04 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Your message of "Fri, 19 Jul 2002 18:29:38 EDT." 
<20020719222938.GA23413@panix.com> References: <200207192029.g6JKTU015005@pcp02138704pcs.reston01.va.comcast.net> <20020719222938.GA23413@panix.com> Message-ID: <200207200030.g6K0U4P26218@pcp02138704pcs.reston01.va.comcast.net> > > Yeah, it's just a pain that it's been deployed in Python 2.2 since > > last December, and by the time 2.3 is out it will probably have been > > at least a full year. Worse, 2.2 is voted to be Python-in-a-Tie, > > giving that particular idiom a very long lifetime. I simply don't > > think we can break compatibility that easily. Remember the endless > > threads we've had about the pace of change and stability. We have to > > live with warts, alas. And this is a pretty minor one if you ask me. > > Is this a Pronouncement, or are we still waiting on the results of the > survey? That is my current opinion. I'm waiting for the results of the survey to see if I'll be swayed (but I don't think it's likely). > Note that several people have suggested a multi-release > strategy for fixing this problem; does that make any difference? Such a big gun for such a minor problem. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Jul 20 01:41:18 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 19 Jul 2002 20:41:18 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Your message of "Fri, 19 Jul 2002 18:47:38 EDT." <3D38970A.2693833E@metaslash.com> References: <200207192029.g6JKTU015005@pcp02138704pcs.reston01.va.comcast.net> <20020719222938.GA23413@panix.com> <3D38970A.2693833E@metaslash.com> Message-ID: <200207200041.g6K0fIX26940@pcp02138704pcs.reston01.va.comcast.net> > Would it be good to use __next__() if it exists, else try next()? Then the code in typeobject.c (e.g. resolve_slotdups) would have to map tp_iternext to *both* __next__ and next. > This doesn't fix the current 'wart,' however, it could allow > moving closer to the desired end. It could cause confusion. 
> For compatibility, one would only need to do:
>
> next = __next__
>
> or vice versa.
>
> Not sure this is worth it. But if there is a transition, it could
> ease the pain.

I don't think it's worth it.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Sat Jul 20 01:43:21 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 19 Jul 2002 20:43:21 -0400
Subject: [Python-Dev] Where's time.daylight???
In-Reply-To: Your message of "Sat, 20 Jul 2002 02:02:11 +0200."
References: <15672.18628.831787.897474@anthem.wooz.org> <200207191732.g6JHWJD28040@pcp02138704pcs.reston01.va.comcast.net> <200207191910.g6JJAUJ32606@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200207200043.g6K0hMJ27043@pcp02138704pcs.reston01.va.comcast.net>

> > I'm going to remove the _XOPEN_SOURCE define; Jeremy and Martin can
> > try to figure out what the right thing is for Tru64.
>
> This is the wrong solution; instead, you need to define _GNU_SOURCE in
> addition to _XOPEN_SOURCE.

Can you check that in? I'm about to disappear to OSCON for a week.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Sat Jul 20 07:06:29 2002
From: guido@python.org (Guido van Rossum)
Date: Sat, 20 Jul 2002 02:06:29 -0400
Subject: [Python-Dev] Priority queue (binary heap) python code
In-Reply-To: Your message of "Mon, 24 Jun 2002 21:33:18 EDT." <20020624213318.A5740@arizona.localdomain>
References: <20020624213318.A5740@arizona.localdomain>
Message-ID: <200207200606.g6K66Um28510@pcp02138704pcs.reston01.va.comcast.net>

> Any chance something like this could make it into the standard python
> library? It would save a lot of time for lazy people like myself.
:-)
>
> def heappush(heap, item):
>     pos = len(heap)
>     heap.append(None)
>     while pos:
>         parentpos = (pos - 1) / 2
>         parent = heap[parentpos]
>         if item <= parent:
>             break
>         heap[pos] = parent
>         pos = parentpos
>     heap[pos] = item
>
> def heappop(heap):
>     endpos = len(heap) - 1
>     if endpos <= 0:
>         return heap.pop()
>     returnitem = heap[0]
>     item = heap.pop()
>     pos = 0
>     while 1:
>         child2pos = (pos + 1) * 2
>         child1pos = child2pos - 1
>         if child2pos < endpos:
>             child1 = heap[child1pos]
>             child2 = heap[child2pos]
>             if item >= child1 and item >= child2:
>                 break
>             if child1 > child2:
>                 heap[pos] = child1
>                 pos = child1pos
>                 continue
>             heap[pos] = child2
>             pos = child2pos
>             continue
>         if child1pos < endpos:
>             child1 = heap[child1pos]
>             if child1 > item:
>                 heap[pos] = child1
>                 pos = child1pos
>         break
>     heap[pos] = item
>     return returnitem

I have read (or at least skimmed) this entire thread now. After I reconstructed the algorithm in my head, I went back to Kevin's code; I admire the compactness of his code. I believe that this would make a good addition to the standard library, as a friend of the bisect module. The only change I would make would be to make heap[0] the lowest value rather than the highest. (That's one thing that I liked better about François Pinard's version, but a class seems too heavy for this, just like it is overkill for bisect [*]. Oh, and maybe we can borrow a few lines of François's description of the algorithm. :-)

I propose to call it heapq.py. (Got a better name? Now or never.)

[*] Afterthought: this could be made into a new-style class by adding something like this to the end of the module:

    class heapq(list):
        __slots__ = []
        heappush = heappush
        heappop = heappop

A similar addition could easily be made to the bisect module. But this is very different from François' class, which hides the other list methods.
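[For illustration, a minimal sketch of the min-heap semantics Guido is proposing here (heap[0] always the smallest), written against the heapq API as it was eventually adopted into the standard library — at the time of this email that API was still only a proposal:]

```python
# Sketch of the proposed heapq behavior: heap[0] holds the lowest
# value, and repeated pops drain the heap in sorted order.
import heapq

heap = []
for value in [5, 1, 4, 2, 3]:
    heapq.heappush(heap, value)

assert heap[0] == 1  # smallest element always sits at index 0

drained = [heapq.heappop(heap) for _ in range(5)]
assert drained == [1, 2, 3, 4, 5]  # pops come out in ascending order
```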
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Sat Jul 20 07:18:16 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 20 Jul 2002 02:18:16 -0400
Subject: [Python-Dev] Sorting
Message-ID:

An enormous amount of research has been done on sorting since the last time I wrote a sort for Python. Major developments have been in two areas:

1. Adaptive sorting. Sorting algorithms are usually tested on random data, but in real life you almost never see random data. Python's sort tries to catch some common cases of near-order via special-casing. The literature has since defined more than 15 formal measures of disorder, and developed algorithms provably optimal in the face of one or more of them. But this is O() optimality, and theoreticians aren't much concerned about how big the constant factor is. Some researchers are up front about this, and toward the end of one paper with "practical" in its title, the author was overjoyed to report that an implementation was only twice as slow as a naive quicksort.

2. Pushing the worst-case number of comparisons closer to the information-theoretic limit (ceiling(log2(N!))).

I don't care much about #2 -- in experiments conducted when it was new, I measured the # of comparisons our samplesort hybrid did on random inputs, and it was never more than 2% over the theoretical lower bound, and typically closer. As N grows large, the expected case provably converges to the theoretical lower bound. There remains a vanishingly small chance for a bad case, but nobody has reported one, and at the time I gave up trying to construct one.

Back on Earth, among Python users the most frequent complaint I've heard is that list.sort() isn't stable.
Alex is always quick to trot out the appropriate DSU (Decorate Sort Undecorate) pattern then, but the extra memory burden for that can be major (a new 2-tuple per list element costs about 32 bytes, then 4 more bytes for a pointer to it in a list, and 12 more bytes that don't go away to hold each non-small index).

After reading all those papers, I couldn't resist taking a crack at a new algorithm that might be practical, and have something you might call a non-recursive adaptive stable natural mergesort / binary insertion sort hybrid. In playing with it so far, it has two bad aspects compared to our samplesort hybrid:

+ It may require temp memory, up to 2*N bytes worst case (one pointer each for no more than half the array elements).

+ It gets *some* benefit for arrays with many equal elements, but not nearly as much as I was able to hack samplesort to get. In effect, partitioning is very good at moving equal elements close to each other quickly, but merging leaves them spread across any number of runs. This is especially irksome because we're sticking to Py_LT for comparisons, so can't even detect a==b without comparing a and b twice (and then it's a deduction from that not a < b and not b < a). Given the relatively huge cost of comparisons, it's a timing disaster to do that (compare twice) unless it falls out naturally. It was fairly natural to do so in samplesort, but not at all in this sort.

It also has good aspects:

+ It's stable (items that compare equal retain their relative order, so, e.g., if you sort first on zip code, and a second time on name, people with the same name still appear in order of increasing zip code; this is important in apps that, e.g., refine the results of queries based on user input).

+ The code is much simpler than samplesort's (but I think I can fix that).
+ It gets benefit out of more kinds of patterns, and without lumpy special-casing (a natural mergesort has to identify ascending and descending runs regardless, and then the algorithm builds on just that).

+ Despite that I haven't micro-optimized it, in the random case it's almost as fast as the samplesort hybrid. In fact, it might have been a bit faster had I run tests yesterday (the samplesort hybrid got sped up by 1-2% last night). This one surprised me the most, because at the time I wrote the samplesort hybrid, I tried several ways of coding mergesorts and couldn't make it as fast.

+ It has no bad cases (O(N log N) is worst case; N-1 compares is best).

Here are some typical timings, taken from Python's sortperf.py, over identical lists of floats:

Key:
    *sort: random data
    \sort: descending data
    /sort: ascending data
    3sort: ascending data but with 3 random exchanges
    ~sort: many duplicates
    =sort: all equal
    !sort: worst case scenario

That last one was a worst case for the last quicksort Python had before it grew the samplesort, and it was a very bad case for that. By sheer coincidence, turns out it's an exceptionally good case for the experimental sort:

samplesort
 i    2**i   *sort  \sort  /sort  3sort  ~sort  =sort  !sort
15   32768    0.13   0.01   0.01   0.10   0.04   0.01   0.11
16   65536    0.24   0.02   0.02   0.23   0.08   0.02   0.24
17  131072    0.54   0.05   0.04   0.49   0.18   0.04   0.53
18  262144    1.18   0.09   0.09   1.08   0.37   0.09   1.16
19  524288    2.58   0.19   0.18   2.34   0.76   0.17   2.52
20 1048576    5.58   0.37   0.36   5.12   1.54   0.35   5.46

timsort
15   32768    0.16   0.01   0.02   0.05   0.14   0.01   0.02
16   65536    0.24   0.02   0.02   0.06   0.19   0.02   0.04
17  131072    0.55   0.04   0.04   0.13   0.42   0.04   0.09
18  262144    1.19   0.09   0.09   0.25   0.91   0.09   0.18
19  524288    2.60   0.18   0.18   0.46   1.97   0.18   0.37
20 1048576    5.61   0.37   0.35   1.00   4.26   0.35   0.74

If it weren't for the ~sort column, I'd seriously suggest replacing the samplesort with this.
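[The DSU (Decorate-Sort-Undecorate) pattern Tim mentions earlier can be sketched as follows; the records and field names here are made up for illustration:]

```python
# Decorate-Sort-Undecorate: wrap each record in a (key, index, record)
# tuple so the original index breaks ties.  This yields a stable sort
# even when the underlying sort makes no stability promise -- at the
# per-element memory cost Tim describes.
people = [("smith", 10001), ("adams", 20002), ("smith", 30003)]  # already in zip-code order

decorated = [(name, i, (name, zipcode))
             for i, (name, zipcode) in enumerate(people)]
decorated.sort()
by_name = [record for _, _, record in decorated]

# Equal names keep their relative (zip-code) order:
assert by_name == [("adams", 20002), ("smith", 10001), ("smith", 30003)]
```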
2*N extra bytes isn't as bad as it might sound, given that, in the absence of massive object duplication, each list element consumes at least 12 bytes (type pointer, refcount and value) + 4 bytes for the list pointer. Add 'em all up and that's a 13% worst-case temp memory overhead. From martin@v.loewis.de Sat Jul 20 09:59:55 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 20 Jul 2002 10:59:55 +0200 Subject: [Python-Dev] Where's time.daylight??? In-Reply-To: <200207200043.g6K0hMJ27043@pcp02138704pcs.reston01.va.comcast.net> References: <15672.18628.831787.897474@anthem.wooz.org> <200207191732.g6JHWJD28040@pcp02138704pcs.reston01.va.comcast.net> <200207191910.g6JJAUJ32606@pcp02138704pcs.reston01.va.comcast.net> <200207200043.g6K0hMJ27043@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > Can you check that in? I'm about to disappear to OSCON for a week. Done. I have no OSF/1 (aka whatever) system, so I can't really test whether it still helps on these systems. Regards, Martin From jacobs@penguin.theopalgroup.com Sat Jul 20 12:11:36 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Sat, 20 Jul 2002 07:11:36 -0400 (EDT) Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: On Sat, 20 Jul 2002, Tim Peters wrote: > After reading all those papers, I couldn't resist taking a crack at a new > algorithm that might be practical, and have something you might call a > non-recursive adaptive stable natural mergesort / binary insertion sort > hybrid. Great work, Tim! I've got several Python implementations of stable-sorts that I can now retire. > If it weren't for the ~sort column, I'd seriously suggest replacing the > samplesort with this. If duplicate keys cannot be more efficiently handled, why not add a list.stable_sort() method? That way the user gets to decide if they want the ~sort tax. If that case is fixed later, then there is little harm in having list.sort == list.stable_sort. 
> 2*N extra bytes isn't as bad as it might sound, given > that, in the absence of massive object duplication, each list element > consumes at least 12 bytes (type pointer, refcount and value) + 4 bytes for > the list pointer. Add 'em all up and that's a 13% worst-case temp memory > overhead. It doesn't bother me in the slightest (and I tend to sort big things). 13% is a reasonable trade-off for stability. Thanks, -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From pinard@iro.umontreal.ca Sat Jul 20 13:24:45 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 20 Jul 2002 08:24:45 -0400 Subject: [Python-Dev] Re: Priority queue (binary heap) python code In-Reply-To: <200207200606.g6K66Um28510@pcp02138704pcs.reston01.va.comcast.net> References: <20020624213318.A5740@arizona.localdomain> <200207200606.g6K66Um28510@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > Oh, and maybe we can borrow a few lines of François's description of > the algorithm. :-) Borrow liberally! I would prefer that nothing worth remains un-borrowed from mine, so I can happily get rid of my copy when the time comes! :-) > I propose to call it heapq.py. (Got a better name? Now or never.) I like `heapq' as it is not an English common name, like `heap' would be, so less likely to clash with user chosen variable names! This principle should be good in general. Sub-classing `heapq' from `list' is a good idea! P.S. - In other languages, I have been using `string' a lot, and this has been one of the minor irritations when I came to Python, that it forced me away of that identifier; so I'm now using `text' everywhere, instead. Another example is the name `socket', which is kind of reserved from the module name, I never really know how to name variables holding sockets :-). 
-- François Pinard http://www.iro.umontreal.ca/~pinard From ping@zesty.ca Sat Jul 20 13:32:41 2002 From: ping@zesty.ca (Ka-Ping Yee) Date: Sat, 20 Jul 2002 05:32:41 -0700 (PDT) Subject: [Python-Dev] The iterator story In-Reply-To: <200207192110.g6JLAjU15146@pcp02138704pcs.reston01.va.comcast.net> Message-ID: If you only have ten seconds read this: --------------------------------------- Guido, i believe i understand your position. My interpretation is: I'd like "iterate destructively" and "iterate non-destructively" to be spelled differently. You don't. I'd like to be able to establish conventions so that "x in y" doesn't destroy y. This isn't so important to you. We have a difference of opinion. I don't think we have a failure in understanding. If the opinions won't change, we might as well move on. I did not mean to waste your time, only to achieve understanding. Actual reply follows: --------------------- On Fri, 19 Jul 2002, Guido van Rossum wrote: > But I note that there are hybrids, and I think files (at least > seekable files) fall in the hybrid category. Indeed, files are unusual. In the particular way that i've chosen my definitions, though, classification of files is clear: files are not containers (there's no non-mutating read) and files are iterators (due to the behaviour of the read() method). Files aside, i do agree that hybrids exist. The dbm and tree examples you gave indeed mix container and iterator behaviour. I agree with you that mixing these things isn't usually a good design. In some cases you do end up providing both container-like and iterator-like interfaces. This is fine. But then when you use the object, you ought to be able to know which interface you are using. The argument in the "iterator story" message is that we should have a way to say "i want to use the non-destructive interface" and a way to say "i want to use the destructive interface". Depending what makes sense, one can choose to implement either interface, or both. 
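[A small sketch of the two interfaces Ping distinguishes — a container whose __iter__ returns a fresh, independent iterator on every call, versus an iterator that is consumed by iteration; the class name is made up for illustration:]

```python
# Container: __iter__ hands out a new iterator each time, so looping
# never mutates the object ("non-destructive interface").
class Bag:
    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)  # fresh iterator per loop

bag = Bag([1, 2, 3])
first = list(bag)
second = list(bag)
assert first == second == [1, 2, 3]  # both passes see everything

# Iterator: iteration consumes it, much like file.read() or tape access
# ("destructive interface").
it = iter(bag)
assert list(it) == [1, 2, 3]
assert list(it) == []  # a second pass finds it exhausted
```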
> For example, while a tape file is a > container in the sense that reading the data doesn't destroy it, it's > very heavily geared towards sequential access, and you can't > realistically have two iterators going over the same tape at once. Indeed, you can't. But a tape file object is not a container (if we're using my definition), because the act of reading changes the tape file object -- it advances the tape. It's the same as file.read() -- even though file.read() doesn't mutate the data on the disk, it does mutate the file object, and that is what makes the file object not a container. It's precisely because tapes are too slow for practical random access that we would want a tape file object to provide an iterator-style interface and not provide a container-style interface. > If you're too young to remember Hee hee. I've used tapes. I've used *cassette* tapes, even. :) > > The issue is, should "for" be non-destructive? > > I don't see the benefit. We've done this for years and the only > conceptual problem was the abuse of __getitem__, not the > destructiveness of the for-loop. [...] > > The issue is, should "in" be non-destructive? > > If it can't be helped otherwise, sure, why not? Obviously we see these "problems" differently. Having "x in y" possibly destroy y is scary to me, but no big deal to you. All right. > > still produces "KeyError: 0"! This oughta be fixed...) > > Check the CVS logs. At one point before 2.2 was released, UserDict > has a __iter__ method. But then SF bug 448153 was filed, presenting > evidence that this broke previously working code. So a separate > class, IterableUserDict, was added that has the __iter__ method. Oh. :( Okay. Thanks for explaining. > There are a lot of objects that > have a way to return an iterators (old style using fake __getitem__, > and new ones using __iter__ and next) that are intended to be looped > over, once. 
I have no desire to deprecate this behavior, since (a) it > would be a major upheaval for the user community (a lot worse than > integer division), and (b) I don't see that "fixing" this prevents a > particular category of programming errors. As you can tell by now, i think it does prevent a certain category of errors. The general description is "mixing up mutating and non-mutating interfaces". The closest analogy i can think of is an alternate world in which "+" and "+=" had the same name, and the only way you could tell if the left operand would get mutated is by knowing the implementation of the left-hand object at runtime. Of course, in real Python you have to trust that the implementation "+" does not mutate. But at least we are able to set a convention, because "+" and "+=" are distinct operators. In the weird alternate world where "+" and "+=" are both written "+", you would have no hope of telling the difference. We'd look at "x + y" and say "Will x change? I don't know." And so it is with "for x in y": we'd look at that and say "Will y change? I don't know." We have no way of telling whether y is a container or an iterator, thus no way to establish a convention about what this should do. "for x in y" is polymorphic on y, but this is not how i think polymorphism is supposed to work. You could say you don't care whether y changes. (Well, you *are* saying you don't care.) Well, okay. I just want to make sure we both understand each other and see the issue at hand. If we do, then it just comes down to a difference of opinion about how significant a mixup this is, and so be it. > > I believe __iter__ is not a type flag. [...] > And I never said it was a type flag. I'm tired of repeating myself, > but you keep repeating this broken argument, so I have to keep > correcting you. I know you didn't say this. Please don't be offended. 
I apologize if i seemed to be wilfully ignoring you -- you don't have to repeat things many times in order to "drive home" your position to me. I was trying to summarize all the positions (not just yours), organize them, and explain them all at once. -- ?!ng From ping@zesty.ca Sat Jul 20 13:45:48 2002 From: ping@zesty.ca (Ka-Ping Yee) Date: Sat, 20 Jul 2002 05:45:48 -0700 (PDT) Subject: [Python-Dev] Re: The iterator story In-Reply-To: <20020719120043.A21503@glacier.arctrix.com> Message-ID: On Fri, 19 Jul 2002, Neil Schemenauer wrote: > First, people could implement __iter__ such that it returns an iterator > the mutates the original object (e.g. a file object __iter__ that > returns xreadlines). Yes, but then they would be violating the convention. The way things currently stand, we aren't even able to say what the convention *is*. > Second, it will be confusing to have two different ways of looping over > things. It's a difference in perspective. To me it seems confusing to have only one way of looping that might do two different things. But Guido basically agrees with you. (As in, destructive and non-destructive looping are not really that different; or, they are different but it's not worth the bother.) > Now I want to use this library but I have an iterator, not something > that implements __iter__. I would need to create a little wrapper with > a __iter__ method that returns my object. Yeah, that's seq(). > To summarize, I agree that "for" mutating the object can be surprising. The rub is, the only way for it to *not* be surprising is to have a way to *say* "loop destructively". If you can't express your expectations, there's no way to meet them. -- ?!ng From ping@zesty.ca Sat Jul 20 13:58:39 2002 From: ping@zesty.ca (Ka-Ping Yee) Date: Sat, 20 Jul 2002 05:58:39 -0700 (PDT) Subject: [Python-Dev] Single- vs. 
Multi-pass iterability In-Reply-To: Message-ID: On Fri, 19 Jul 2002, Tim Peters wrote: > "for" did and does work in accord with a simple protocol, and whether that's > "destructive" depends on how the specific objects involved implement their > pieces of the protocol, not on the protocol itself. The same is true of all > of Python's hookable protocols. Name any protocol for which the question "does this mutate?" has no answer. (I ask you to accept that __call__ is a special case.) > What's so special about "for" that it > should pretend to deliver purely functional behavior in a highly > non-functional language? Who said anything about functional behaviour? I'm not requiring that looping *never* mutate. I just want to be able to tell *whether* it will. -- ?!ng From oren-py-d@hishome.net Sat Jul 20 13:58:51 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sat, 20 Jul 2002 08:58:51 -0400 Subject: [Python-Dev] The iterator story In-Reply-To: <200207192110.g6JLAjU15146@pcp02138704pcs.reston01.va.comcast.net> References: <200207192110.g6JLAjU15146@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020720125850.GA5862@hishome.net> > > Based on Guido's positive response, in which he asked me to make > > an addition to the PEP, i believe Guido agrees with me that > > __iter__ is distinct from the protocol of an iterator. This > > surprised me because it runs counter to the philosophy previously > > expressed in the PEP. > > I recognize that they are separate protocols. But because I like the > for-loop as a convenient way to get all of the elements of an > iterator, I want iterators to support __iter__. Is this the only reason iterators are required to support __iter__? It seems like a strange design decision to put the burden on all iterator implementers to write a dummy method returning self instead of just checking if tp_iter==NULL in PyObject_GetIter. 
It's like requiring all class writers to write a dummy __str__ method that calls __repr__ instead of implementing the automatic fallback to __repr__ in PyObject_Str when no __str__ is available. Oren From aahz@pythoncraft.com Sat Jul 20 14:00:01 2002 From: aahz@pythoncraft.com (Aahz) Date: Sat, 20 Jul 2002 09:00:01 -0400 Subject: [Python-Dev] Sorting In-Reply-To: References: Message-ID: <20020720130000.GA11845@panix.com> On Sat, Jul 20, 2002, Tim Peters wrote: > > If it weren't for the ~sort column, I'd seriously suggest replacing the > samplesort with this. 2*N extra bytes isn't as bad as it might sound, given > that, in the absence of massive object duplication, each list element > consumes at least 12 bytes (type pointer, refcount and value) + 4 bytes for > the list pointer. Add 'em all up and that's a 13% worst-case temp memory > overhead. Any reason the list object can't grow a .stablesort() method? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From oren-py-d@hishome.net Sat Jul 20 14:28:57 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sat, 20 Jul 2002 09:28:57 -0400 Subject: [Python-Dev] The iterator story In-Reply-To: <20020719162226.A22929@glacier.arctrix.com> References: <20020719120043.A21503@glacier.arctrix.com> <3D389369.948547E0@metaslash.com> <20020719162226.A22929@glacier.arctrix.com> Message-ID: <20020720132857.GB5862@hishome.net> On Fri, Jul 19, 2002 at 04:22:26PM -0700, Neil Schemenauer wrote: > Neal Norwitz wrote: > > In what context? Were you iterating over a file or something else? > > I'm wondering if this is a problem, perhaps pychecker could generate > > a warning? > > I was switching between implementing something as a generator and > returning a list. I was curious why I was getting different behavior > until I realized I was iterating over the result twice. I don't > think pychecker could warn about such a bug. That's the scenario that bit me too. 
For me it was a little more difficult to find because it was wrapped in a few layers of chained transformations. I can't tell by the last element in the chain whether the first one is re-iterable or not. One approach to solve this is Ka-Ping Yee's proposal to specify in advance whether you are expecting an iterator or a re-iterable container using either 'for x in y' or 'for x from y'. I don't think this will work. There's already too much code that uses for x in y where y is an iterator. Another problem is that a transformation shouldn't care whether its upstream source is an iterator or an iterable - it's a generic reusable building block. My suggestion (which was rejected by Guido) was to raise an error when an iterator's .next() method is called after it raises StopIteration. This way, if I try to iterate over the result again at least I'll get an error like "IteratorExhaustedError" instead of something that is indistinguishable from an iterator of an empty container. I hate silent errors. This shouldn't be required from all iterator implementers but if all built-in iterators supported this (especially generators) it would help a lot to find such errors. Oren

P.S. My definition of a transformation is a function taking one iterable argument and returning an iterator. It is usually implemented as a generator function.

From oren-py-d@hishome.net Sat Jul 20 14:39:26 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sat, 20 Jul 2002 09:39:26 -0400 Subject: [Python-Dev] Re: The iterator story In-Reply-To: References: <20020719120043.A21503@glacier.arctrix.com> Message-ID: <20020720133926.GC5862@hishome.net>

On Sat, Jul 20, 2002 at 05:45:48AM -0700, Ka-Ping Yee wrote: > > To summarize, I agree that "for" mutating the object can be surprising. > > The rub is, the only way for it to *not* be surprising is to have a > way to *say* "loop destructively". If you can't express your > expectations, there's no way to meet them.
It doesn't seem very useful to say "loop destructively" - in these cases I don't usually care whether it's destructive or not. It is useful, though, to be able to say "loop INdestructively". That's how I do it:

def reiter(obj):
    """Return an object's iterator; raise an exception if the object
    does not appear to support multiple iterations."""
    assert not isinstance(obj, file)
    itr = iter(obj)
    assert itr is not obj
    return itr

Oren

From aahz@pythoncraft.com Sat Jul 20 15:09:23 2002 From: aahz@pythoncraft.com (Aahz) Date: Sat, 20 Jul 2002 10:09:23 -0400 Subject: [Python-Dev] The iterator story In-Reply-To: <20020720132857.GB5862@hishome.net> References: <20020719120043.A21503@glacier.arctrix.com> <3D389369.948547E0@metaslash.com> <20020719162226.A22929@glacier.arctrix.com> <20020720132857.GB5862@hishome.net> Message-ID: <20020720140923.GA18716@panix.com>

On Sat, Jul 20, 2002, Oren Tirosh wrote: > On Fri, Jul 19, 2002 at 04:22:26PM -0700, Neil Schemenauer wrote: >> Neal Norwitz wrote: >>> >>> In what context? Were you iterating over a file or something else? >>> I'm wondering if this is a problem, perhaps pychecker could generate >>> a warning? >> >> I was switching between implementing something as a generator and >> returning a list. I was curious why I was getting different behavior >> until I realized I was iterating over the result twice. I don't >> think pychecker could warn about such a bug. > > That's the scenario that bit me too. For me it was a little more difficult > to find because it was wrapped in a few layers of chained transformations. > I can't tell by the last element in the chain whether the first one is > re-iterable or not. > > My suggestion (which was rejected by Guido) was to raise an error when an > iterator's .next() method is called after it raises StopIteration.
This > way, if I try to iterate over the result again at least I'll get an error > like "IteratorExhaustedError" instead of something that is indistinguishable > from an iterator of an empty container. I hate silent errors.

I'm still not understanding how this would help. When a chainable transformer gets StopIteration, it should immediately return. What else do you want to do? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/

From guido@python.org Sat Jul 20 15:10:57 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 20 Jul 2002 10:10:57 -0400 Subject: [Python-Dev] The iterator story In-Reply-To: Your message of "Sat, 20 Jul 2002 05:32:41 PDT." References: Message-ID: <200207201410.g6KEAvY29349@pcp02138704pcs.reston01.va.comcast.net>

> If you only have ten seconds read this: > --------------------------------------- > > Guido, i believe i understand your position. My interpretation is: > > I'd like "iterate destructively" and "iterate non-destructively" > to be spelled differently. You don't. > > I'd like to be able to establish conventions so that "x in y" > doesn't destroy y. This isn't so important to you. > > We have a difference of opinion. I don't think we have a failure in > understanding. If the opinions won't change, we might as well move on. > I did not mean to waste your time, only to achieve understanding.

Aye, aye, Sir. --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Sat Jul 20 15:13:34 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 20 Jul 2002 10:13:34 -0400 Subject: [Python-Dev] The iterator story In-Reply-To: Your message of "Sat, 20 Jul 2002 08:58:51 EDT."
<20020720125850.GA5862@hishome.net> References: <200207192110.g6JLAjU15146@pcp02138704pcs.reston01.va.comcast.net> <20020720125850.GA5862@hishome.net> Message-ID: <200207201413.g6KEDYh29370@pcp02138704pcs.reston01.va.comcast.net>

> > > Based on Guido's positive response, in which he asked me to make > > > an addition to the PEP, i believe Guido agrees with me that > > > __iter__ is distinct from the protocol of an iterator. This > > > surprised me because it runs counter to the philosophy previously > > > expressed in the PEP. > > > > I recognize that they are separate protocols. But because I like the > > for-loop as a convenient way to get all of the elements of an > > iterator, I want iterators to support __iter__. > > Is this the only reason iterators are required to support __iter__?

Yes.

> It seems like a strange design decision to put the burden on all iterator > implementers to write a dummy method returning self instead of just checking > if tp_iter==NULL in PyObject_GetIter. It's like requiring all class writers > to write a dummy __str__ method that calls __repr__ instead of implementing > the automatic fallback to __repr__ in PyObject_Str when no __str__ is > available.

I suppose you meant "check for tp_iter==NULL and tp_iternext!=NULL".

--Guido van Rossum (home page: http://www.python.org/~guido/)

From cce@clarkevans.com Sat Jul 20 17:21:01 2002 From: cce@clarkevans.com (Clark C .
Evans) Date: Sat, 20 Jul 2002 12:21:01 -0400 Subject: [Python-Dev] The iterator story In-Reply-To: <200207192110.g6JLAjU15146@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Fri, Jul 19, 2002 at 05:10:45PM -0400 References: <200207192110.g6JLAjU15146@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020720122101.A38901@doublegemini.com>

On Fri, Jul 19, 2002 at 05:10:45PM -0400, Guido van Rossum wrote: | > The __iter__-On-Iterators Issue: | > | > Some people have mentioned that the presence of an __iter__() | > method is a way of signifying that an object supports the | > iterator protocol. It has been said that this is necessary | > because the presence of a "next()" method is not sufficiently | > distinguishing. | | Not me.

As I remember the debate last year, Ping is expressing the consensus which was reached. This issue was tied directly, although not so articulately, to the namespace collision issue. I remember being concerned about next() not having leading and trailing __ but my concerns were put to rest knowing that every iterator had to have an __iter__ such that __iter__ returned self. I wasn't on the list for that long due to time constraints, but this linkage was there at least for me.

| > The iteration method is currently called "next()". | > | > Previous candidates for the name of this method were "next", | > "__next__", and "__call__". After some previous debate, | > it was pronounced to be "next()". | > | > There are concerns that "next()" might collide with existing | > methods named "next()". There is also a concern that "next()" | > is inconsistent because it is the only type-slot-method that | > does not have a __special__ name. | > | > The issue is, should it be called "next" or "__next__"? | | That's a separate issue, and cleans up only a small wart that in | practice hasn't hurt anybody AFAIK.
Today/tomorrow I'll finish piecing together the survey so that it clearly articulates the issue (and I'll be sure to note that you are against the idea). Best, Clark -- Clark C. Evans Axista, Inc. http://www.axista.com 800.926.5525 XCOLLA Collaborative Project Management Software

From neal@metaslash.com Sat Jul 20 17:52:49 2002 From: neal@metaslash.com (Neal Norwitz) Date: Sat, 20 Jul 2002 12:52:49 -0400 Subject: [Python-Dev] Where's time.daylight??? References: <15672.18628.831787.897474@anthem.wooz.org> <200207191732.g6JHWJD28040@pcp02138704pcs.reston01.va.comcast.net> <200207191910.g6JJAUJ32606@pcp02138704pcs.reston01.va.comcast.net> <200207200043.g6K0hMJ27043@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D399561.A474C77A@metaslash.com>

"Martin v. Loewis" wrote: > > Can you check that in? I'm about to disappear to OSCON for a week. > > Done. I have no OSF/1 (aka whatever) system, so I can't really test > whether it still helps on these systems.

It doesn't work on dec^w alpha^w compaq ... I've got an autoconf patch which works on Linux & OSF: http://python.org/sf/584245

There are some test failures I will look at later:

test test_dl crashed -- exceptions.SystemError: module dl requires sizeof(int) == sizeof(long) == sizeof(char*)
test test_nis crashed -- exceptions.SystemError: error return without exception set
test_pwd may have hung which is the last test run

Neal

From tim.one@comcast.net Sun Jul 21 04:26:44 2002 From: tim.one@comcast.net (Tim Peters) Date: Sat, 20 Jul 2002 23:26:44 -0400 Subject: [Python-Dev] Sorting In-Reply-To: Message-ID:

Quick update.
I left off here:

samplesort
 i    2**i   *sort  \sort  /sort  3sort  ~sort  =sort  !sort
15   32768    0.13   0.01   0.01   0.10   0.04   0.01   0.11
16   65536    0.24   0.02   0.02   0.23   0.08   0.02   0.24
17  131072    0.54   0.05   0.04   0.49   0.18   0.04   0.53
18  262144    1.18   0.09   0.09   1.08   0.37   0.09   1.16
19  524288    2.58   0.19   0.18   2.34   0.76   0.17   2.52
20 1048576    5.58   0.37   0.36   5.12   1.54   0.35   5.46

timsort
15   32768    0.16   0.01   0.02   0.05   0.14   0.01   0.02
16   65536    0.24   0.02   0.02   0.06   0.19   0.02   0.04
17  131072    0.55   0.04   0.04   0.13   0.42   0.04   0.09
18  262144    1.19   0.09   0.09   0.25   0.91   0.09   0.18
19  524288    2.60   0.18   0.18   0.46   1.97   0.18   0.37
20 1048576    5.61   0.37   0.35   1.00   4.26   0.35   0.74

With a lot of complication (albeit principled complication), timsort now looks like

15   32768    0.14   0.01   0.01   0.04   0.10   0.01   0.02
16   65536    0.24   0.02   0.02   0.05   0.17   0.02   0.04
17  131072    0.54   0.05   0.04   0.13   0.38   0.04   0.09
18  262144    1.18   0.09   0.09   0.24   0.81   0.09   0.18
19  524288    2.57   0.18   0.18   0.46   1.77   0.18   0.37
20 1048576    5.55   0.37   0.35   0.99   3.81   0.35   0.74

on the same data (tiny improvements in *sort and 3sort, significant improvement in ~sort, huge improvements for some patterns that aren't touched by this test).

For contrast and a sanity check, I also implemented Edelkamp and Stiegeler's "Next-to-m" refinement of weak heapsort. If you know what heapsort is, this is weaker. In the last decade, Dutton had the bright idea that a heap is stronger than you need for sorting: it's enough if you know only that a parent node's value dominates the right child's values, and then ensure that the root node has no left child. That implies the root node has the maximum value in the (weak) heap. It doesn't matter what's in the left child for the other nodes, provided only that they're weak heaps too. The weaker requirements allow faster (but trickier) code for maintaining the weak-heap invariant as sorting proceeds, and in particular it requires far fewer element comparisons than a (strong) heapsort.
Edelkamp and Stiegeler complicated this algorithm in several ways to cut the comparisons even more. I stopped at their first refinement, which does a worst-case number of comparisons

    N*k - 2**k + N - 2*k    where k = ceiling(logbase2(N))

so that even the worst case is very good. They have other gimmicks to cut it more (we're close to the theoretical limit here, so don't read too much into "more"!), but the first refinement proved so far from being promising that I dropped it:

weakheapsort
 i    2**i   *sort  \sort  /sort  3sort  ~sort  =sort  !sort
15   32768    0.19   0.12   0.11   0.11   0.11   0.11   0.12
16   65536    0.31   0.26   0.23   0.23   0.24   0.23   0.26
17  131072    0.71   0.55   0.49   0.49   0.51   0.48   0.56
18  262144    1.59   1.15   1.03   1.04   1.08   1.02   1.19
19  524288    3.57   2.43   2.18   2.18   2.27   2.14   2.51
20 1048576    8.01   5.08   4.57   4.58   4.77   4.50   5.29

The number of compares isn't the problem with this. The problem appears to be heapsort's poor cache behavior, leaping around via multiplying and dividing indices by 2. This is exacerbated in weak heapsort because it also requires allocating a bit vector, to attach a "which of my children should I think of as being 'the right child'?" flag to each element, and that also gets accessed in the same kinds of cache-hostile ways at the same time. The samplesort and mergesort variants access memory sequentially. What I haven't accounted for is why weakheapsort appears to get a major benefit from *any* kind of regularity in the input -- *sort is always the worst case on each line, and by far (note that this implementation does no special-casing of any kind, so it must be an emergent property of the core algorithm). If I were a researcher, I bet I could get a good paper out of that.

From tim.one@comcast.net Sun Jul 21 06:19:03 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 21 Jul 2002 01:19:03 -0400 Subject: [Python-Dev] Sorting In-Reply-To: <20020720130000.GA11845@panix.com> Message-ID:

[Aahz] > Any reason the list object can't grow a .stablesort() method?

I'm not sure.
Python's samplesort implementation is right up there among the most complicated (by any measure) algorithms in the code base, and the mergesort isn't any simpler anymore. Yet another large mass of difficult code can make for a real maintenance burden after I'm dead. Here, guess what this does:

static int
gallop_left(PyObject *pivot, PyObject** p, int n, PyObject *compare)
{
    int k;
    int lo, hi;
    PyObject **pend;

    assert(pivot && p && n);
    pend = p + (n-1);
    lo = 0;
    hi = -1;
    for (;;) {
        IFLT(*(pend - lo), pivot)
            break;
        hi = lo;
        lo = (lo << 1) + 1;
        if (lo >= n) {
            lo = n;
            break;
        }
    }
    lo = n - lo;
    hi = n-1 - hi;
    while (lo < hi) {
        int m = (lo + hi) >> 1;
        IFLT(p[m], pivot)
            lo = m+1;
        else
            hi = m;
    }
    return lo;
fail:
    return -1;
}

There are 12 other functions that go into this, some less obscure, some more. Change "hi = -1" to "hi = 0" and you'll get a core dump, etc; it's exceedingly delicate, and because truly understanding it essentially requires doing a formal correctness proof, it's difficult to maintain; fight your way to that understanding, and you'll know why it sorts, but still won't have a clue about why it's so fast. I'm disinclined to add more code of this nature unless I can use it to replace code at least as difficult (which samplesort is). An irony is that stable sorts are, by definition, pointless unless you *do* have equal elements, and the many-equal-elements case is the one known case where the new algorithm is much slower than the current one (indeed, I have good reason to suspect it's the only such case, and reasons beyond just that God loves a good joke). It's OK by me if this were to become Python's only sort. Short of that, I'd be happier contributing the code to a sorting extension module. There are other reasons the latter may be a good idea; e.g., if you know you're sorting C longs, it's not particularly difficult to do that 10x faster than Python's generic list.sort() can do it; ditto if you know you're comparing strings; etc.
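[For the curious: a rough Python transcription of the gallop_left above. This is an illustrative sketch, not the shipped implementation — it drops the compare argument and works on a plain Python list — but it returns the same index bisect.bisect_left would, after galloping in from the right end:]

```python
import bisect

def gallop_left(pivot, a):
    """Leftmost index at which pivot could be inserted to keep `a`
    sorted (same answer as bisect.bisect_left), found by galloping
    from the right end before a final binary search."""
    n = len(a)
    assert n > 0
    lo, hi = 0, -1
    while True:
        if a[n - 1 - lo] < pivot:      # galloped past the boundary
            break
        hi = lo
        lo = (lo << 1) + 1             # offsets 1, 3, 7, 15, ...
        if lo >= n:
            lo = n
            break
    # translate offsets-from-the-right into ordinary indices
    lo = n - lo
    hi = n - 1 - hi
    while lo < hi:                     # plain binary search on the rest
        m = (lo + hi) >> 1
        if a[m] < pivot:
            lo = m + 1
        else:
            hi = m
    return lo

data = [1, 2, 2, 3, 5, 8, 13]
for x in range(15):
    assert gallop_left(x, data) == bisect.bisect_left(data, x)
```

[The galloping pays off when the answer is near the right end; merges of partially ordered runs hit that case constantly, which is where the speedups in the tables above come from.]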
Exposing the binary insertion sort (which both samplesort and mergesort use) would also be useful to some people (it's a richer variant of bisect.insort_right). I'd prefer that Python-the-language have just one "really good general sort" built in.

From oren-py-d@hishome.net Sun Jul 21 06:33:40 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sun, 21 Jul 2002 08:33:40 +0300 Subject: [Python-Dev] The iterator story In-Reply-To: <20020720140923.GA18716@panix.com>; from aahz@pythoncraft.com on Sat, Jul 20, 2002 at 10:09:23AM -0400 References: <20020719120043.A21503@glacier.arctrix.com> <3D389369.948547E0@metaslash.com> <20020719162226.A22929@glacier.arctrix.com> <20020720132857.GB5862@hishome.net> <20020720140923.GA18716@panix.com> Message-ID: <20020721083340.A13156@hishome.net>

On Sat, Jul 20, 2002 at 10:09:23AM -0400, Aahz wrote: > > That's the scenario that bit me too. For me it was a little more difficult > > to find because it was wrapped in a few layers of chained transformations. > > I can't tell by the last element in the chain whether the first one is > > re-iterable or not. > > > > My suggestion (which was rejected by Guido) was to raise an error when an > > iterator's .next() method is called after it raises StopIteration. This > > way, if I try to iterate over the result again at least I'll get an error > > like "IteratorExhaustedError" instead of something that is indistinguishable > > from an iterator of an empty container. I hate silent errors. > > I'm still not understanding how this would help. When a chainable > transformer gets StopIteration, it should immediately return. What else > do you want to do?

The transformations are fine the way they are. The problem is the source - if the source is an exhausted iterator and you ask it for a new iterator it will happily return itself and report StopIteration on each .next(). This behavior is indistinguishable from a valid iterator on an empty container.
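[The behaviour Oren asks for can be sketched as a wrapper. IteratorExhaustedError is the hypothetical exception named in his post, not a real Python exception, and this guard is an illustration of the proposal rather than anything that was adopted; the iteration method is written __next__ as in current Python:]

```python
class IteratorExhaustedError(Exception):
    """Hypothetical exception from the post; not a real builtin."""

class ExhaustionGuard:
    """Wrap an iterator so that iterating it again after exhaustion
    fails loudly instead of silently looking like an empty container."""
    def __init__(self, it):
        self._it = iter(it)
        self._done = False
    def __iter__(self):
        return self
    def __next__(self):
        if self._done:
            raise IteratorExhaustedError("iterator already exhausted")
        try:
            return next(self._it)
        except StopIteration:
            self._done = True      # first StopIteration passes through
            raise

g = ExhaustionGuard(x * x for x in range(3))
assert list(g) == [0, 1, 4]        # first pass consumes the generator
try:
    list(g)                        # second pass: loud failure, not []
except IteratorExhaustedError:
    pass
else:
    raise AssertionError("expected IteratorExhaustedError")
```

[As Oren says below, the error then propagates through any chain of generator-based transformations to the consumer at the far end.]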
What I would like is for iterators to return StopIteration exactly once and then switch to a different exception. This way the transformations will not need to care whether their upstream source is restartable or not - the exception will propagate through the entire chain and notify the consumer at the end of the chain that the source at the beginning of the chain is not re-iterable. I'm not suggesting that all iterator implementers must do this - having it on just the builtin iterators will be a great help. Right now I am using tricks like special-casing files and checking if iter(x) is x. It works but I hate it. Oren

From oren-py-d@hishome.net Sun Jul 21 06:40:14 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sun, 21 Jul 2002 08:40:14 +0300 Subject: [Python-Dev] The iterator story In-Reply-To: <200207201413.g6KEDYh29370@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Sat, Jul 20, 2002 at 10:13:34AM -0400 References: <200207192110.g6JLAjU15146@pcp02138704pcs.reston01.va.comcast.net> <20020720125850.GA5862@hishome.net> <200207201413.g6KEDYh29370@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020721084014.A13189@hishome.net>

On Sat, Jul 20, 2002 at 10:13:34AM -0400, Guido van Rossum wrote: > > It seems like a strange design decision to put the burden on all iterator > > implementers to write a dummy method returning self instead of just checking > > if tp_iter==NULL in PyObject_GetIter. It's like requiring all class writers > > to write a dummy __str__ method that calls __repr__ instead of implementing > > the automatic fallback to __repr__ in PyObject_Str when no __str__ is > > available. > > I suppose you meant "check for tp_iter==NULL and tp_iternext!=NULL".

Yes. Any comments on my analogy of __iter__/next with __str__/__repr__ and the burden of implementation? Oren

From tim.one@comcast.net Sun Jul 21 06:38:17 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 21 Jul 2002 01:38:17 -0400 Subject: [Python-Dev] Single- vs.
Multi-pass iterability In-Reply-To: Message-ID:

[Ping] > Name any protocol for which the question "does this mutate?" has > no answer.

Heh -- you must not use Zope much <0.6 wink>. I'm hard pressed to think of a protocol where that does have a reliable answer. Here:

    x1 = y.z
    x2 = y.z

Are x1 and x2 the same object after that? At least equal? Did either line mutate y? You simply can't know without knowing how y's type implements __getattr__, and with the introduction of computed attributes (properties) it's just going to get muddier.

> (I ask you to accept that __call__ is a special case.)

It's not to me -- if a protocol invokes user-defined Python code, there's nothing you can say about mutability "in general", and people do both use and abuse that.

>> What's so special about "for" that it should pretend to deliver >> purely functional behavior in a highly non-functional language? > Who said anything about functional behaviour? I'm not requiring that > looping *never* mutate. I just want to be able to tell *whether* it > will.

I don't blame you, and sometimes I'd like to know whether y.z (or "y += z", etc) mutates y too. It cuts deeper than loops, so a loop-focused gimmick seems inadequate to me (provided "something needs to be done about it" at all -- I'm not sure, but doubt it).

From tim.one@comcast.net Sun Jul 21 06:55:00 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 21 Jul 2002 01:55:00 -0400 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: <554A408C-9B5F-11D6-9B6B-003065517236@oratrix.com> Message-ID:

[Jack Jansen] > Oh, it's MarkH appreciation that's wanted! In that case I'll > gladly chime in, I was afraid it was __declspec(dllexport) > appreciation. Mark is one cool dude who knows where his towel is! > > 199998 to go. Should we start taking a poll who'll be the next > python-devver we start appreciating when the counter hits zero?

It would have been you, Jack, except Mark was much cleverer about this.
You make the Mac support so invisible to the rest of us that the only thing we can ever thank you for is stopping refcount abuse of immortal strings. Mark put some sort of Windows gimmick on 79% of the lines in the whole code base, thus ensuring a never-ending supply of reasons to thank him for getting rid of it one line at a time.

i-demand-that-everyone-appreciate-jack-more-too-ly y'rs - tim

From smurf@noris.de Sun Jul 21 09:29:30 2002 From: smurf@noris.de (Matthias Urlichs) Date: Sun, 21 Jul 2002 10:29:30 +0200 Subject: [Python-Dev] Priority queue (binary heap) python code Message-ID:

Oren Tirosh : > When I want to sort a list I just use .sort(). I don't care which algorithm > is used.

The point in this discussion, though, is that frequently you don't need a sorted list. You just need a list which yields all elements in order when you pop them. Heaps are a nice low-overhead implementation of that idea, and therefore should be in the standard library. -- Matthias Urlichs

From pinard@iro.umontreal.ca Sun Jul 21 11:26:55 2002 From: pinard@iro.umontreal.ca (François Pinard) Date: 21 Jul 2002 06:26:55 -0400 Subject: [Python-Dev] Re: Priority queue (binary heap) python code In-Reply-To: References: Message-ID:

[Matthias Urlichs] > Oren Tirosh : > > When I want to sort a list I just use .sort(). I don't care which > > algorithm is used. > The point in this discussion, though, is that frequently you don't need > a sorted list. You just need a list which yields all elements in order > when you pop them. Heaps are a nice low-overhead implementation of that > idea, and therefore should be in the standard library.

This is especially true when you need only the first few elements from the sorted set, which is a pretty common case in practice. A blind sort is not always the optimal solution, when you want to spare some CPU time. A caricatural example of abuse would be to implement `max' as `sort' followed by peeking at the first element of the result.
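[The heapq module that later entered the standard library (in Python 2.3) captures exactly this pattern. A small sketch of "pop the first few elements in order without a full sort": heapify is O(n) and each pop is O(log n), so taking k elements costs O(n + k log n) rather than the O(n log n) of sorting everything.]

```python
import heapq
import random

data = [random.randrange(10**6) for _ in range(10**4)]

# Pop just the three smallest elements, in order, without sorting all 10k.
heap = list(data)
heapq.heapify(heap)                  # O(n)
first_three = [heapq.heappop(heap) for _ in range(3)]

assert first_three == sorted(data)[:3]

# heapq.nsmallest wraps the same idea in one call:
assert heapq.nsmallest(3, data) == first_three
```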
Heaps are also an efficient enough representation if you insert while sorting, as often happens in simulations. Someone I know studied this intensely, and came up with better algorithms on average on his reference benchmark, but with much worse worst cases -- so it depends on the characteristics of the simulation. Heaps do quite well on average, and do acceptably well also in their worst cases. -- François Pinard http://www.iro.umontreal.ca/~pinard

From aahz@pythoncraft.com Sun Jul 21 14:25:50 2002 From: aahz@pythoncraft.com (Aahz) Date: Sun, 21 Jul 2002 09:25:50 -0400 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? In-Reply-To: References: <554A408C-9B5F-11D6-9B6B-003065517236@oratrix.com> Message-ID: <20020721132550.GC25525@panix.com>

On Sun, Jul 21, 2002, Tim Peters wrote: > > i-demand-that-everyone-appreciate-jack-more-too-ly y'rs - tim

My iBook and OSCON class members thank Jack. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/

From ping@zesty.ca Sun Jul 21 14:51:30 2002 From: ping@zesty.ca (Ka-Ping Yee) Date: Sun, 21 Jul 2002 06:51:30 -0700 (PDT) Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Message-ID:

On Sun, 21 Jul 2002, Tim Peters wrote: > x1 = y.z > x2 = y.z > > Are x1 and x2 the same object after that? At least equal? Did either line > mutate y? You simply can't know without knowing how y's type implements > __getattr__, and with the introduction of computed attributes (properties) > it's just going to get muddier.

That's not the point. You could claim that *any* polymorphism in Python is useless by the same argument. But Python is not useless; Python code really is reusable; and that's because there are good conventions about what the behaviour *should* be. People who do really find this upsetting should go use a strongly-typed language. In general, getting "y.z" should be idempotent, and should not mutate y.
I think everyone would agree on the concept. If it does mutate y with visible effects, then the implementor is breaking the convention. Sure, Python won't prevent you from writing a file-like class where you write the string "blah" to the file by fetching f.blah and you close the file by mentioning f[42]. But when users of this class then come running after you with pointed sticks, i'm not going to fight them off. :)

This is a list of all the type slots accessible from Python, before iterators (i.e. pre-2.2). Beside each is the answer to the question:

    Suppose you look at the value of x, then do this operation to x,
    then look at the value of x. Should we expect the two observed
    values to be the same or different?

nb_add                   same
nb_subtract              same
nb_multiply              same
nb_divide                same
nb_remainder             same
nb_divmod                same
nb_power                 same
nb_negative              same
nb_positive              same
nb_absolute              same
nb_nonzero               same
nb_invert                same
nb_lshift                same
nb_rshift                same
nb_and                   same
nb_xor                   same
nb_or                    same
nb_coerce                same
nb_int                   same
nb_long                  same
nb_float                 same
nb_oct                   same
nb_hex                   same
nb_inplace_add           different
nb_inplace_subtract      different
nb_inplace_multiply      different
nb_inplace_divide        different
nb_inplace_remainder     different
nb_inplace_power         different
nb_inplace_lshift        different
nb_inplace_rshift        different
nb_inplace_and           different
nb_inplace_xor           different
nb_inplace_or            different
nb_floor_divide          same
nb_true_divide           same
nb_inplace_floor_divide  different
nb_inplace_true_divide   different
sq_length                same
sq_concat                same
sq_repeat                same
sq_item                  same
sq_slice                 same
sq_ass_item              different
sq_ass_slice             different
sq_contains              same
sq_inplace_concat        different
sq_inplace_repeat        different
mp_length                same
mp_subscript             same
mp_ass_subscript         different
bf_getreadbuffer         same
bf_getwritebuffer        same
bf_getsegcount           same
bf_getcharbuffer         same
tp_print                 same
tp_getattr               same
tp_setattr               different
tp_compare               same
tp_repr                  same
tp_hash                  same
tp_call                  ?
tp_str                   same
tp_getattro              same
tp_setattro              different

In every case except for __call__, there exists a canonical answer. We all rely on these conventions every time we write a Python program. And learning these conventions is a necessary part of learning Python. You can argue, as Guido has, that in the particular case of for-loops distinguishing between mutating and non-mutating behaviour is not worth the trouble. But you can't say that we should give up on the whole concept *in general*. -- ?!ng

From aahz@pythoncraft.com Sun Jul 21 15:41:08 2002 From: aahz@pythoncraft.com (Aahz) Date: Sun, 21 Jul 2002 10:41:08 -0400 Subject: [Python-Dev] The iterator story In-Reply-To: <20020721083340.A13156@hishome.net> References: <20020719120043.A21503@glacier.arctrix.com> <3D389369.948547E0@metaslash.com> <20020719162226.A22929@glacier.arctrix.com> <20020720132857.GB5862@hishome.net> <20020720140923.GA18716@panix.com> <20020721083340.A13156@hishome.net> Message-ID: <20020721144108.GA5608@panix.com>

On Sun, Jul 21, 2002, Oren Tirosh wrote: > On Sat, Jul 20, 2002 at 10:09:23AM -0400, Aahz wrote: >>Oren: >>> >>> That's the scenario that bit me too. For me it was a little more >>> difficult to find because it was wrapped in a few layers of chained >>> transformations. I can't tell by the last element in the chain >>> whether the first one is re-iterable or not. >>> >>> My suggestion (which was rejected by Guido) was to raise an >>> error when an iterator's .next() method is called after it raises >>> StopIteration. This way, if I try to iterate over the result again >>> at least I'll get an error like "IteratorExhaustedError" instead of >>> something that is indistinguishable from an iterator of an empty >>> container. I hate silent errors. >> >> I'm still not understanding how this would help. When a chainable >> transformer gets StopIteration, it should immediately return. What >> else do you want to do? > > The transformations are fine the way they are.
The problem is the > source - if the source is an exhausted iterator and you ask it for a > new iterator it will happily return itself and report StopIteration > on each .next(). This behavior is indistinguishable from a valid > iterator on an empty container.

So the problem lies in asking the source for a new iterator, not in trying to use it. Making the iterator consumer responsible for handling this seems like the wrong approach to me -- the consumer *shouldn't* be able to tell the difference. If you're breaking that paradigm, you don't actually have an iterator consumer, you've got something else that wants to use the iterator interface, *plus* some additional features. The way Python normally handles issues like this is through documentation. (I.e., if your consumer requires an iterable capable of producing multiple iterators rather than an iterator object, you document that.)

> Right now I am using tricks like special-casing files and checking if > iter(x) is x. It works but I hate it.

You need to write your own wrapper or change the way your consumer works. Special-casing files inside your consumer is a Bad Idea. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/

From tim.one@comcast.net Sun Jul 21 21:14:43 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 21 Jul 2002 16:14:43 -0400 Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Message-ID:

[Ping] > Name any protocol for which the question "does this mutate?" has > no answer. [Tim] >> Heh -- you must not use Zope much <0.6 wink>. I'm hard pressed to >> think of a protocol where that does have a reliable answer. Here: >> >> x1 = y.z >> x2 = y.z >> >> Are x1 and x2 the same object after that? At least equal? Did >> either line mutate y? You simply can't know without knowing how y's >> type implements __getattr__, and with the introduction of computed >> attributes (properties) it's just going to get muddier.
[Ping] > That's not the point. It answered the question you asked. > You could claim that *any* polymorphism in Python is useless by the > same argument. It's your position that "for" is semi-useless because of the possibility for mutation. That isn't my position, and that some people write mutating __getattr__ (etc) doesn't make y.z (etc) unattractive to me either. > But Python is not useless; Python code really is reusable; Provided you play along with code's often-undocumented preconditions, absolutely. > and that's because there are good conventions about what the behaviour > *should* be. People who do really find this upsetting should go use a > strongly-typed language. Sorry, I couldn't follow this part. It's a fact that mutating __getattr__ (etc) implementations exist, and it's a fact that I'm not much bothered by it. I don't suggest they move to a different language, either (assuming that by "strongly-typed" you meant "statically typed" -- Python is already strongly typed). > In general, getting "y.z" should be idempotent, and should not mutate y. > I think everyone would agree on the concept. If it does mutate y with > visible effects, then the implementor is breaking the convention. No argument, although I have to emphasize that it's *just* "a convention", and repeat my prediction that the introduction of properties is going to make this particular convention less reliable in real life over time. > Sure, Python won't prevent you from writing a file-like class where you > write the string "blah" to the file by fetching f.blah and you close the > file by mentioning f[42]. But when users of this class then come running > after you with pointed sticks, i'm not going to fight them off. :) While properties aren't going to stop you from saying self.transactionid = self.session_manager.newid and get a new result each time you do it. 
Spelling no-argument method calls without parens is popular in some other languages, and it's "a feature" that properties make that easy to spell in Python 2.2 too.

> This is a list of all the type slots accessible from Python, before
> iterators (i.e. pre-2.2). Beside each is the answer to the question:
>
>     Suppose you look at the value of x, then do this operation to x,
>     then look at the value of x. Should we expect the two observed
>     values to be the same or different?
> ...

I don't know why you're bothering with this, but it's got holes. For example, some people overly fond of C++ enjoy overloading "<<" in highly non-functional ways. For another, the section on the inplace operators seems confused; after

    x1 = x
    x += y

there's no single best answer to whether x is x1 is true, or to whether the value of x1 before is == to the value of x1 after. The most popular convention*s* for the inplace operators are:

- If x is of a mutable type, then x is x1 after, and the pre- and post- values of x1 are !=.
- If x is of an immutable type, then x is not x1 after, and the pre- and post- values of x1 are ==.

The second case is forced, but the first one isn't. In light of all that, the intended meaning of "different" in

> nb_inplace_add  different

is either incorrect, or so weak that it's not worth much. I suppose you mean that, in Python code

    x += y

the object bound to the name "x" before the operation most likely has a different (!=) value than the object bound to the name "x" after the operation. That's true, but relies on what the generated code does *with* the result of nb_inplace_add. If you just call the method x.__iadd__(y) there's simply no guessing whether x is "different" as a result (it never is for x of an immutable type, it usually is for x of a mutable type, and there's no way to tell the difference just by staring at x).

> nb_hex          same

I sure hope so .

> ...
> In every case except for __call__, there exists a canonical answer.
If by "canonical" you mean "most common", sure, with at least the exceptions noted above. > We all rely on these conventions every time we write a Python program. > And learning these conventions is a necessary part of learning Python. > > You can argue, as Guido has, that in the particular case of for-loops > distinguishing between mutating and non-mutating behavior is not worth > the trouble. But you can't say that we should give up on the whole > concept *in general*. To the contrary, in a language with state it's crucial for the programmer to know when they're mutating state. If you use a mutating __getattr__, you better be careful that the code you call doesn't rely on __getattr__ not mutating; if you use an iterator object, you better be careful that the code you call doesn't require something stronger than an iterator object. It's all the same to me, and as Guido repeated until he got tired of it, the possibility for "for" and "x in y" (etc) to mutate has always been there, and has always been used. I didn't and still don't have any notable real-life problems dealing with this, although I too have gotten bit when passing a generator-iterator to code that required a sequence. I suppose the difference is that I said "oops! I screwed up!", fixed it, and moved on. It would have helped most if Python had a scheme for declaring and enforcing interfaces, *and* I bothered to use it (doubtful); second-most if the docs for the callee had spelled out its preconditions better; I doubt it would have helped at all if a variant spelling of "for" had been used, because I didn't eyeball the body of the callee first. As is, I just stuffed the generator-iterator object inside tuple() at the call site, and everything was peachy. That took a lot less effort than reading this thread <0.9 wink>. 
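Tim's call-site fix above -- stuffing the generator-iterator inside tuple() -- is easy to demonstrate. A minimal sketch (pairs() is an illustrative stand-in for "code that required a sequence", not anything from the thread):

```python
def pairs(seq):
    # Needs a re-iterable sequence: it walks seq twice, nested.
    return [(x, y) for x in seq for y in seq]

# Passing the generator-iterator directly: the inner loop drains it,
# so almost all pairs are silently lost -- no exception anywhere.
gen = (n * n for n in range(3))
broken = pairs(gen)

# The call-site fix: materialize the iterator once, then reuse freely.
gen = (n * n for n in range(3))
fixed = pairs(tuple(gen))

assert len(broken) < len(fixed) == 9
```

Note that the broken call fails silently, which is exactly why the bug required eyeballing at all.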
From tim.one@comcast.net Sun Jul 21 21:17:46 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 21 Jul 2002 16:17:46 -0400
Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows?
In-Reply-To: <20020721132550.GC25525@panix.com>
Message-ID: 

[Tim]
> i-demand-that-everyone-appreciate-jack-more-too-ly y'rs - tim

[Aahz]
> My iBook and OSCON class members thank Jack.

Great! You're the most appreciated guy we've got here, Aahz. I demand that everyone appreciate you more too!

starting-now-ly y'rs - tim

From tim.one@comcast.net Sun Jul 21 22:04:02 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 21 Jul 2002 17:04:02 -0400
Subject: [Python-Dev] Added platform-specific directories to sys.path
In-Reply-To: 
Message-ID: 

[Martin v. Loewis]
> If that is the platform convention, I see no problem following
> it. Windows already does things differently from Unix, by using the
> registry to compute sys.path.

FYI, this is mostly a myth. In normal operation for most people, Python never gets any info out of the Windows registry. The Python path in the registry is consulted only in unusual situations, when the Python library can't be found under the directory of the executable that called the sys.path-setting code. This can happen when, e.g., Python is embedded in some other app. The process is quite involved; the comment block at the top of PC/getpathp.c is a good summary. When reading it, note that there normally aren't any "application paths" in the registry; e.g., the PLabs Windows installer doesn't create any such beast.

From tdelaney@avaya.com Mon Jul 22 00:23:55 2002
From: tdelaney@avaya.com (Delaney, Timothy)
Date: Mon, 22 Jul 2002 09:23:55 +1000
Subject: [Python-Dev] Single- vs. Multi-pass iterability
Message-ID: 

> From: Ka-Ping Yee [mailto:ping@zesty.ca]
>
> It's just not the way i expect for-loops to work.
Perhaps we would > need to survey people for objective data, but i feel that most people > would be surprised if > > for x in y: print x > for x in y: print x > > did not print the same thing twice, or if > > if x in y: print 'got it' > if x in y: print 'got it' > > did not do the same thing twice. I realize this is my own opinion, > but it's a fairly strong impression i have. > > Well, for a generator, there is no underlying sequence. > > while 1: print next(gen) > > makes it clear that there is no sequence, but > > for x in gen: print x > > seems to give me the impression that there is. I think this is the crux of the matter. You see for: loops as inherently non-destructive - that they operate on containers. I (and presumably Guido, though I would never presume to channel him ;) see for: loops as inherently destructive - that they operate on iterators. That they obtain an iterator from a container (if possible) is a useful convenience. Perhaps the terminology is confusing. Consider a queue. for each person in the queue: service the person Is there anyone who would *not* consider this to be destructive (of the queue)? Tim Delaney From kevin@koconnor.net Mon Jul 22 00:30:57 2002 From: kevin@koconnor.net (Kevin O'Connor) Date: Sun, 21 Jul 2002 19:30:57 -0400 Subject: [Python-Dev] Priority queue (binary heap) python code In-Reply-To: <200207200606.g6K66Um28510@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Sat, Jul 20, 2002 at 02:06:29AM -0400 References: <20020624213318.A5740@arizona.localdomain> <200207200606.g6K66Um28510@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020721193057.A1891@arizona.localdomain> On Sat, Jul 20, 2002 at 02:06:29AM -0400, Guido van Rossum wrote: > > Any chance something like this could make it into the standard python > > library? It would save a lot of time for lazy people like myself. :-) > > > > I have read (or at least skimmed) this entire thread now. 
After I > reconstructed the algorithm in my head, I went back to Kevin's code; I > admire the compactness of his code. I believe that this would make a > good addition to the standard library, as a friend of the bisect > module. Thanks! >The only change I would make would be to make heap[0] the > lowest value rather than the highest. I agree this appears more natural, but a priority queue that pops the lowest priority item is a bit odd. > I propose to call it heapq.py. (Got a better name? Now or never.) > > [*] Afterthought: this could be made into an new-style class by adding > something like this to the end of module: Looks good to me. Thanks again, -Kevin -- ------------------------------------------------------------------------ | Kevin O'Connor "BTW, IMHO we need a FAQ for | | kevin@koconnor.net 'IMHO', 'FAQ', 'BTW', etc. !" | ------------------------------------------------------------------------ From tdelaney@avaya.com Mon Jul 22 00:40:24 2002 From: tdelaney@avaya.com (Delaney, Timothy) Date: Mon, 22 Jul 2002 09:40:24 +1000 Subject: [Python-Dev] Priority queue (binary heap) python code Message-ID: > From: Kevin O'Connor [mailto:kevin@koconnor.net] > On Sat, Jul 20, 2002 at 02:06:29AM -0400, Guido van Rossum wrote: > > >The only change I would make would be to make heap[0] the > > lowest value rather than the highest. > > I agree this appears more natural, but a priority queue that pops the > lowest priority item is a bit odd. I'm in two minds about this. My first thought is that the *first* item (heap[0]) should be the highest priority. OTOH, if it were a sorted list, list[0] would return the *lowest* priority. So i think for consistency heap[0] must return the lowest priority. 
Tim Delaney From greg@cosc.canterbury.ac.nz Mon Jul 22 01:20:12 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 22 Jul 2002 12:20:12 +1200 (NZST) Subject: [Python-Dev] The iterator story In-Reply-To: <20020719120043.A21503@glacier.arctrix.com> Message-ID: <200207220020.g6M0KCM21823@oma.cosc.canterbury.ac.nz> > Should people prefer to write: > > for item from iterator: > do something > > when they only need to loop over something once? This shows up a problem with Ping's proposal, I think: The place where you write the for-loop isn't the place where you know whether something will be iterated over more than once or not. How is a library routine going to know whether a sequence passed to it is going to be used again later? It's impossible -- global knowledge of the whole program is needed. This appears to leave the library writer with two choices: (1) Use for-in, to be on the safe side, in case the user doesn't want the sequence destroyed -- but then it can't be used on a destructive iterator, even if the caller knows he won't be using it again; (2) use for-from, and force everyone who calls it to adapt sequences to iterators before calling. Either way, things get messy and complicated and possibly dangerous. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Mon Jul 22 02:50:49 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 22 Jul 2002 13:50:49 +1200 (NZST) Subject: [Python-Dev] Single- vs. Multi-pass iterability In-Reply-To: Message-ID: <200207220150.g6M1onv22234@oma.cosc.canterbury.ac.nz> "Delaney, Timothy" : > for each person in the queue: > service the person If you actually wrote it that way in Python, it would probably be a bug. 
It would be better written:

    while there is someone at the head of the queue:
        service that person

Greg Ewing, Computer Science Dept,   +--------------------------------------+
University of Canterbury,            | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand            | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz           +--------------------------------------+

From greg@cosc.canterbury.ac.nz Mon Jul 22 00:35:38 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 22 Jul 2002 11:35:38 +1200 (NZST)
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: 
Message-ID: <200207212335.g6LNZcU21438@oma.cosc.canterbury.ac.nz>

Ka-Ping Yee :
> I believe this is where the biggest debate lies: whether "for" should be
> non-destructive.

It's not the for-loop's fault if its argument is of such a nature that iterating over it destroys it.

Given suitable values for x and y, it's possible for evaluating "x+y" to be a destructive operation. Does that mean we should revise the "+" protocol somehow to prevent this from happening? I don't think so.

This sort of thing is all-pervasive in Python due to its dynamic nature. It's not something that can be easily "fixed", even if it were desirable to do so.

Greg Ewing, Computer Science Dept,   +--------------------------------------+
University of Canterbury,            | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand            | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz           +--------------------------------------+

From ping@zesty.ca Mon Jul 22 03:55:22 2002
From: ping@zesty.ca (Ka-Ping Yee)
Date: Sun, 21 Jul 2002 19:55:22 -0700 (PDT)
Subject: [Python-Dev] The iterator story
In-Reply-To: <200207220020.g6M0KCM21823@oma.cosc.canterbury.ac.nz>
Message-ID: 

I'm in a bit of a bind. I know at this point that Guido's already made up his mind so there's nothing further to be gained by debating the issue; yet i feel compelled to respond as long as people keep missing the idea or saying things that don't make sense.
So: this is a clarification, not a push. I am going to reply to a few messages at once, to reduce the number of messages that i'm sending on this topic. If you're planning to reply on this thread, please read the whole message before replying.

* * *

On Mon, 22 Jul 2002, Greg Ewing wrote:
> This shows up a problem with Ping's proposal, I think:
> The place where you write the for-loop isn't the place
> where you know whether something will be iterated over
> more than once or not.

When you write the for-loop, you decide whether you want to consume the sequence. You use the convention and expect the implementor of the sequence object to adhere to it.

> How is a library routine going
> to know whether a sequence passed to it is going to
> be used again later?

You've got this backwards. You write the library routine the way that makes sense, and then you document whether the sequence gets destroyed or not. That declaration becomes part of your interface, and users of your routine can then determine how to use it safely for their needs.

(Analogy: how does the implementor of file.close() know whether the caller wants to use the file again later? Answer: it's not the implementor's job to know that. We document what file.close() does, and people only *decide* to call file.close() when they don't need the file anymore.)

Without a convention to distinguish between destruction and non-destruction, you can't establish what the library routine does; so you can't document it; so you can't use it safely *even* if you trust the implementor. No implementation would ever make it possible for your library routine to claim that it "does something with the elements of a given sequence without destroying the sequence". Now if you do have a convention -- yes, you still have to trust implementors to follow the convention -- but if they do so, you're okay.
* * *

> This appears to leave the library writer with two
> choices: (1) Use for-in, to be on the safe side,
> in case the user doesn't want the sequence destroyed --
> but then it can't be used on a destructive iterator,

No, it can. The documentation for the library routine will state that it wants a sequence. If the caller wants to use x and x is an iterator, it passes in seq(x). No problem. The caller has thereby declared that it's okay to destroy x.

To make it more obvious what is going on, i should have chosen a better name; 'seq' was poor. Let's rename 'seq' to 'consume'. consume(i) returns an object x such that iter(x) is i. So calling 'consume' implies that you are consuming an iterator.

All right. Then consider:

    for x in consume(y): print x

The above is clear that y is being destroyed. Now consider:

    def printout(sequence):
        for x in sequence: print x

If y is an iterator, in my world you would not be able to call "printout(y)". You would say "printout(consume(y))", thus making it clear that y is being destroyed.

> (2) use for-from, and force everyone who calls it to
> adapt sequences to iterators before calling.

Since for-in is non-destructive, it is safer, and it is also more common to have a sequence than an iterator. So i would usually choose option 1 rather than 2. But sure, you can write for-from, if you want. I mean, if you decide to accept strings, then users who want to pass in integers will have to str() them first. If you decide to accept integers, then users who want to pass in strings will have to int() them first. This is no great dilemma. We actually like this.

* * *

Hereafter i'll stick to existing syntax, because the business of introducing syntax isn't really the main point. I'll use the alternative i proposed, which is to use the built-in instead. So we'd say

    for i in consume(it): ...

instead of

    for i from it: ...

Tim Delaney wrote:
> I think this is the crux of the matter.
> You see for: loops as inherently
> non-destructive - that they operate on containers. I (and presumably
> Guido, though I would never presume to channel him ;) see for: loops as
> inherently destructive - that they operate on iterators. That they obtain
> an iterator from a container (if possible) is a useful convenience.

I believe your interpretation of opinions is correct on all counts. Except i would point out that for-loops are not always destructive; most of the time, they are not, and that is why i consider the destructive behaviour surprising and worth making visible.

> Perhaps the terminology is confusing. Consider a queue.
>
>     for each person in the queue:
>         service the person
>
> Is there anyone who would *not* consider this to be destructive (of the
> queue)?

Well, the only reason you can tell is that you can see the context from the meanings of the words "queue" and "service". If you said

    for person in consume(queue): service(person)

then that would truly be clear, even if you used different variable names, because the 'consume' built-in expresses that the queue will be consumed.

* * *

Greg Ewing wrote:
> Given suitable values for x and y, it's possible for evaluating "x+y"
> to be a destructive operation. Does that mean we should revise the
> "+" protocol somehow to prevent this from happening? I don't think so.

Augh! I'm just not getting through here. We all know that the Python philosophy is to trust the implementors of protocols instead of enforcing behaviour. That's not the point. Of course it's POSSIBLE for "x + y" to be destructive. That doesn't mean it SHOULD be. We all know that "x + y" is normally not destructive, and that's what counts. That understanding enables me to implement __add__ in a way that will not screw you over when you use it.

All i'm saying is that there should be a way to *express* safe iteration (and safe "element in container" tests). Guido's pronouncement is "Nope. Don't need it."
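Ping's consume() is easy to sketch from his one-line spec -- consume(i) returns an object x such that iter(x) is i. This is an illustration of the proposal only; no such built-in was ever added, and the names are his, not the language's:

```python
class consume:
    """Mark an iterator as fair game for destruction at the call site."""
    def __init__(self, iterator):
        # Accept only actual iterators (objects whose iter() is themselves),
        # so that wrapping a container by mistake fails loudly.
        if iter(iterator) is not iterator:
            raise TypeError("consume() wants an iterator, not a container")
        self._it = iterator

    def __iter__(self):
        # The defining property of the proposal: iter(consume(i)) is i.
        return self._it

it = iter([1, 2, 3])
wrapped = consume(it)
assert iter(wrapped) is it
assert list(wrapped) == [1, 2, 3]   # iterating wrapped visibly drains it
assert list(it) == []               # ...and the iterator is now exhausted
```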
Although i disagree, i am willing to respect that. But please don't confuse a lack of enforcement with a lack of convention. Convention is all we have.

-- ?!ng

From tim.one@comcast.net Mon Jul 22 04:09:11 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 21 Jul 2002 23:09:11 -0400
Subject: [Python-Dev] Priority queue (binary heap) python code
In-Reply-To: 
Message-ID: 

[Guido]
> The only change I would make would be to make heap[0] the
> lowest value rather than the highest.

[Kevin O'Connor]
> I agree this appears more natural, but a priority queue that pops the
> lowest priority item is a bit odd.

So now the fellow who wrote the code to begin with squirms at what will happen if it's actually put in the std library, and sounds like he would continue using his own code.

[Delaney, Timothy]
> I'm in two minds about this. My first thought is that the *first* item
> (heap[0]) should be the highest priority.
>
> OTOH, if it were a sorted list, list[0] would return the *lowest*
> priority.

On the third hand, if you're using heaps for sorting (as in a heapsort), it's far more natural to have a max-heap -- else the sort can't be done in-place (with a max-heap you pop the largest value, copy it to the last array slot, pretend the array is one shorter, and trickle what *was* in the last array slot back into the now-one-smaller max-heap; repeat N-1 times and you've sorted the array in-place).

On the fourth hand, if you want a *bounded* priority queue, to remember only the N best-scoring (largest-priority) objects for some fixed N, then (perhaps paradoxically) a min-heap is what you need.

On the fifth hand, if you want to process items in priority order (highest first) interleaved with entering new items, then you need a max-heap. I suspect that's what Kevin does.

> So i think for consistency heap[0] must return the lowest priority.
On the sixth hand, anyone who has implemented a heap in another 0-based language expects the first slot in the array to be unused, in order to simplify the indexing (parent = child >> 1 uniformly if the root is at index 1), and to ensure that all nodes on the same level have indices with the same leading bit (which can be helpful in advanced algorithms -- then, e.g., you know that i and j are on the same level of the tree if and only if i&j > i^j; maybe that's not obvious at first glance ).

Priority queues just aren't a one-size-fits-all thing.

From drifty@bigfoot.com Mon Jul 22 04:23:03 2002
From: drifty@bigfoot.com (Brett Cannon)
Date: Sun, 21 Jul 2002 20:23:03 -0700 (PDT)
Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows?
In-Reply-To: 
Message-ID: 

[Tim Peters]
> [Tim]
> > i-demand-that-everyone-appreciate-jack-more-too-ly y'rs - tim
>
> [Aahz]
> > My iBook and OSCON class members thank Jack.
>
> Great! You're the most appreciated guy we've got here, Aahz. I demand that
> everyone appreciate you more too!

I appreciate everyone everywhere for everything. =)

my-Berkeley-education-has-turned-me-hippie-ly y'rs
-Brett

From tim.one@comcast.net Mon Jul 22 05:01:48 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 22 Jul 2002 00:01:48 -0400
Subject: [Python-Dev] Sorting
In-Reply-To: 
Message-ID: 

Just FYI. I ripped out the complications I added to the mergesort variant that tried to speed many-equal-keys cases, and worked on its core competency (intelligent merging) instead. There's a reason: this was kick-started while investigating ways to speed Zope B-Tree operations when they're used as sets; equal keys are impossible in that context, but intelligent merging can really help. So whatever the fate of this sort, some of the code will live on in Zope's B-Tree routines.

The result is that non-trivial cases of near-order got a nice boost, while ~sort got even slower again.
I added a new test +sort, which replaces the last 10 values of a sorted array with random values. samplesort has a special case for this, limited to a maximum of 15 trailing out-of-order entries. timsort has no special case for this but does it significantly faster than the samplesort hack anyway, has no limit on how many such trailing entries it can exploit, and couldn't care less whether such entries are at the front or the end of the array; I expect it would be (just) a little slower if they were in the middle. As shown below, timsort does a +sort almost as fast as for a wholly-sorted array. Ditto now for 3sort too, which perturbs order by doing 3 random exchanges in a sorted array. It's become a very interesting sort implementation, handling more kinds of near-order at demonstrably supernatural speed than anything else I'm aware of. ~sort isn't an example of near-order. Quite the contrary, it has a number of inversions quadratic in N, and N/4 runs; the only reason ~sort goes faster than *sort now is-- believe it or not --a surprising benefit from a memory optimization. 
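For readers trying this at home: CPython's built-in list.sort() *is* this very timsort today, and the near-ordered inputs are simple to generate. A rough sketch of the 3sort and +sort shapes described above (the function names are mine, mirroring the key below):

```python
import random

def make_3sort(n, seed=0):
    # 3sort: ascending data, then 3 random exchanges.
    rng = random.Random(seed)
    data = list(range(n))
    for _ in range(3):
        i, j = rng.randrange(n), rng.randrange(n)
        data[i], data[j] = data[j], data[i]
    return data

def make_plus_sort(n, seed=0):
    # +sort: ascending data with the last 10 slots replaced by random values.
    rng = random.Random(seed)
    data = list(range(n))
    data[-10:] = [rng.randrange(n) for _ in range(10)]
    return data

for make in (make_3sort, make_plus_sort):
    data = make(10000)
    data.sort()                  # built-in sort has been timsort since 2.3
    assert data == sorted(data)  # correctness check only, not a timing
```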
Key:
    *sort: random data
    \sort: descending data
    /sort: ascending data
    3sort: ascending, then 3 random exchanges
    +sort: ascending, then 10 random at the end
    ~sort: many duplicates
    =sort: all equal
    !sort: worst case scenario

C:\Code\python\PCbuild>python -O sortperf.py 15 20 1

samplesort
 i    2**i   *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768    0.18   0.02   0.01   0.14   0.01   0.07   0.01   0.17
16   65536    0.24   0.02   0.02   0.22   0.02   0.08   0.02   0.24
17  131072    0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144    1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.13
19  524288    2.53   0.18   0.17   2.30   0.24   0.74   0.17   2.47
20 1048576    5.47   0.37   0.35   5.17   0.45   1.51   0.35   5.34

timsort
 i    2**i   *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768    0.17   0.01   0.01   0.01   0.01   0.14   0.01   0.02
16   65536    0.23   0.02   0.02   0.03   0.02   0.21   0.03   0.04
17  131072    0.53   0.04   0.04   0.05   0.04   0.46   0.04   0.09
18  262144    1.16   0.09   0.09   0.12   0.09   1.01   0.08   0.18
19  524288    2.53   0.18   0.17   0.18   0.18   2.20   0.17   0.36
20 1048576    5.48   0.36   0.35   0.36   0.37   4.78   0.35   0.73

From greg@cosc.canterbury.ac.nz Mon Jul 22 05:50:02 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 22 Jul 2002 16:50:02 +1200 (NZST)
Subject: [Python-Dev] The iterator story
In-Reply-To: 
Message-ID: <200207220450.g6M4o2u23472@oma.cosc.canterbury.ac.nz>

Ka-Ping Yee :
> When you write the for-loop, you decide whether you want
> to consume the sequence.

As someone pointed out, it's pretty rare that you actually *want* to consume the sequence. Usually the choice is between "I don't care" and "The sequence must NOT be consumed". Of the two varieties of for-loop in your proposal, for-in obviously corresponds to the "must not be consumed" case, leading one to suppose that you intend for-from to be used in the don't-care case.
But now you seem to be suggesting that library routines should always use for-in, and that the caller should convert an iterator to a sequence if he knows it's okay to consume it:

> Since for-in is non-destructive, it is safer, and it is also
> more common to have a sequence than an iterator.
> ...
> If y is an iterator, in my world you would not be able to
> call "printout(y)". You would say "printout(consume(y))"

Okay, that seems reasonable -- explicit is better than implicit. But... consider the following two library routines:

    def printout1(s):
        for x in s: print x

    def printout2(s):
        for x in s:
            for y in s: print x, y

Clearly it's okay to call printout1(consume(s)), but it's NOT okay to call printout2(consume(s)). So we need to document these requirements:

    def printout1(s):
        "s may be an iterator or sequence"
        for x in s: print x

    def printout2(s):
        "s MUST be a sequence, NOT an iterator!"
        for x in s:
            for y in s: print x, y

But now there's nothing to enforce these requirements -- no exception will be raised if you call printout2(consume(s)) by mistake. To get any safety benefit from your proposed arrangement, it seems to me that you'd need to write printout1 as

    def printout1(s):
        "s must be an iterator"
        for x from s: print x

and then in the (overwhelmingly most common) case of passing it a sequence, you would need to call it as printout1(iter(s)) -- unless you allow the for-from protocol to automatically obtain an iterator from a sequence if possible, the way for-in currently does.

> Greg Ewing wrote:
> > Given suitable values for x and y, it's possible for evaluating "x+y"
> > to be a destructive operation. Does that mean we should revise the
> > "+" protocol somehow to prevent this from happening? I don't think so.
>
> Augh! I'm just not getting through here.

Sorry, I wrote that before I saw your full proposal.
I understand your point of view much better now, and even sympathise with it to some extent -- something like the for-from syntax actually passed through my mind shortly before I saw it in your post. There's no doubt that it's very elegant theoretically, but in thinking through the implications, I'm not sure it would be all that helpful in practice, and might even turn out to be a nuisance if it requires putting in a lot of iter(x) and/or consume(x) calls. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Mon Jul 22 07:05:14 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 22 Jul 2002 02:05:14 -0400 Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: One more piece of this puzzle. It's possible that one of {samplesort, timsort} would become unboundedly faster as the cost of comparisons increased over that of Python floats (which all the timings I posted used). 
Here's a program that would show this if so, using my local Python, where lists have an .msort() method:

"""
class SlowCmp(object):
    __slots__ = ['val']
    def __init__(self, val):
        self.val = val
    def __lt__(self, other):
        for i in range(SLOW):
            i*i
        return self.val < other.val

def drive(n):
    from random import randrange
    from time import clock as now
    n10 = n * 10
    L = [SlowCmp(randrange(n10)) for i in xrange(n)]
    L2 = L[:]
    t1 = now()
    L.sort()
    t2 = now()
    L2.msort()
    t3 = now()
    return t2-t1, t3-t2

for SLOW in 1, 2, 4, 8, 16, 32, 64, 128:
    print "At SLOW value", SLOW
    for n in range(1000, 10001, 1000):
        ss, ms = drive(n)
        print "    %6d %6.2f %6.2f %6.2f" % (
            n, ss, ms, 100.0*(ss - ms)/ms)
"""

Here's the tail end of the output, from which I conclude that the number of comparisons done on random inputs is virtually identical for the two methods; times vary by a fraction of a percent both ways, with no apparent pattern (note that time.clock() has better than microsecond resolution on Windows, so the times going into the % calculation have more digits than are displayed here):

At SLOW value 32
      1000   0.22   0.22  -0.05
      2000   0.50   0.50   0.10
      3000   0.80   0.80  -0.64
      4000   1.11   1.10   0.71
      5000   1.44   1.45  -0.12
      6000   1.77   1.76   0.72
      7000   2.10   2.09   0.31
      8000   2.43   2.41   0.79
      9000   2.78   2.80  -0.58
     10000   3.13   3.13  -0.01
At SLOW value 64
      1000   0.37   0.38  -1.00
      2000   0.83   0.83   0.20
      3000   1.33   1.33  -0.15
      4000   1.84   1.84   0.05
      5000   2.40   2.39   0.38
      6000   2.95   2.92   0.97
      7000   3.46   3.47  -0.20
      8000   4.04   4.01   0.87
      9000   4.60   4.63  -0.68
     10000   5.19   5.21  -0.33
At SLOW value 128
      1000   0.68   0.67   0.37
      2000   1.52   1.50   0.99
      3000   2.40   2.41  -0.67
      4000   3.35   3.32   1.03
      5000   4.30   4.32  -0.47
      6000   5.32   5.29   0.54
      7000   6.27   6.27   0.04
      8000   7.29   7.25   0.55
      9000   8.37   8.37  -0.03
     10000   9.39   9.43  -0.49

From oren-py-d@hishome.net Mon Jul 22 07:08:18 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Mon, 22 Jul 2002 09:08:18 +0300
Subject: [Python-Dev] The iterator story
In-Reply-To: <20020721144108.GA5608@panix.com>; from aahz@pythoncraft.com on Sun, Jul
21, 2002 at 10:41:08AM -0400 References: <20020719120043.A21503@glacier.arctrix.com> <3D389369.948547E0@metaslash.com> <20020719162226.A22929@glacier.arctrix.com> <20020720132857.GB5862@hishome.net> <20020720140923.GA18716@panix.com> <20020721083340.A13156@hishome.net> <20020721144108.GA5608@panix.com> Message-ID: <20020722090818.A5576@hishome.net> On Sun, Jul 21, 2002 at 10:41:08AM -0400, Aahz wrote: > > The transformations are fine the way they are. The problem is the > > source - if the source is an exhausted iterator and you ask it for a > > new iterator it will happily return itself and report StopIteration > > on each .next(). This behavior is indistinguishable from a valid > > iterator on an empty container. > > So the problem lies in asking the source for a new iterator, not in > trying to use it. Making the iterator consumer responsible for handling > this seems like the wrong approach to me -- the consumer *shouldn't* be > able to tell the difference. If you're breaking that paradigm, you > don't actually have an iterator consumer, you've got something else that > wants to use the iterator interface, *plus* some additional features. Tuples are very much like lists except that they cannot be modified. A lot of code that was written with lists in mind can actually use tuples. If you pass a tuple to a function that tries to use the "additional feature" of mutability you will get an exception. Pipes are very much like files except that they cannot be seeked. A lot of code that was written with files in mind can actually use pipes. If you pass a pipe to a function that tries to use the "additional feature" of seekability you will get an exception. Iterators are very much like iterable containers except that they can only be iterated once. A lot of code that was written with containers in mind can actually use iterators. If you pass an iterator to a function that tries to use the "additional feature" of re-iterability you... will not get an exception. 
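Oren's failure mode is easy to reproduce. A minimal sketch (in modern Python spelling; at the time of this thread the iterator method was .next() rather than __next__, but the silent-exhaustion behavior is the same):

```python
def stats(seq):
    # A consumer written with containers in mind: it walks its
    # argument twice, once to count and once to sum.
    count = sum(1 for _ in seq)
    total = sum(seq)
    return count, total

data = [1, 2, 3]
print(stats(data))        # lists re-iterate: (3, 6)
print(stats(iter(data)))  # the first pass exhausts the iterator; the
                          # second silently sees "nothing": (3, 0)
```

No exception is raised anywhere; the caller just gets a quietly wrong answer, which is exactly the point being argued.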
You'll get nonsense results because on the second pass the iterator will fail silently and suddenly pretend to be an empty container. Would you say that any code that expects a seekable file or a mutable sequence is "breaking the paradigm"? Why should code that expects a re-iterable container be different from code that uses any other protocol that has several variations and subsets/supersets? > The way Python normally handles issues like this is through > documentation. (I.e., if your consumer requires an iterable capable of > producing multiple iterators rather than an iterator object, you document > that.) The way Python normally handles issues of code trying to use a protocol that the object does not support is through *exceptions*. When a 5000+ line program produces meaningless results, documentation is not very helpful to start looking for the problem. An exception gives you an approximate line number and reason. If __setitem__ on a tuple was ignored instead of producing an exception or seek on a pipe failed silently I don't think that anyone would find "don't do that, then" or "documentation" to be a satisfactory answer. Oren From mwh@python.net Mon Jul 22 11:03:10 2002 From: mwh@python.net (Michael Hudson) Date: 22 Jul 2002 11:03:10 +0100 Subject: [Python-Dev] Added platform-specific directories to sys.path In-Reply-To: barry@zope.com's message of "Fri, 19 Jul 2002 17:48:53 -0400" References: <57BEAF46-9B5A-11D6-9B6B-003065517236@oratrix.com> <200207192123.g6JLN7s15263@pcp02138704pcs.reston01.va.comcast.net> <15672.35141.803094.488541@anthem.wooz.org> Message-ID: <2m1y9w3wrl.fsf@starship.python.net> barry@zope.com (Barry A. Warsaw) writes: > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Traditionally, on Unix per-user extensions are done by > GvR> pointing PYTHONPATH to your per-user directory (-ies) in your > GvR> .profile. > > Or adding them to sys.path via your $PYTHONSTARTUP file. That only helps for interactive sessions... 
> OTOH, it might be nice if the distutils `install' command had some > switches to make installing in some of these common alternative > locations a little easier. That might dovetail nicely if/when we > decide to add a site-updates directory to sys.path. I don't see what's so very difficult about $ python setup.py install --prefix=$HOME but maybe I'm odd. Cheers, M. -- $ head -n 2 src/bash/bash-2.04/unwind_prot.c /* I can't stand it anymore! Please can't we just write the whole Unix system in lisp or something? */ -- spotted by Rich van der Hoff From Jack.Jansen@cwi.nl Mon Jul 22 13:02:15 2002 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Mon, 22 Jul 2002 14:02:15 +0200 Subject: [Python-Dev] Added platform-specific directories to sys.path In-Reply-To: <2m1y9w3wrl.fsf@starship.python.net> Message-ID: On Monday, July 22, 2002, at 12:03 , Michael Hudson wrote: > I don't see what's so very difficult about > > $ python setup.py install --prefix=$HOME This is what you use if you have built Python yourself, and installed it in your home directory. What I was referring to (as the setup that isn't very well supported right now) is the situation where the system admin has built and installed Python in, say, /usr/local, and you want to install a distutils-based package for your own private use. Setting PYTHONPATH to be $HOME/lib/python-extensions or something similar is what people customarily do to get access to their private modules, but there is no standard, and hence also no way for distutils to find the pathname and provide an easy interface to do this. 
-- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From mwh@python.net Mon Jul 22 13:56:33 2002 From: mwh@python.net (Michael Hudson) Date: 22 Jul 2002 13:56:33 +0100 Subject: [Python-Dev] Added platform-specific directories to sys.path In-Reply-To: Jack Jansen's message of "Mon, 22 Jul 2002 14:02:15 +0200" References: Message-ID: <2mptxfncou.fsf@starship.python.net> Jack Jansen writes: > On Monday, July 22, 2002, at 12:03 , Michael Hudson wrote: > > I don't see what's so very difficult about > > > > $ python setup.py install --prefix=$HOME > > This is what you use if you have built Python yourself, and installed it > in your home directory. In that case, the --prefix arg is unnecessary. > What I was referring to (as the setup that isn't very well supported > right now) is the situation where the system admin has built and > installed Python in, say, /usr/local, and you want to install a > distutils-based packaged for your own private use. That's when I do the above. > Setting PYTHONPATH to be $HOME/lib/python-extensions or something > similar is what people customarily do to get access to their private > modules, but there is no standard, and hence also no way for distutils > to find the pathname and provide an easy interface to do this. My setup requires setting $PYTHONPATH too, so it's not ideal, but it works. Cheers, M. -- Reading Slashdot can [...] often be worse than useless, especially to young and budding programmers: it can give you exactly the wrong idea about the technical issues it raises. -- http://www.cs.washington.edu/homes/klee/misc/slashdot.html#reasons From sholden@holdenweb.com Mon Jul 22 14:53:04 2002 From: sholden@holdenweb.com (Steve Holden) Date: Mon, 22 Jul 2002 09:53:04 -0400 Subject: [Python-Dev] Is __declspec(dllexport) really needed on Windows? 
References: Message-ID: <021001c23187$1aa9b230$6300000a@holdenweb.com> ----- Original Message ----- From: "Brett Cannon" To: "Tim Peters" Cc: Sent: Sunday, July 21, 2002 11:23 PM Subject: RE: [Python-Dev] Is __declspec(dllexport) really needed on Windows? > [Tim Peters] > > > [Tim] > > > i-demand-that-everyone-appreciate-jack-more-too-ly y'rs - tim > > > > [Aahz] > > > My iBook and OSCON class members thank Jack. > > > > Great! You're the most appreciate guy we've got here, Aahz. I demand that > > everyone appreciate you more too! > > > > I appreciate everyone everywhere for everything. =) > > my-Berkeley-education-has-turned-me-hippie-ly y'rs -Brett > I appreciate the set of all things that are insufficiently appreciated, and each of its under-appreciated members but-it-won't-necessarily-make-a-difference-ly y'rs - steve ----------------------------------------------------------------------- Steve Holden http://www.holdenweb.com/ Python Web Programming http://pydish.holdenweb.com/pwp/ ----------------------------------------------------------------------- From barry@zope.com Mon Jul 22 15:14:01 2002 From: barry@zope.com (Barry A. Warsaw) Date: Mon, 22 Jul 2002 10:14:01 -0400 Subject: [Python-Dev] Added platform-specific directories to sys.path References: <57BEAF46-9B5A-11D6-9B6B-003065517236@oratrix.com> <200207192123.g6JLN7s15263@pcp02138704pcs.reston01.va.comcast.net> <15672.35141.803094.488541@anthem.wooz.org> <2m1y9w3wrl.fsf@starship.python.net> Message-ID: <15676.4905.813038.253158@anthem.wooz.org> >>>>> "MH" == Michael Hudson writes: >> your GvR> .profile. Or adding them to sys.path via your >> $PYTHONSTARTUP file. MH> That only helps for interactive sessions... Yup, which might or might not be good enough. I'm thinking of the (X)Emacs arrangement that there are system startup files and user startup files that are normally always loaded, unless you use a command line switch to specifically disable them. 
>> OTOH, it might be nice if the distutils `install' command had >> some switches to make installing in some of these common >> alternative locations a little easier. That might dovetail >> nicely if/when we decide to add a site-updates directory to >> sys.path. MH> I don't see what's so very difficult about MH> $ python setup.py install --prefix=$HOME Actually, to do it correctly (and quietly) this appears to be the most accurate way to tell distutils to install a library in an alternative search path: % PYTHONPATH= python setup.py --quiet install --install-lib \ --install-purelib A bit less than intuitive than say, a standard alternative user-centric installation directory and a --userdir option to the install command. -Barry From barry@zope.com Mon Jul 22 16:09:49 2002 From: barry@zope.com (Barry A. Warsaw) Date: Mon, 22 Jul 2002 11:09:49 -0400 Subject: [Python-Dev] Sorting References: <20020720130000.GA11845@panix.com> Message-ID: <15676.8253.741856.171571@anthem.wooz.org> >>>>> "A" == Aahz writes: A> Any reason the list object can't grow a .stablesort() method? Because when a user looks at the methods of a list object and sees both .sort() and .stablesort() you now need to explain the difference, and perhaps give some hint as to why you'd want to choose one over the other. Maybe the teachers-of-Python in this crowd can give some insight into whether 1) they'd actually do this or just hand wave past the difference, or 2) whether it would be a burden to teaching. I'm specifically thinking of the non-programmer crowd learning Python. I would think that most naive uses of list.sort() would expect a stable sort and wouldn't care much about any performance penalties involved. I'd put my own uses squarely in the "naive" camp. 
;) I'd prefer to see - .sort() actually /be/ a stable sort in the default case - list objects not be burdened with additional sorting methods (that way lies a footing-challenged incline) - provide a module with more advanced sorting options, with functions suitable for list.sort()'s cmpfunc, and with derived classes (perhaps in C) of list for better performance. -Barry From xscottg@yahoo.com Mon Jul 22 16:44:15 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Mon, 22 Jul 2002 08:44:15 -0700 (PDT) Subject: [Python-Dev] Sorting In-Reply-To: <15676.8253.741856.171571@anthem.wooz.org> Message-ID: <20020722154415.79981.qmail@web40111.mail.yahoo.com> --- "Barry A. Warsaw" wrote: > > Because when a user looks at the methods of a list object and sees > both .sort() and .stablesort() you now need to explain the difference, > and perhaps give some hint as to why you'd want to choose one over the > other. > Or you could have an optional parameter that defaults to whatever the more sane value should be (probably stable), and when the user stumbles across this parameter they stumble across the docs too. I think Tim's codebloat argument is more compelling. __________________________________________________ Do You Yahoo!? Yahoo! 
Health - Feel better, live better http://health.yahoo.com From skip@mojam.com Mon Jul 22 17:06:58 2002 From: skip@mojam.com (Skip Montanaro) Date: Mon, 22 Jul 2002 11:06:58 -0500 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200207221606.g6MG6wT20010@12-248-11-90.client.attbi.com> Bug/Patch Summary ----------------- 262 open / 2681 total bugs (+5) 143 open / 1613 total patches (+15) New Bugs -------- OSX IDE behaviour (output to console) (2002-06-24) http://python.org/sf/573174 pydoc(.org) does not find file.flush() (2002-06-26) http://python.org/sf/574057 Chained __slots__ dealloc segfault (2002-06-26) http://python.org/sf/574207 convert_path fails with empty pathname (2002-06-26) http://python.org/sf/574235 Automated daily documentation builds (2002-06-26) http://python.org/sf/574241 Tex Macro Error (2002-06-27) http://python.org/sf/574939 multiple inheritance w/ slots dumps core (2002-06-28) http://python.org/sf/575229 Parts of 2.2.1 core use old gc API (2002-06-30) http://python.org/sf/575715 os.spawnv() fails with underscores (2002-06-30) http://python.org/sf/575770 Negative __len__ provokes SystemError (2002-06-30) http://python.org/sf/575773 Inconsistent behaviour in re grouping (2002-07-01) http://python.org/sf/576079 Sig11 in cPickle (stack overflow) (2002-07-01) http://python.org/sf/576084 Infinite recursion in Pickle (2002-07-02) http://python.org/sf/576419 Windows binary missing SSL (2002-07-02) http://python.org/sf/576711 os.path.walk behavior on symlinks (2002-07-03) http://python.org/sf/576975 inheriting from property and docstrings (2002-07-03) http://python.org/sf/576990 Wrong description for PyErr_Restore (2002-07-03) http://python.org/sf/577000 Print line number of string if at EOF (2002-07-04) http://python.org/sf/577295 ** in doc/current/lib/operator-map.html (2002-07-04) http://python.org/sf/577513 del __builtins__ breaks out of rexec (2002-07-04) http://python.org/sf/577530 System Error with slots and multi-inh 
(2002-07-05) http://python.org/sf/577777 resire readonly memory mapped file (2002-07-05) http://python.org/sf/577782 Docs unclear about cleanup. (2002-07-05) http://python.org/sf/577793 Explain how to subclass Exception (2002-07-06) http://python.org/sf/578180 pthread_exit missing in thread_pthread.h (2002-07-09) http://python.org/sf/579116 LibRef 2.2.1, replace zero with False (2002-07-11) http://python.org/sf/579991 Subclassing WeakValueDictionary impossib (2002-07-11) http://python.org/sf/580107 GC Changes not mentioned in What's New (2002-07-12) http://python.org/sf/580462 mimetools module privacy leak (2002-07-12) http://python.org/sf/580495 MacOSX python.app build problems (2002-07-12) http://python.org/sf/580550 import lock should be exposed (2002-07-13) http://python.org/sf/580952 Provoking infinite scanner loops (2002-07-13) http://python.org/sf/581080 smtplib.SMTP.ehlo method esmtp_features (2002-07-13) http://python.org/sf/581165 bug in splituser(host) in urllib (2002-07-14) http://python.org/sf/581529 pty.spawn - wrong error caught (2002-07-15) http://python.org/sf/581698 ''.split() docstring clarification (2002-07-15) http://python.org/sf/582071 pickle error message unhelpful (2002-07-16) http://python.org/sf/582297 lib-dynload/*.so wrong permissions (2002-07-17) http://python.org/sf/583206 ConfigParser spaces in keys not read (2002-07-18) http://python.org/sf/583248 wrong dest size (2002-07-18) http://python.org/sf/583477 gethostbyaddr lag (2002-07-19) http://python.org/sf/583975 add way to detect bsddb version (2002-07-21) http://python.org/sf/584409 os.getlogin() fails (2002-07-21) http://python.org/sf/584566 no doc for os.fsync and os.fdatasync (2002-07-21) http://python.org/sf/584695 New Patches ----------- Deprecate bsddb (2002-05-06) http://python.org/sf/553108 Executable .pyc-files with hashbang (2002-06-23) http://python.org/sf/572796 (?(id/name)yes|no) re implementation (2002-06-23) http://python.org/sf/572936 cgi.py and rfc822.py unquote 
fixes (2002-06-24) http://python.org/sf/573197 Changing owner of symlinks (2002-06-25) http://python.org/sf/573770 makesockaddr, use addrlen with AF_UNIX (2002-06-27) http://python.org/sf/574707 Make python-mode.el use jython (2002-06-27) http://python.org/sf/574747 Make python-mode.el use "jython" interp (2002-06-27) http://python.org/sf/574750 list.extend docstring fix (2002-06-27) http://python.org/sf/574867 PyTRASHCAN slots deallocation (2002-06-28) http://python.org/sf/575073 python-mode patch for ipython support (2002-06-30) http://python.org/sf/575774 SSL release GIL (2002-06-30) http://python.org/sf/575827 Alternative implementation of interning (2002-07-01) http://python.org/sf/576101 Extend PyErr_SetFromWindowsErr (2002-07-02) http://python.org/sf/576458 Remove PyArg_Parse() and METH_OLDARGS (2002-07-03) http://python.org/sf/577031 Merge xrange() into slice() (2002-07-05) http://python.org/sf/577875 fix for problems with test_longexp (2002-07-06) http://python.org/sf/578297 Put IDE scripts in ~/Library (2002-07-08) http://python.org/sf/578667 incompatible, but nice strings improveme (2002-07-08) http://python.org/sf/578688 Solaris openpty() and forkpty() addition (2002-07-09) http://python.org/sf/579433 Shadow Password Support Module (2002-07-09) http://python.org/sf/579435 Build MachoPython with 2level namespace (2002-07-10) http://python.org/sf/579841 xreadlines caching, file iterator (2002-07-11) http://python.org/sf/580331 less restrictive HTML comments (2002-07-12) http://python.org/sf/580670 Fix for seg fault on test_re on mac osx (2002-07-12) http://python.org/sf/580869 new version of Set class (2002-07-13) http://python.org/sf/580995 Canvas "select_item" always returns None (2002-07-14) http://python.org/sf/581396 info reader bug (2002-07-14) http://python.org/sf/581414 fix to pty.spawn error on Linux (2002-07-15) http://python.org/sf/581705 Alternative PyTRASHCAN subtype_dealloc (2002-07-15) http://python.org/sf/581742 smtplib.py patch for 
macmail esmtp auth (2002-07-17) http://python.org/sf/583180 make file object an iterator (2002-07-17) http://python.org/sf/583235 get python to link on OSF1 (Dec Unix) (2002-07-20) http://python.org/sf/584245 yield allowed in try/finally (2002-07-21) http://python.org/sf/584626 Closed Bugs ----------- ihooks on windows and pythoncom (PR#294) (2000-07-31) http://python.org/sf/210637 httplib does not check if port is valid (easy to fix?) (2000-12-13) http://python.org/sf/225744 httplib problem with '100 Continue' (2001-01-02) http://python.org/sf/227361 [windows] os.popen doens't kill subprocess when interrupted (2001-02-06) http://python.org/sf/231273 += not assigning to same var it reads (2001-04-21) http://python.org/sf/417930 httplib: multiple Set-Cookie headers (2001-06-12) http://python.org/sf/432621 [win32] KeyboardInterrupt Not Caught (2001-07-10) http://python.org/sf/439992 Evaluating func_code causing core dump (2001-07-23) http://python.org/sf/443866 HTTPSConnect.__init__ too tricky (2001-09-04) http://python.org/sf/458463 base n integer to string conversion (2001-09-25) http://python.org/sf/465045 Tut: Dict used before dicts explained (2001-11-10) http://python.org/sf/480337 SAX Attribute/AttributesNS class missing (2001-11-22) http://python.org/sf/484603 Error building info docs (2001-12-20) http://python.org/sf/495624 'lambda' documentation in strange place (2001-12-27) http://python.org/sf/497109 unicode() docs don't mention LookupError (2002-02-06) http://python.org/sf/513666 bogus URLs cause exception in httplib (2002-03-07) http://python.org/sf/527064 Nested Scopes bug (Confirmed) (2002-03-10) http://python.org/sf/528274 Build unable to import w/gcc 3.0.4 (2002-04-11) http://python.org/sf/542737 buffer slice type inconsistant (2002-04-20) http://python.org/sf/546434 urllib/httplib vs corrupted tcp/ip stack (2002-04-22) http://python.org/sf/547093 Unicode encoders appears to leak references (2002-04-28) http://python.org/sf/549731 email.Utils.encode 
doesn't obey rfc2047 (2002-05-06) http://python.org/sf/552957 unittest.TestResult documentation (2002-05-20) http://python.org/sf/558278 HTTPSConnection memory leakage (2002-05-22) http://python.org/sf/559117 Getting traceback in embedded python. (2002-06-01) http://python.org/sf/563338 urllib2 can't cope with error response (2002-06-02) http://python.org/sf/563665 compile traceback must include filename (2002-06-05) http://python.org/sf/564931 Misleading string constant. (2002-06-12) http://python.org/sf/568269 minor improvement to Grammar file (2002-06-13) http://python.org/sf/568412 Broken pre.subn() (and pre.sub()) (2002-06-17) http://python.org/sf/570057 glob() fails for network drive in cgi (2002-06-19) http://python.org/sf/571167 imaplib fetch is broken (2002-06-19) http://python.org/sf/571334 Numeric Literal Anomoly (2002-06-19) http://python.org/sf/571382 Segmentation fault in Python 2.3 (2002-06-20) http://python.org/sf/571885 python-mode IM parses code in docstrings (2002-06-21) http://python.org/sf/572341 Memory leak in object comparison (2002-06-22) http://python.org/sf/572567 Closed Patches -------------- Optional memory profiler (2000-08-18) http://python.org/sf/401229 Pure Python strptime() (PEP 42) (2001-10-23) http://python.org/sf/474274 Unicode support in email.Utils.encode (2001-12-07) http://python.org/sf/490456 httplib.py screws up on 100 response (2001-12-31) http://python.org/sf/498149 make python-mode play nice with gdb (2002-01-28) http://python.org/sf/509975 imputil.py can't import "\r\n" .py files (2002-02-28) http://python.org/sf/523944 urllib2.py: fix behavior with proxies (2002-03-08) http://python.org/sf/527518 Better AttributeError formatting (2002-03-20) http://python.org/sf/532638 RFC 2231 support for email package (2002-04-26) http://python.org/sf/549133 Fix for httplib bug with 100 Continue (2002-05-01) http://python.org/sf/551273 Py_AddPendingCall doesn't unlock on fail (2002-05-03) http://python.org/sf/552161 os.uname() on 
Darwin space in machine (2002-05-24) http://python.org/sf/560311 Remove UserDict from cookie.py (2002-05-31) http://python.org/sf/562987 email Parser non-strict mode (2002-06-06) http://python.org/sf/565183 Expose _Py_ReleaseInternedStrings (2002-06-06) http://python.org/sf/565378 Rationalize DL_IMPORT and DL_EXPORT (2002-06-07) http://python.org/sf/566100 Convert slice and buffer to types (2002-06-13) http://python.org/sf/568544 Remove support for Win16 (2002-06-16) http://python.org/sf/569753 Changes (?P=) with optional backref (2002-06-20) http://python.org/sf/571976 From barry@zope.com Mon Jul 22 17:05:58 2002 From: barry@zope.com (Barry A. Warsaw) Date: Mon, 22 Jul 2002 12:05:58 -0400 Subject: [Python-Dev] Sorting References: <15676.8253.741856.171571@anthem.wooz.org> <20020722154415.79981.qmail@web40111.mail.yahoo.com> Message-ID: <15676.11622.589953.460393@anthem.wooz.org> >>>>> "SG" == Scott Gilbert writes: SG> Or you could have an optional parameter that defaults to SG> whatever the more sane value should be (probably stable), and SG> when the user stumbles across this parameter they stumble SG> across the docs too. SG> I think Tim's codebloat argument is more compelling. Except that in http://mail.python.org/pipermail/python-dev/2002-July/026837.html Tim says: "Back on Earth, among Python users the most frequent complaint I've heard is that list.sort() isn't stable." and here http://mail.python.org/pipermail/python-dev/2002-July/026854.html Tim seems to be arguing against stable sort as being the default due to code bloat. As Tim's Official Sysadmin, I'm only good at channeling him on one subject, albeit probably one he'd deem most important to his life: lunch. So I'm not sure if he's arguing for or against stable sort being the default. 
;) -Barry From skip@pobox.com Mon Jul 22 17:19:40 2002 From: skip@pobox.com (Skip Montanaro) Date: Mon, 22 Jul 2002 11:19:40 -0500 Subject: [Python-Dev] Weekly bug report summary Message-ID: <15676.12444.402113.866101@12-248-11-90.client.attbi.com> Neal Norwitz asked me what happened to the weekly bug summary mailing. I've been off-net a lot and was running it via cron on my laptop Sunday mornings. I just ran the bug reporter manually and migrated the database and script over to the Mojam web server. With any luck, the script will run properly next Sunday morning. Skip From barry@zope.com Mon Jul 22 18:24:52 2002 From: barry@zope.com (Barry A. Warsaw) Date: Mon, 22 Jul 2002 13:24:52 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/email/test test_email_codecs.py,1.1,1.2 References: Message-ID: <15676.16356.112688.518256@anthem.wooz.org> [Diverting to python-dev... -BAW] timmie> Update of /cvsroot/python/python/dist/src/Lib/email/test timmie> In directory timmie> usw-pr-cvs1:/tmp/cvs-serv3289/python/lib/email/test | Modified Files: | test_email_codecs.py | Log Message: | Changed import from | from test.test_support import TestSkipped, run_unittest | to | from test_support import TestSkipped, run_unittest timmie> Otherwise, if the Japanese codecs aren't installed, timmie> regrtest doesn't believe the TestSkipped exception raised timmie> by this test matches the timmie> except (ImportError, test_support.TestSkipped), msg: timmie> it's looking for, and reports the skip as a crash failure timmie> instead of as a skipped test. timmie> I suppose this will make it harder to run this test timmie> outside of regrtest, but under the assumption only Barry timmie> does that, better to make it skip cleanly for everyone timmie> else. 
A better fix, IMO, is to recognize that the `test' package has become a full-fledged standard lib package (a Good Thing, IMO), heed our own admonitions not to do relative imports, and change the various places in the test suite that "import test_support" (or equiv) to "import test.test_support" (or equiv). I've twiddled the test suite to do things this way, and all the (expected Linux) tests pass, so I'd like to commit these changes. Unit test writers need to remember to use test.test_support instead of just test_support. We could do something wacky like remove '' from sys.path if we really cared about enforcing this. It would also be good for folks on other systems to make sure I haven't missed a module. -Barry From tim.one@comcast.net Mon Jul 22 18:28:11 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 22 Jul 2002 13:28:11 -0400 Subject: [Python-Dev] Sorting In-Reply-To: <15676.11622.589953.460393@anthem.wooz.org> Message-ID: [Barry Warsaw] > Except that in > > http://mail.python.org/pipermail/python-dev/2002-July/026837.html > > Tim says: > > "Back on Earth, among Python users the most frequent complaint > I've heard is that list.sort() isn't stable." Yes, and because the current samplesort falls back to a stable sort when lists are small, almost everyone who cares about this and tries to guess about stability via trying small examples comes to a wrong conclusion. > and here > > http://mail.python.org/pipermail/python-dev/2002-July/026854.html > > Tim seems to be arguing against stable sort as being the > default due to code bloat. I'm arguing there against having two highly complex and long-winded sorting algorithms in the core. Pick one. In favor of samplesort: + It can be much faster in very-many-equal-elements cases (note that ~sort lists have only 4 distinct values, each repeated N/4 times and spread uniformly across the whole list). + While it requires some extra memory, that lives on the stack and is O(log N). 
As a result, it can never raise MemoryError unless a comparison function does. + It's never had a bug reported against it (so is stable in a different sense ). In favor of timsort: + It's stable. + The code is more uniform and so potentially easier to grok, and because it has no random component is easier to predict (e.g., it's certain that it has no quadratic-time cases). + It's incredibly faster in the face of many more kinds of mild disorder, which I believe are very common in the real world. As obvious examples, you add an increment of new data to an already- sorted file, or paste together several sorted files. timsort screams in those cases, but they may as well be random to samplesort, and the difference in runtime can easily exceed a factor of 10. A factor of 10 is a rare and wonderful thing in algorithm development. Against timsort: + It can require O(N) temp storage, although the constant is small compared to object sizes. That means it can raise MemoryError even if a comparison function never does. + Very-many-equal-elements cases can be much slower, but that's partly because it *is* stable, and preserving the order of equal elements is exactly what makes stability hard to achieve in a fast sort (samplesort can't be made stable efficiently). > As Tim's Official Sysadmin, I'm only good at channeling him on one > subject, albeit probably one he'd deem most important to his life: > lunch. So I'm not sure if he's arguing for or against stable sort > being the default. ;) All else being equal, a stable sort is a better choice. Alas, all else isn't equal. If Python had no sort method now, I'd pick timsort with scant hesitation. Speaking of which, is it time for lunch yet ? 
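[Editorial aside: the property this whole thread turns on is easy to see in a few lines. A sketch using today's Python, where list.sort() did end up being the stable timsort under discussion (the key= argument shown here arrived later, in 2.4; stability became a language guarantee in 2.3):]

```python
# Records that compare equal on the sort key must keep their
# original relative order under a stable sort.
records = [("banana", 2), ("apple", 3), ("banana", 1), ("apple", 1)]
records.sort(key=lambda rec: rec[0])  # sort on the fruit name only
print(records)
# Stability guarantees ("apple", 3) still precedes ("apple", 1),
# and ("banana", 2) still precedes ("banana", 1).
```

An unstable sort (such as samplesort) is free to emit the equal-keyed pairs in either order, which is exactly why users sorting already-partially-ordered data complain when their secondary order gets scrambled.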
From nas@python.ca Mon Jul 22 19:18:47 2002 From: nas@python.ca (Neil Schemenauer) Date: Mon, 22 Jul 2002 11:18:47 -0700 Subject: [Python-Dev] Sorting In-Reply-To: ; from tim.one@comcast.net on Mon, Jul 22, 2002 at 01:28:11PM -0400 References: <15676.11622.589953.460393@anthem.wooz.org> Message-ID: <20020722111847.A3095@glacier.arctrix.com> Tim Peters wrote: > Pick one. I pick timsort. Stability is nice to have. It sounds like if you want a stable sort you will have to pay for it (e.g. ~sort is slower). The fact that timsort is faster on partially sorted inputs more than makes up for it. Neil From tim.one@comcast.net Mon Jul 22 19:20:25 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 22 Jul 2002 14:20:25 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/email/test test_email_codecs.py,1.1,1.2 In-Reply-To: <15676.16356.112688.518256@anthem.wooz.org> Message-ID: [Barry] > A better fix, IMO, is to recognize that the `test' package has become > a full fledged standard lib package (a Good Thing, IMO), heed our own > admonitions not to do relative imports, and change the various places > in the test suite that "import test_support" (or equiv) to "import > test.test_support" (or equiv). > > I've twiddled the test suite to do things this way, and all the > (expected Linux) tests pass, so I'd like to commit these changes. > Unit test writers need to remember to use test.test_support instead of > just test_support. We could do something wacky like remove '' from > sys.path if we really cared about enforcing this. It would also be > good for folks on other systems to make sure I haven't missed a > module. Note test/README, which says in part: """ NOTE: Always import something from test_support like so: from test_support import verbose or like so: import test_support ... use test_support.verbose in the code ... 
Never import anything from test_support like this: from test.test_support import verbose "test" is a package already, so can refer to modules it contains without "test." qualification. If you do an explicit "test.xxx" qualification, that can fool Python into believing test.xxx is a module distinct from the xxx in the current package, and you can end up importing two distinct copies of xxx. This is especially bad if xxx=test_support, as regrtest.py can (and routinely does) overwrite its "verbose" and "use_large_resources" attributes: if you get a second copy of test_support loaded, it may not have the same values for those as regrtest intended. """ I don't have a deep understanding of these miserable issues, so settled for a one-line patch that worked. The admonition to never import from test.test_support was a BDFL Pronouncement at the time. Note that Jack runs tests in ways nobody else does, via importing something or other from an interactive Python session (Mac Classic doesn't have a cmdline shell -- something like that). It's always an adventure trying to guess how things will break for him, although I'm not sure your suggestion is (or isn't) relevant to Jack. I imagine things will work provided that all imports "are the same". I'm not sure fiddling all the code is worth it just to save a line of typing in the email package's test suite. From tim.one@comcast.net Mon Jul 22 20:32:09 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 22 Jul 2002 15:32:09 -0400 Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: If you have access to a good library, you'll enjoy reading the original paper on samplesort; or a scan can be purchased from the ACM: Samplesort: A Sampling Approach to Minimal Storage Tree Sorting W. D. Frazer, A. C. McKellar JACM, Vol. 17, No. 3, July 1970 As in many papers of its time, the algorithm description is English prose and raises more questions than it answers, but the mathematical analysis is extensive. 
Two things made me laugh out loud: 1. The largest array they tested had 50,000 elements, because that was the practical upper limit given storage sizes at the time. Now that's such a tiny case that even in Python it's hard to time it accurately. 2. They thought about using a different sort method for small buckets: "However, the additional storage required for the program would reduce the size of the input sequence which could be accommodated, and hence it is an open question as to whether or not the efficiency of the total sorting process could be improved in this way." In some ways, life was simpler then. for-example-i-had-more-hair-ly y'rs - tim From barry@zope.com Mon Jul 22 20:38:16 2002 From: barry@zope.com (Barry A. Warsaw) Date: Mon, 22 Jul 2002 15:38:16 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/email/test test_email_codecs.py,1.1,1.2 References: <15676.16356.112688.518256@anthem.wooz.org> Message-ID: <15676.24360.88972.449273@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> Note test/README, which says in part: TP> """ TP> NOTE: Always import something from test_support like so: TP> from test_support import verbose TP> or like so: | import test_support | ... use test_support.verbose in the code ... TP> Never import anything from test_support like this: TP> from test.test_support import verbose TP> "test" is a package already, so can refer to modules it TP> contains without "test." qualification. If you do an explicit TP> "test.xxx" qualification, that can fool Python into believing TP> test.xxx is a module distinct from the xxx in the current TP> package, and you can end up importing two distinct copies of TP> xxx. This is especially bad if xxx=test_support, as TP> regrtest.py can (and routinely does) overwrite its "verbose" TP> and "use_large_resources" attributes: if you get a second copy TP> of test_support loaded, it may not have the same values for TP> those as regrtest intended.
""" Yep, but I think those recommendations are out-of-date. You added them to the file almost 2 years ago. ;) Note that the warnings in that README go away when regrtest also imports test_support from the test package. TP> I don't have a deep understanding of these miserable issues, TP> so settled for a one-line patch that worked. The admonition TP> to never import from test.test_support was a BDFL TP> Pronouncement at the time. Hmm, I don't know if he considers that admonition to still be in effect, but I'd like to hope not. We're discouraging relative imports these days, and I don't see any deep reason why the regression tests need to break this rule to function (and indeed, on Unix at least it doesn't seem to). TP> Note that Jack runs tests in ways nobody else does, via TP> importing something or other from an interactive Python TP> session (Mac Classic doesn't have a cmdline shell -- something TP> like that). It's always an adventure trying to guess how TP> things will break for him, although I'm not sure your TP> suggestion is (or isn't) relevant to Jack. I wouldn't presume to know! So I'll generate a patch, upload it to SF, and assign it to Jack for review. TP> I imagine things will work provided that all imports "are the TP> same". Yes. TP> I'm not sure fiddling all the code is worth it just to TP> save a line of typing in the email package's test suite. It's a bit uglier than that because since Lib/test gets magically added to sys.path during regrtest by virtue of running "python Lib/test/regrtest.py". So to find the "same" test_support module, you'd probably have to do something more along the lines of >>> import os >>> import test.regrtest >>> testdir = os.path.dirname(test.regrtest.__file__) >>> sys.path.insert(0, testdir) >>> import test_support blechi-ly y'rs, -Barry From Rick Farrer" This is a multi-part message in MIME format. 
Please remove me from your mailing list. Thanks
From tim.one@comcast.net Tue Jul 23 03:07:57 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 22 Jul 2002 22:07:57 -0400 Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: In an effort to save time on email (ya, right ...), I wrote up a pretty detailed overview of the "timsort" algorithm. It's attached (timsort.txt). all-will-be-revealed-ly y'rs - tim

/*---------------------------------------------------------------------------

A stable natural mergesort with excellent performance on many flavors of lightly disordered arrays, and as fast as samplesort on random arrays. In a nutshell, the main routine marches over the array once, left to right, alternately identifying the next run, and then merging it into the previous runs. Everything else is complication for speed, and some measure of memory efficiency.

Runs
----

count_run() returns the # of elements in the next run. A run is either "ascending", which means non-decreasing: a0 <= a1 <= a2 <= ... or "descending", which means strictly decreasing: a0 > a1 > a2 > ... Note that a run is always at least 2 long, unless we start at the array's last element. The definition of descending is strict, because the main routine reverses a descending run in-place, transforming a descending run into an ascending run. Reversal is done via the obvious fast "swap elements starting at each end, and converge at the middle" method, and that can violate stability if the slice contains any equal elements. Using a strict definition of descending ensures that a descending run contains distinct elements.
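The run detection just described can be sketched in Python. The function name echoes the count_run() mentioned above, but this is my paraphrase of the prose, not the actual C code (which only ever uses "<" for comparisons, as this sketch does):

```python
def count_run(a, lo, hi):
    """Return (length, descending) for the run starting at a[lo].

    A sketch of count_run() as described above: a run is either
    non-decreasing (ascending) or strictly decreasing (descending),
    and is at least 2 long unless lo is the last index.
    """
    if hi - lo <= 1:
        return hi - lo, False
    n = 2
    if a[lo + 1] < a[lo]:
        # strictly decreasing -> descending run (strictness matters:
        # the caller reverses it in place, and equal elements would
        # break stability)
        while lo + n < hi and a[lo + n] < a[lo + n - 1]:
            n += 1
        return n, True
    else:
        # non-decreasing -> ascending run
        while lo + n < hi and not (a[lo + n] < a[lo + n - 1]):
            n += 1
        return n, False
```

A descending run such as [3, 2, 1] is then reversed in place; the strict definition guarantees its elements are distinct, so the reversal cannot reorder equal elements.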
If an array is random, it's very unlikely we'll see long runs; much of the rest of the algorithm is geared toward exploiting long runs, and that takes a fair bit of work. That work is a waste of time if the data is random, so if a natural run contains less than MIN_MERGE_SLICE elements, the main loop artificially boosts it to MIN_MERGE_SLICE elements, via binary insertion sort applied to the right number of array elements following the short natural run. In a random array, *all* runs are likely to be MIN_MERGE_SLICE long as a result, and merge_at() short-circuits the expensive stuff in that case.

The Merge Pattern
-----------------

In order to exploit regularities in the data, we're merging on natural run lengths, and they can become wildly unbalanced. But that's a Good Thing for this sort! Stability constrains permissible merging patterns. For example, if we have 3 consecutive runs of lengths A:10000 B:20000 C:10000 we dare not merge A with C first, because if A, B and C happen to contain a common element, it would get out of order wrt its occurrence(s) in B. The merging must be done as (A+B)+C or A+(B+C) instead. So merging is always done on two consecutive runs at a time, and in-place, although this may require some temp memory (more on that later). When a run is identified, its base address and length are pushed on a stack in the MergeState struct. merge_collapse() is then called to see whether it should merge it with preceding run(s). We would like to delay merging as long as possible in order to exploit patterns that may come up later, but we would like to do merging as soon as possible to exploit that the run just found is still high in the memory hierarchy. We also can't delay merging "too long" because it consumes memory to remember the runs that are still unmerged, and the stack has a fixed size.
What turned out to be a good compromise maintains two invariants on the stack entries, where A, B and C are the lengths of the three rightmost not-yet-merged slices:

1. A > B+C
2. B > C

Note that, by induction, #2 implies the lengths of pending runs form a decreasing sequence. #1 implies that, reading the lengths right to left, the pending-run lengths grow at least as fast as the Fibonacci numbers. Therefore the stack can never grow larger than about log_base_phi(N) entries, where phi = (1+sqrt(5))/2 ~= 1.618. Thus a small # of stack slots suffice for very large arrays. If A <= B+C, the smaller of A and C is merged with B, and the new run replaces the A,B or B,C entries; e.g., if the last 3 entries are A:30 B:20 C:10 then B is merged with C, leaving A:30 BC:30 on the stack. Or if they were A:500 B:400 C:1000 then A is merged with B, leaving AB:900 C:1000 on the stack. In both examples, the stack configuration still violates invariant #2, and merge_at() goes on to continue merging runs until both invariants are satisfied. As an extreme case, suppose we didn't do the MIN_MERGE_SLICE gimmick, and natural runs were of lengths 128, 64, 32, 16, 8, 4, 2, and 2. Nothing would get merged until the final 2 was seen, and that would trigger 7 perfectly balanced (both runs involved have the same size) merges. The thrust of these rules when they trigger merging is to balance the run lengths as closely as possible, while keeping a low bound on the number of runs we have to remember. This is maximally effective for random data, where all runs are likely to be of (artificially forced) length MIN_MERGE_SLICE, and then we get a sequence of perfectly balanced merges. OTOH, the reason this sort is so good for lightly disordered data has to do with wildly unbalanced run lengths.
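The invariant bookkeeping can be sketched with run lengths alone. This toy merge_collapse() only manipulates a list of pending lengths (oldest first); the real routine also merges the underlying array slices:

```python
def merge_collapse(runs):
    """Collapse the pending-run stack until A > B+C and B > C hold
    for the rightmost entries, merging lengths as described above.

    `runs` is a list of run lengths; this sketch mutates and returns it.
    """
    while len(runs) > 1:
        if len(runs) >= 3 and runs[-3] <= runs[-2] + runs[-1]:
            # invariant #1 violated: merge the smaller of A and C with B
            if runs[-3] < runs[-1]:
                runs[-3:-1] = [runs[-3] + runs[-2]]   # merge A with B
            else:
                runs[-2:] = [runs[-2] + runs[-1]]     # merge B with C
        elif runs[-2] <= runs[-1]:
            # invariant #2 violated: merge B with C
            runs[-2:] = [runs[-2] + runs[-1]]
        else:
            break
    return runs
```

Running it on the two worked examples: [30, 20, 10] merges B with C (leaving 30, 30), which still violates #2, so it keeps collapsing to [60]; likewise [500, 400, 1000] collapses via [900, 1000] to [1900].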
Theoretical constructions are known that can do it, but they're too difficult and slow for practical use. But if we have temp memory equal to min(A, B), it's easy. If A is smaller, copy A to a temp array, leave B alone, and then we can do the obvious merge algorithm left to right, from the temp area and B, starting the stores into where A used to live. There's always a free area in the original area comprising a number of elements equal to the number not yet merged from the temp array (trivially true at the start; proceed by induction). The only tricky bit is that if a comparison raises an exception, we have to remember to copy the remaining elements back in from the temp area, lest the array end up with duplicate entries from B. If B is smaller, much the same, except that we need to merge right to left, starting the stores at the right end of where B used to live. In all, then, we need no more than N/2 temp array slots. A refinement: When we're about to merge adjacent runs A and B, we first do a form of binary search (more on that later) to see where B[0] should end up in A. Elements in A preceding that point are already in their final positions, effectively shrinking the size of A. Likewise we also search to see where A[-1] should end up in B, and elements of B after that point can also be ignored. This cuts the amount of temp memory needed by the same amount. It may not pay, though.

Merge Algorithms
----------------

When merging runs of lengths A and B, if A/2 <= B <= 2*A (i.e., they're within a factor of two of each other), we do the usual straightforward one-at-a-time merge. This can take up to A+B comparisons. If the data is random, there's very little potential for doing better than that.
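The "A is smaller" merge might look like this in Python. This is a simplified sketch of what the text describes (it omits the exception-recovery copy-back); note it uses only len(left run) temp slots, and the stores never overwrite unread elements of the right run:

```python
def merge_lo(a, lo, mid, hi):
    """Merge sorted runs a[lo:mid] and a[mid:hi] in place, left to
    right, using only mid - lo temporary slots as described above."""
    temp = a[lo:mid]           # copy the smaller, left run out
    i, j, dest = 0, mid, lo    # i -> temp, j -> right run, dest -> store
    while i < len(temp) and j < hi:
        if a[j] < temp[i]:     # only "<" is used, preserving stability
            a[dest] = a[j]; j += 1
        else:
            a[dest] = temp[i]; i += 1
        dest += 1
    # leftovers from temp; leftovers of the right run are already in place
    a[dest:dest + len(temp) - i] = temp[i:]
```

The free-area invariant from the text is what makes this safe: while i elements remain uncopied in temp, the next read position j stays exactly len(temp) - i slots ahead of dest.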
If there are a great many equal elements, we can do better than that, but there's no way to know whether there *are* a great many equal elements short of doing a great many additional comparisons (we only use "<" in sort), and that's too expensive when it doesn't pay. If the sizes of A and B are out of whack, we can do much better. The Hwang-Lin merging algorithm is very good at merging runs of mismatched lengths if the data is random, but I believe it would be a mistake to try that here. As explained before, if we really do have random data, we're almost certainly going to stay in the A/2 <= B <= 2*A case. Instead we assume that wildly different run lengths correspond to *some* sort of clumpiness in the data. Without loss of generality, assume A is the shorter run. We first look for A[0] in B. We do this via "galloping", comparing A[0] in turn to B[0], B[1], B[3], B[7], ..., B[2**j - 1], ..., until finding the k such that B[2**(k-1) - 1] < A[0] <= B[2**k - 1]. This takes at most log2(B) comparisons, and, unlike a straight binary search, favors finding the right spot early in B. Why that's important may become clear later. After finding such a k, the region of uncertainty is reduced to 2**(k-1) - 1 consecutive elements, and a straight binary search requires exactly k-1 comparisons to nail it. Now we can copy all the B's up to that point in one chunk, and then copy A[0]. If the data really is clustered, the new A[0] (what was A[1] at the start) is likely to belong near the start of what remains of the B run. That's why we gallop first instead of doing a straight binary search: if the new A[0] really is near the start of the remaining B run, galloping will find it much quicker. OTOH, if we're wrong, galloping + binary search never takes more than 2*log2(B) compares, so can't become a disaster. If the clumpiness comes in distinct clusters, gallop + binary search also adapts nicely to that.
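The gallop-then-binary-search step can be sketched as follows. This is my rendering of the prose, leaning on the stdlib's bisect_left for the final binary search over the region of uncertainty:

```python
from bisect import bisect_left

def gallop_left(x, b):
    """Number of elements of the sorted list b strictly less than x,
    found by galloping over b[0], b[1], b[3], b[7], ..., b[2**j - 1]
    and then binary-searching the remaining region of uncertainty."""
    lo, hi = 0, 1
    # gallop until x <= b[hi - 1], or we run off the end of b
    while hi <= len(b) and b[hi - 1] < x:
        lo, hi = hi, hi * 2 + 1
    # now b[lo - 1] < x (if lo > 0); uncertainty is b[lo : hi - 1]
    return bisect_left(b, x, lo, min(hi - 1, len(b)))
```

As the text says, this favors early positions: if x belongs near the front of b, the loop exits after one or two probes, while the worst case stays within about 2*log2(len(b)) comparisons.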
I first learned about the galloping strategy in a related context; do a Google search to find this paper available online: "Adaptive Set Intersections, Unions, and Differences" (2000) Erik D. Demaine, Alejandro López-Ortiz, J. Ian Munro and its followup(s).

---------------------------------------------------------------------------*/

From neal@metaslash.com Tue Jul 23 04:19:32 2002 From: neal@metaslash.com (Neal Norwitz) Date: Mon, 22 Jul 2002 23:19:32 -0400 Subject: [Python-Dev] More Sorting Message-ID: <3D3CCB44.4F2592ED@metaslash.com> Sebastien Keim posted a patch (http://python.org/sf/544113) of a merge sort. I didn't really review it, but it included test and doc. So if the bisect module is being added to, perhaps someone should review this patch. Neal From ping@zesty.ca Tue Jul 23 05:57:24 2002 From: ping@zesty.ca (Ka-Ping Yee) Date: Mon, 22 Jul 2002 21:57:24 -0700 (PDT) Subject: [Python-Dev] Re: The iterator story In-Reply-To: <200207220450.g6M4o2u23472@oma.cosc.canterbury.ac.nz> Message-ID: SYNOPSIS: a slight adjustment to the definition of consume() yields a simple solution that addresses both the destruction issue and the multiple-iteration issue, without introducing any new syntax. On Mon, 22 Jul 2002, Greg Ewing wrote: > As someone pointed out, it's pretty rare that you actually *want* to > consume the sequence. Usually the choice is between "I don't care" and > "The sequence must NOT be consumed". Sure, i'll go for that. What i'm after is the ability to say "i would like this sequence not to be consumed." > Of the two varieties of for-loop in your proposal, for-in > obviously corresponds to the "must not be consumed" case, > leading one to suppose that you intend for-from to be used in > the don't-care case. Right.
> But now you seem to be suggesting that library routines > should always use for-in, and that the caller should > convert an iterator to a sequence if he knows it's okay > to consume it: The two are semantically equivalent proposals. I explained them both in the original message that i posted proposing the solution. The 'consume()' library routine is just another way to express 'for-from' without using new syntax. However, it is true that 'consume()' is more generally useful. It would be good to have, whether or not we had new syntax. I acknowledge that i did not realize this at the time i wrote the earlier message, or i would have stated the 'consume()' (then called 'seq()') proposal first and the for-from proposal second, instead of the opposite. That is why i am sticking to talking about the no-new-syntax version of the proposal for now. I apologize if it seems that i am asking you to follow a moving target. I would like you to recognize, though, that the underlying concept is the same -- the programmer has to signal when an iterator is being used like a sequence. > Okay, that seems reasonable -- explicit is better than > implicit. But... consider the following two library > routines: > > def printout1(s): > for x in s: > print x > > def printout2(s): > for x in s: > for y in s: > print x, y [...] > no exception will be raised if you call printout2(consume(s)) > by mistake. Good point! Clearly my proposal did not take care of this case. (But there are solutions below; read on.) Upon some reflection, though, it seems to me that this problem is orthogonal to the proposal: forcing the programmer to declare when destruction is allowed neither solves nor exacerbates the problem of printout2(). consume() is about destruction, whereas printout2() is about multiple iteration. 
> To get any safety benefit from your proposed arrangement, > it seems to me that you'd need to write printout1 as > > def printout1(s): > "s must be an iterator" > for x from s: > print x I'm afraid i don't see how this bears on the problem you just described. It still would not be possible to write a safe version of printout2() in either (a) the world of the current Python with iterators or (b) a world where for-in does not accept iterators and consume() has been introduced. One real solution to this problem is what Oren has been suggesting all along -- raise an IteratorExhausted exception if you try to fetch an element from an iterator that has already thrown StopIteration. In printout2(), this exception would occur on the second time through the inner loop. This works, but we can do even better. After some thought today, i realized that there is a second solution. Thanks for leading me to it, Greg! With consume(), the programmer has declared that the iterator is okay to destroy. But my definition of consume() was incomplete. One slight change solves the problem: consume(y) returns x such that iter(x) returns y the first time, and raises IteratorConsumedException thereafter. Now we're all set! If consume(it) is passed to printout2(), an exception is raised immediately before any damage is done. This detects whether you attempt to *start* the iterator twice, which makes more sense than detecting whether you hit the *end* of the iterator twice. The insight is that protection against multiple iteration belongs in the implementation of __iter__, not in the iterator itself -- because the iterator doesn't know whether it can be restarted. The *provider* of the iterator does. > There's no doubt that it's very elegant theoretically, > but in thinking through the implications, I'm not sure it > would be all that helpful in practice, and might even > turn out to be a nuisance if it requires putting in a > lot of iter(x) and/or consume(x) calls. It's not so bad. 
You only have to say iter() or consume() in exceptional cases, where you are specifically writing code to manipulate iterators. Everything else looks the same -- except it's safe. More importantly, neither iter() nor consume() need to be taught on the first day of Python. I think it all comes together quite nicely. Here it is in summary: - Iterators just implement __next__. - Containers, and other things that want to be iterated over, just implement __iter__. - The new built-in routine consume(y) returns x such that iter(x) returns y the first time, and raises IteratorConsumedException thereafter. - (Other objects that only allow one-shot iteration can also raise IteratorConsumedException when their __iter__ is called twice.) Advantages: 1. "for-in" and "in" are safe to use -- no fear of destruction. 2. One-shot iterators are safe against multiple iteration. 3. Iterators don't have to implement a dummy __iter__ method returning self. 4. The implementation of "for" stays exactly as it is now. 5. Current implementations of iterators continue to work fine, if unsafely (but they're already unsafe). 6. No new syntax. 7. For-loops continue to work on containers exactly as they always have. 8. Iterators don't have to maintain extra state to know that it's time to start throwing IteratorExhausted instead of StopIteration. Items 1, 2, and 3 are distinct improvements over the current state of affairs. The only inconvenience is the case where an iterator is being passed to a routine that expects a container; this is still pretty rare yet, and this situation is easy to detect (hence, the error message from "for" can explain what to do). In this case, you have to wrap consume() around the iterator to declare it okay to consume. And that's all. 
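For concreteness, the consume() summarized above can be written in a few lines of today's Python. The names come from the proposal itself (they are not in the stdlib), and using a lowercase class to stand in for the proposed built-in is my choice:

```python
class IteratorConsumedException(Exception):
    """Raised when a consumed iterator's __iter__ is called twice,
    per the proposal above."""

class consume:
    """Wrap an iterator y so that iter() works exactly once:
    the first iter() returns y, later calls raise."""
    def __init__(self, iterator):
        self._iterator = iterator

    def __iter__(self):
        if self._iterator is None:
            raise IteratorConsumedException("iterator already consumed")
        it, self._iterator = self._iterator, None
        return it
```

With this in hand, printout2(consume(s)) fails immediately at the start of the second loop, before any damage is done, which is exactly the behavior the proposal argues for.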
The fact that it takes only a slight adjustment to the earlier proposal to solve *both* the destruction problem and the multiple-iteration problem has led me to be even more convinced that this is the "right answer" -- in the sense that this is how i would design the protocol if we were starting from scratch. Now, i know we are not starting from scratch. And i know Guido has already said he doesn't want to solve this problem. But, just in case you are wondering, the migration path from here to there seems pretty straightforward to me: 1. When __next__() is not present, call next() and issue a warning. 2. In the next version, deprecate next() in favour of __next__(). 3. Add consume() and IteratorConsumedException to built-ins. 4. Deprecate the dummy __iter__() method on iterators. 5. Throw a party and consume(mass_quantities). -- ?!ng "Most things are, in fact, slippery slopes. And if you start backing off from one thing because it's a slippery slope, who knows where you'll stop?" -- Sean M. Burke From xscottg@yahoo.com Tue Jul 23 07:22:12 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Mon, 22 Jul 2002 23:22:12 -0700 (PDT) Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: <20020723062212.25747.qmail@web40102.mail.yahoo.com> --- Tim Peters wrote: > In an effort to save time on email (ya, right ...), I wrote up a pretty > detailed overview of the "timsort" algorithm. It's attached. > > all-will-be-revealed-ly y'rs - tim > > [Interesting stuff deleted.] > I'm curious if there is any literature that you've come across, or if you've done any experiments with merging more than two parts at a time. So instead of merging like such: A B C D E F G H I J K L AB CD EF GH IJ KL ABCD EFGH IJKL ABCDEFGH IJKL ABCDEFGHIJKL You were to merge A B C D E F G H I J K L ABC DEF GHI JKL ABCDEF GHIJKL ABCDEFGHIJKL (I realize that your merges are based on the lengths of the subsequences, but you get the point.) 
My thinking is that many machines (probably yours for instance) have a cache that is 4-way associative, so merging only 2 blocks at a time might not be using the cache as well as it could. Also, changing from merging 2 blocks to 3 or 4 blocks at a time would change the number of passes you have to make (the log part of N*log(N)). It's quite possible that this isn't worth the trade off in complexity and space (or your time :-). Keeping track of comparisons that you've already made could get ugly, and your temp space requirement would go from N/2 to possibly 3N/4... But since you're diving so deeply into this problem, I figured I'd throw it out there. OTOH, this could be close to the speedup that heavily optimized FFT algs get when they go from radix-2 to radix-4. Just thinking out loud... __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From xscottg@yahoo.com Tue Jul 23 07:36:11 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Mon, 22 Jul 2002 23:36:11 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem Message-ID: <20020723063611.26677.qmail@web40102.mail.yahoo.com> The latest version of this PEP will be in CVS, but the most recent copy as of this message is attached (pep-0296.txt). I'm posting this to python-dev first to shave off the rough edges. I'll post to comp.lang.python after that. Please don't hesitate to email me directly if you have any questions on it. Cheers, -Scott Gilbert

PEP: 296 Title: The Buffer Problem Version: $Revision: 1.1 $ Last-Modified: $Date: 2002/07/22 21:03:34 $ Author: xscottg at yahoo.com (Scott Gilbert) Status: Draft Type: Standards Track Created: 12-Jul-2002 Python-Version: 2.3 Post-History: Abstract This PEP proposes the creation of a new standard type and builtin constructor called 'bytes'. The bytes object is an efficiently stored array of bytes with some additional characteristics that set it apart from several implementations that are similar. Rationale Python currently has many objects that implement something akin to the bytes object of this proposal. For instance the standard string, buffer, array, and mmap objects are all very similar in some regards to the bytes object. Additionally, several significant third party extensions have created similar objects to try and fill similar needs. Frustratingly, each of these objects is too narrow in scope and is missing critical features to make it applicable to a wider category of problems. Specification The bytes object has the following important characteristics: 1. Efficient underlying array storage via the standard C type "unsigned char". This allows fine grain control over how much memory is allocated. With the alignment restrictions designated in the next item, it is trivial for low level extensions to cast the pointer to a different type as needed. Also, since the object is implemented as an array of bytes, it is possible to pass the bytes object to the extensive library of routines already in the standard library that presently work with strings. For instance, the bytes object in conjunction with the struct module could be used to provide a complete replacement for the array module using only Python script.
If an unusual platform comes to light, one where there isn't a native unsigned 8 bit type, the object will do its best to represent itself at the Python script level as though it were an array of 8 bit unsigned values. It is doubtful whether many extensions would handle this correctly, but Python script could be portable in these cases. 2. Alignment of the allocated byte array is whatever is promised by the platform implementation of malloc. A bytes object created from an extension can be supplied that provides any arbitrary alignment as the extension author sees fit. This alignment restriction should allow the bytes object to be used as storage for all standard C types - including PyComplex objects or other structs of standard C types. Further alignment restrictions can be provided by extensions as necessary. 3. The bytes object implements a subset of the sequence operations provided by string/array objects, but with slightly different semantics in some cases. In particular, a slice always returns a new bytes object, but the underlying memory is shared between the two objects. This type of slice behavior has been called creating a "view". Additionally, repetition and concatenation are undefined for bytes objects and will raise an exception. As these objects are likely to find use in high performance applications, one motivation for the decision to use view slicing is that copying between bytes objects should be very efficient and not require the creation of temporary objects. The following code illustrates this: # create two 10 Meg bytes objects b1 = bytes(10000000) b2 = bytes(10000000) # copy from part of one to another without creating a 1 Meg temporary b1[2000000:3000000] = b2[4000000:5000000] Slice assignment where the rvalue is not the same length as the lvalue will raise an exception. However, slice assignment will work correctly with overlapping slices (typically implemented with memmove). 4.
The bytes object will be recognized as a native type by the pickle and cPickle modules for efficient serialization. (In truth, this is the only requirement that can't be implemented via a third party extension.) Partial solutions to address the need to serialize the data stored in a bytes-like object without creating a temporary copy of the data into a string have been implemented in the past. The tofile and fromfile methods of the array object are good examples of this. The bytes object will support these methods too. However, pickling is useful in other situations - such as in the shelve module, or implementing RPC of Python objects, and requiring the end user to use two different serialization mechanisms to get an efficient transfer of data is undesirable. XXX: Will try to implement pickling of the new bytes object in such a way that previous versions of Python will unpickle it as a string object. When unpickling, the bytes object will be created from memory allocated from Python (via malloc). As such, it will lose any additional properties that an extension supplied pointer might have provided (special alignment, or special types of memory). XXX: Will try to make it so that C subclasses of bytes type can supply the memory that will be unpickled into. For instance, a derived class called PageAlignedBytes would unpickle to memory that is also page aligned. On any platform where an int is 32 bits (most of them), it is currently impossible to create a string with a length larger than can be represented in 31 bits. As such, pickling to a string will raise an exception when the operation is not possible. At least on platforms supporting large files (many of them), pickling large bytes objects to files should be possible via repeated calls to the file.write() method. 5. 
The bytes type supports the PyBufferProcs interface, but a bytes object provides the additional guarantee that the pointer will not be deallocated or reallocated as long as a reference to the bytes object is held. This implies that a bytes object is not resizable once it is created, but allows the global interpreter lock (GIL) to be released while a separate thread manipulates the memory pointed to if the PyBytes_Check(...) test passes. This characteristic of the bytes object allows it to be used in situations such as asynchronous file I/O or on multiprocessor machines where the pointer obtained by PyBufferProcs will be used independently of the global interpreter lock. Knowing that the pointer can not be reallocated or freed after the GIL is released gives extension authors the capability to get true concurrency and make use of additional processors for long running computations on the pointer. 6. In C/C++ extensions, the bytes object can be created from a supplied pointer and destructor function to free the memory when the reference count goes to zero. The special implementation of slicing for the bytes object allows multiple bytes objects to refer to the same pointer/destructor. As such, a refcount will be kept on the actual pointer/destructor. This refcount is separate from the refcount typically associated with Python objects. XXX: It may be desirable to expose the inner refcounted object as an actual Python object. If a good use case arises, it should be possible for this to be implemented later with no loss to backwards compatibility. 7. It is also possible to signify the bytes object as readonly, in this case it isn't actually mutable, but does provide the other features of a bytes object. 8. The bytes object keeps track of the length of its data with a Python LONG_LONG type. Even though the current definition for PyBufferProcs restricts the length to be the size of an int, this PEP does not propose to make any changes there. 
Instead, extensions can work around this limit by making an explicit PyBytes_Check(...) call, and if that succeeds they can make a PyBytes_GetReadBuffer(...) or PyBytes_GetWriteBuffer(...) call to get the pointer and full length of the object as a LONG_LONG. The bytes object will raise an exception if the standard PyBufferProcs mechanism is used and the size of the bytes object is greater than can be represented by an integer. From Python scripting, the bytes object will be subscriptable with longs so the 32 bit int limit can be avoided. There is still a problem with the len() function as it is PyObject_Size() and this returns an int as well. As a workaround, the bytes object will provide a .length() method that will return a long. 9. The bytes object can be constructed at the Python scripting level by passing an int/long to the bytes constructor with the number of bytes to allocate. For example:

    b = bytes(100000)  # alloc 100K bytes

The constructor can also take another bytes object. This will be useful for the implementation of unpickling, and in converting a read-write bytes object into a read-only one. An optional second argument will be used to designate creation of a readonly bytes object. 10. From the C API, the bytes object can be allocated using any of the following signatures:

    PyObject* PyBytes_FromLength(LONG_LONG len, int readonly);
    PyObject* PyBytes_FromPointer(void* ptr, LONG_LONG len, int readonly,
                                  void (*dest)(void *ptr, void *user), void* user);

In the PyBytes_FromPointer(...) function, if the dest function pointer is passed in as NULL, it will not be called. This should only be used for creating bytes objects from statically allocated space. The user pointer has been called a closure in other places. It is a pointer that the user can use for whatever purposes. It will be passed to the destructor function on cleanup and can be useful for a number of things. If the user pointer is not needed, NULL should be passed instead. 11.
The bytes type will be a new style class as that seems to be where all standard Python types are headed. Contrast to existing types The most common way to work around the lack of a bytes object has been to simply use a string object in its place. Binary files, the struct/array modules, and several other examples exist of this. Putting aside the style issue that these uses typically have nothing to do with text strings, there is the real problem that strings are not mutable, so direct manipulation of the data returned in these cases is not possible. Also, numerous optimizations in the string module (such as caching the hash value or interning the pointers) mean that extension authors are on very thin ice if they try to break the rules with the string object. The buffer object seems like it was intended to address the purpose that the bytes object is trying to fulfill, but several shortcomings in its implementation [1] have made it less useful in many common cases. The buffer object made a different choice for its slicing behavior (it returns new strings instead of buffers for slicing and other operations), and it doesn't make many of the promises on alignment or being able to release the GIL that the bytes object does. Also in regards to the buffer object, it is not possible to simply replace the buffer object with the bytes object and maintain backwards compatibility. The buffer object provides a mechanism to take the PyBufferProcs supplied pointer of another object and present it as its own. Since the behavior of the other object can not be guaranteed to follow the same set of strict rules that a bytes object does, it can't be used in places that a bytes object could. The array module supports the creation of an array of bytes, but it does not provide a C API for supplying pointers and destructors to extension supplied memory.
This makes it unusable for constructing objects out of shared memory, or memory that has special alignment or locking for things like DMA transfers. Also, the array object does not currently pickle. Finally since the array object allows its contents to grow, via the extend method, the pointer can be changed if the GIL is not held while using it. Creating a buffer object from an array object has the same problem of leaving an invalid pointer when the array object is resized. The mmap object caters to its particular niche, but does not attempt to solve a wider class of problems. Finally, any third party extension can not implement pickling without creating a temporary object of a standard python type. For example in the Numeric community, it is unpleasant that a large array can't pickle without creating a large binary string to duplicate the array data. Backward Compatibility The only possibility for backwards compatibility problems that the author is aware of are in previous versions of Python that try to unpickle data containing the new bytes type. Reference Implementation XXX: Actual implementation is in progress, but changes are still possible as this PEP gets further review. The following new files will be added to the Python baseline: Include/bytesobject.h # C interface Objects/bytesobject.c # C implementation Lib/test/test_bytes.py # unit testing Doc/lib/libbytes.tex # documentation The following files will also be modified: Include/Python.h # adding bytesmodule.h include file Python/bltinmodule.c # adding the bytes type object Modules/cPickle.c # adding bytes to the standard types Lib/pickle.py # adding bytes to the standard types It is possible that several other modules could be cleaned up and implemented in terms of the bytes object. The mmap module comes to mind first, but as noted above it would be possible to reimplement the array module as a pure Python module. 
While it is attractive that this PEP could actually reduce the amount of source code by some amount, the author feels that this could cause unnecessary risk for breaking existing applications and should be avoided at this time. Additional Notes/Comments - Guido van Rossum wondered whether it would make sense to be able to create a bytes object from a mmap object. The mmap object appears to support the requirements necessary to provide memory for a bytes object. (It doesn't resize, and the pointer is valid for the lifetime of the object.) As such, a method could be added to the mmap module such that a bytes object could be created directly from a mmap object. An initial stab at how this would be implemented would be to use the PyBytes_FromPointer() function described above and pass the mmap_object as the user pointer. The destructor function would decref the mmap_object for cleanup. - Todd Miller notes that it may be useful to have two new functions: PyObject_AsLargeReadBuffer() and PyObject_AsLargeWriteBuffer() that are similar to PyObject_AsReadBuffer() and PyObject_AsWriteBuffer(), but support getting a LONG_LONG length in addition to the void* pointer. These functions would allow extension authors to work transparently with bytes objects (which support LONG_LONG lengths) and most other buffer-like objects (which only support int lengths). These functions could be in lieu of, or in addition to, creating specific PyBytes_GetReadBuffer() and PyBytes_GetWriteBuffer() functions. XXX: The author thinks this is a very good idea as it paves the way for other objects to eventually support large (64 bit) pointers, and it should only affect abstract.c and abstract.h. Should this be added above? - It was generally agreed that abusing the segment count of the PyBufferProcs interface is not a good hack to work around the 31 bit limitation of the length. If you don't know what this means, then you're in good company.
Most code in the Python baseline, and presumably in many third party extensions, punts when the segment count is not 1. References [1] The buffer interface http://mail.python.org/pipermail/python-dev/2000-October/009974.html Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: From tim.one@comcast.net Tue Jul 23 09:30:11 2002 From: tim.one@comcast.net (Tim Peters) Date: Tue, 23 Jul 2002 04:30:11 -0400 Subject: [Python-Dev] Sorting In-Reply-To: <20020723062212.25747.qmail@web40102.mail.yahoo.com> Message-ID: [Scott Gilbert] > I'm curious if there is any literature that you've come across, or if > you've done any experiments with merging more than two parts at a > time. There's a literal mountain of research on the topic. I recommend "A Meticulous Analysis of Mergesort Programs" Jyrki Katajainen, Jesper Larsson Traff for a careful accounting of all operations that go into one of these beasts. They got the best results (and much better than quicksort) out of a 4-way bottom-up mergesort via very tedious code (e.g., it effectively represents which input run currently has the smallest next key via the program counter, by way of massive code duplication and oodles of gotos); they were afraid to write actual code for an 8-way version. OTOH, they were sorting random integers, and, e.g., were delighted to increase the # of comparisons when that could save a few other "dirt cheap" operations. > ... > My thinking is that many machines (probably yours for instance) have a > cache that is 4-way associative, so merging only 2 blocks at a time might > not be using the cache as well as it could. Also, changing from merging 2 > blocks to 3 or 4 blocks at a time would change the number of passes you > have to make (the log part of N*log(N)).
> > It's quite possible that this isn't worth the trade off in complexity and > space (or your time :-). The real reason it's uninteresting to me is that it has no clear applicability to the cases this sort aims at: exploiting significant pre-existing order of various kinds. That leads to unbalanced run lengths when we're lucky, and if I'm merging a 2-element run with a 100,000-element run, high cache associativity isn't of much use. From the timings I showed before, it's clear that "good cases" of pre-existing order take time that depends almost entirely on just the number of comparisons needed; e.g., 3sort and +sort were as fast as /sort, where the latter does nothing but N-1 comparisons in a single left-to-right scan of the array. Comparisons are expensive enough in Python that doing O(log N) additional comparisons in 3sort and +sort, then moving massive amounts of the array around to fit the oddballs in place, costs almost nothing more in percentage terms. Since these cases are already effectively as fast as a single left-to-right scan, there's simply no potential remaining for significant gain (unless you can speed a single left-to-right scan! that would be way cool). If you think you can write a sort for random Python arrays faster than the samplesort hybrid, be my guest: I'd love to see it! You should be aware that I've been making this challenge for years. Something to note: I think you have an overly simple view of Python's lists in mind. When we're merging two runs in the timing test, it's not *just* the list memory that's getting scanned. The lists contain pointers *to* float objects. The float objects have to get read up from memory too, and there goes the rest of your 4-way associativity.
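[Editor's sketch: the "good cases cost essentially just their comparisons" point above can be checked from Python with a counting wrapper. Counted is a made-up helper class, and the exact count is an implementation detail; current CPython's list.sort() detects a single ascending run with exactly N-1 comparisons.]

```python
# Count comparisons made by list.sort() on already-sorted input.
class Counted:
    count = 0  # class-wide tally of __lt__ calls

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Counted.count += 1
        return self.v < other.v

data = [Counted(i) for i in range(1000)]  # already ascending
Counted.count = 0
data.sort()  # one run detected: N-1 comparisons on current CPython
print(Counted.count)  # 999 on current CPython
```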
Indeed, if you read the comments in Lib/test/sortperf.py, you'll find that it performs horrid trickery to ensure that =sort and =sort work on physically distinct float objects; way back when, these particular tests ran much faster, and that turned out to be partly because, e.g., [0.5] * N constructs a list with N pointers to a single float object, and that was much easier on the memory system. We got a really nice slowdown by forcing N distinct copies of 0.5. In earlier Pythons the comparison also got short-circuited by an early pointer-equality test ("if they're the same object, they must be equal"), but that's not done anymore. A quick run just now showed that =sort still runs significantly quicker if given a list of identical objects; the only explanation left for that appears to be cache effects. > Keeping track of comparisons that you've already made could get ugly, Most researchers have found that a fancy data structure for this is counter-productive: so long as the m in m-way merging isn't ridiculously large, keeping the head entries in a straight vector with m elements runs fastest. But they're not worried about Python's expensive-comparison case. External sorts using m-way merging with large m typically use a selection tree much like a heap to reduce the expense of keeping track (see, e.g., Knuth for details). > and your temp space requirement would go from N/2 to possibly 3N/4... > But since you're diving so deeply into this problem, I figured I'd > throw it out there. > > OTOH, this could be close to the speedup that heavily optimized FFT algs > get when they go from radix-2 to radix-4. Just thinking out loud... I don't think that's comparable. Moving to radix 4 cuts the total number of non-trivial complex multiplies an FFT has to do, and non-trivial complex multiplies are the expensive part of what an FFT does.
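[Editor's sketch: the m-way merge with a heap of run heads described above — the selection-tree idea for larger m — can be illustrated with the standard library's heapq.merge, which keeps only the current head of each sorted run in a small heap. This is just an illustration, not the sort implementation under discussion.]

```python
import heapq

# Four pre-sorted runs; heapq.merge does an m-way merge, keeping only
# the current head of each run in a small heap.
runs = [
    [1, 5, 9],
    [2, 2, 8],
    [0, 7],
    [3, 4, 6],
]
merged = list(heapq.merge(*runs))
print(merged)  # [0, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9]
```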
In contrast, boosting the m in m-way merging doesn't cut the number of comparisons needed at all (to the contrary, if you're not very careful it increases them), and comparisons are what kill sorting routines in Python. The elaborate gimmicks in timsort for doing merges of unbalanced runs do cut the total number of comparisons needed, and that's where the huge wins come from. From ark@research.att.com Tue Jul 23 14:58:30 2002 From: ark@research.att.com (Andrew Koenig) Date: 23 Jul 2002 09:58:30 -0400 Subject: [Python-Dev] The iterator story In-Reply-To: References: Message-ID: Ping> - Iterators provide just one method, __next__(). Ping> - The built-in next() calls tp_iternext. For instances, Ping> tp_iternext calls __next__. Ping> - Objects wanting to be iterated over provide just one method, Ping> __iter__(). Some of these are containers, but not all. Ping> - The built-in iter(foo) calls tp_iter. For instances, Ping> tp_iter calls __iter__. Ping> - "for x in y" gets iter(y) and uses it as an iterator. Ping> - "for x from y" just uses y as the iterator. +1. Ping> - We have a nice clean division between containers and iterators. Ping> - When you see "for x in y" you know that y is a container. What if y is a file? You already said that files are not containers. Ping> - When you see "for x from y" you know that y is an iterator. Ping> - "for x in y" never destroys y. Ping> - "if x in y" never destroys y. What if y is a file? Ping> Other notes: Ping> - The file problem has a consistent solution. Instead of writing Ping> "for line in file" you write Ping> for line from file: Ping> print line Ping> Being forced to write "from" signals to you that the file is Ping> eaten up. There is no expectation that "for line from file" Ping> will work again. Ah. So you want to break "for line in file:", which works now? I'm still +1 as long as there is a transition scheme. Ping> My Not-So-Ideal Protocol Ping> ------------------------ Ping> All right. 
So new syntax may be hard to swallow. An alternative Ping> is to introduce an adapter that turns an iterator into something Ping> that "for" will accept -- that is, the opposite of iter(). Ping> - The built-in seq(it) returns x such that iter(x) yields it. Ping> Then instead of writing Ping> for x from it: Ping> you would write Ping> for x in seq(it): Ping> and the rest would be the same. The use of "seq" here is what Ping> would flag the fact that "it" will be destroyed. I prefer "for x from it: -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From thomas.heller@ion-tof.com Tue Jul 23 15:18:31 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 23 Jul 2002 16:18:31 +0200 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: <20020723063611.26677.qmail@web40102.mail.yahoo.com> Message-ID: <003c01c23253$d2f80860$e000a8c0@thomasnotebook> > PEP: 296 > Title: The Buffer Problem IMO should better be 'The bytes Object' > 6. In C/C++ extensions, the bytes object can be created from a supplied > pointer and destructor function to free the memory when the > reference count goes to zero. > > The special implementation of slicing for the bytes object allows > multiple bytes objects to refer to the same pointer/destructor. > As such, a refcount will be kept on the actual > pointer/destructor. This refcount is separate from the refcount > typically associated with Python objects. > Why is this? Wouldn't it be sufficient if views keep references to the 'viewed' byte object? > 8. The bytes object keeps track of the length of its data with a Python > LONG_LONG type. Even though the current definition for PyBufferProcs > restricts the length to be the size of an int, this PEP does not propose > to make any changes there. Instead, extensions can work around this limit > by making an explicit PyBytes_Check(...) call, and if that succeeds they > can make a PyBytes_GetReadBuffer(...) 
or PyBytes_GetWriteBuffer call to > get the pointer and full length of the object as a LONG_LONG. > > The bytes object will raise an exception if the standard PyBufferProcs > mechanism is used and the size of the bytes object is greater than can be > represented by an integer. > > From Python scripting, the bytes object will be subscriptable with longs > so the 32 bit int limit can be avoided. > > There is still a problem with the len() function as it is PyObject_Size() > and this returns an int as well. As a workaround, the bytes object will > provide a .length() method that will return a long. > Is this worth the trouble? (Hm, 64-bit platforms with 32-bit integers remind my of the broken DOS/Windows 3.1 platforms with near/far/huge pointers). > 9. The bytes object can be constructed at the Python scripting level by > passing an int/long to the bytes constructor with the number of bytes to > allocate. For example: > > b = bytes(100000) # alloc 100K bytes > > The constructor can also take another bytes object. This will be useful > for the implementation of unpickling, and in converting a read-write bytes > object into a read-only one. An optional second argument will be used to > designate creation of a readonly bytes object. > > 10. From the C API, the bytes object can be allocated using any of the > following signatures: > > PyObject* PyBytes_FromLength(LONG_LONG len, int readonly); > PyObject* PyBytes_FromPointer(void* ptr, LONG_LONG len, int readonly > void (*dest)(void *ptr, void *user), void* user); > > In the PyBytes_FromPointer(...) function, if the dest function pointer is > passed in as NULL, it will not be called. This should only be used for > creating bytes objects from statically allocated space. > > The user pointer has been called a closure in other places. It is a > pointer that the user can use for whatever purposes. It will be passed to > the destructor function on cleanup and can be useful for a number of > things. 
If the user pointer is not needed, NULL should be passed instead. Shouldn't there be constructors to create a view of a bytes/view object, or are we supposed to create them by slicing? > 11. The bytes type will be a new style class as that seems to be where all > standard Python types are headed. Good. Thanks, Thomas From xscottg@yahoo.com Tue Jul 23 16:40:55 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Tue, 23 Jul 2002 08:40:55 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <003c01c23253$d2f80860$e000a8c0@thomasnotebook> Message-ID: <20020723154055.54251.qmail@web40106.mail.yahoo.com> --- Thomas Heller wrote: > > PEP: 296 > > Title: The Buffer Problem > > IMO should better be 'The bytes Object' > Part of the title was just me being cute, but apparently this problem has a long history and has been referred to as "The Buffer Problem" many times in the past. Plus when I first submitted it, I wasn't sure the name "bytes" was going to stick. > > Why is this? Wouldn't it be sufficient if views keep references > to the 'viewed' byte object? > They do, but the referenced "inner-thing" needs its own reference count to know how many "bytes-views" are sharing it. When a bytes-view gets cleaned up, it decrefs the reference count of the inner-thing it is referring to, and if the reference count goes to zero, the bytes-view calls the destructor for the inner-thing. > > > and this returns an int as well. As a workaround, the bytes object > > will provide a .length() method that will return a long. > > > Is this worth the trouble? > (Hm, 64-bit platforms with 32-bit integers remind me of the broken > DOS/Windows 3.1 platforms with near/far/huge pointers). > I think most 64 bit platforms actually have a 32 bit int. Some of them (like the Alpha) have a 64 bit long, but Python has made extensive use of the int type in the PyBufferProcs interface and elsewhere.
So if we want to make full use of large memory machines (I do), something has to be done. The only way to reliably get a 64 bit integer on these platforms is to use the "long long" type or __int64 on Windows (spelled LONG_LONG in Python). Note that the .length() method will return a Python long, not a C long. > > Shouldn't there be constructors to create a view of a bytes/view object, > or are we supposed to create them by slicing? > Item 9 in the PEP talks about this. Maybe I'll add some text to make this more clear. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From thomas.heller@ion-tof.com Tue Jul 23 19:04:00 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 23 Jul 2002 20:04:00 +0200 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: <20020723154055.54251.qmail@web40106.mail.yahoo.com> Message-ID: <027f01c23273$528d1240$e000a8c0@thomasnotebook> > > > > Why is this? Wouldn't it be sufficient if views keep references > > to the 'viewed' byte object? > > > > They do, but the referenced "inner-thing" needs it's own reference count to > know how many "bytes-views" are sharing it. When a bytes-view gets cleaned > up, it decrefs the reference count of the inner-thing it is referring to, > and if the reference count goes to zero, the bytes-view calls the > destructor for the inner-thing. > Hm, I thought the 'inner-thing' is a python object (with it's own refcount) itself. Isn't the 'inner-thing' the bytes object owning the allocated memory? And the 'outer-things' (the views) simply viewing slices of this memory? 
Thomas From xscottg@yahoo.com Tue Jul 23 19:33:02 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Tue, 23 Jul 2002 11:33:02 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <027f01c23273$528d1240$e000a8c0@thomasnotebook> Message-ID: <20020723183302.98153.qmail@web40102.mail.yahoo.com> --- Thomas Heller wrote: > > > > They do, but the referenced "inner-thing" needs it's own reference > count to > > know how many "bytes-views" are sharing it. When a bytes-view gets > cleaned > > up, it decrefs the reference count of the inner-thing it is referring > to, > > and if the reference count goes to zero, the bytes-view calls the > > destructor for the inner-thing. > > > Hm, I thought the 'inner-thing' is a python object (with it's own > refcount) itself. Isn't the 'inner-thing' the bytes object owning > the allocated memory? And the 'outer-things' (the views) simply > viewing slices of this memory? > The outer-thing is definitely the "bytes object", since that's what people will work with directly. It has to be a true Python object in all its glory. The inner-thing _could_ be a Python object (and Guido suggested that maybe it should be), but that's an implementation detail. I don't know why anyone would want to work with the inner-thing directly. However, one good use case and I'll be sold on the idea. I'll definitely add some verbage to clarify this in the next revision. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! 
Health - Feel better, live better http://health.yahoo.com From jmiller@stsci.edu Tue Jul 23 19:47:02 2002 From: jmiller@stsci.edu (Todd Miller) Date: Tue, 23 Jul 2002 14:47:02 -0400 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: <20020723183302.98153.qmail@web40102.mail.yahoo.com> Message-ID: <3D3DA4A6.6040802@stsci.edu> Scott Gilbert wrote: >--- Thomas Heller wrote: > >>>They do, but the referenced "inner-thing" needs it's own reference >>> >>count to >> >>>know how many "bytes-views" are sharing it. When a bytes-view gets >>> >>cleaned >> >>>up, it decrefs the reference count of the inner-thing it is referring >>> >>to, >> >>>and if the reference count goes to zero, the bytes-view calls the >>>destructor for the inner-thing. >>> >>Hm, I thought the 'inner-thing' is a python object (with it's own >>refcount) itself. Isn't the 'inner-thing' the bytes object owning >>the allocated memory? And the 'outer-things' (the views) simply >>viewing slices of this memory? >> > >The outer-thing is definitely the "bytes object", since that's what people >will work with directly. It has to be a true Python object in all its >glory. > >The inner-thing _could_ be a Python object (and Guido suggested that maybe >it should be), but that's an implementation detail. I don't know why > > >anyone would want to work with the inner-thing directly. However, one good >use case and I'll be sold on the idea. > Letting the inner-thing be a mmap would enable slices of a mmap as views as opposed to strings. We'd certainly like this for numarray, especially if it meant pickling efficiency for mmap based arrays. > > >I'll definitely add some verbage to clarify this in the next revision. > >Cheers, > -Scott > > > >__________________________________________________ >Do You Yahoo!? >Yahoo! 
Health - Feel better, live better >http://health.yahoo.com > >_______________________________________________ >Python-Dev mailing list >Python-Dev@python.org >http://mail.python.org/mailman/listinfo/python-dev > -- Todd Miller jmiller@stsci.edu STSCI / SSG (410) 338 4576 From thomas.heller@ion-tof.com Tue Jul 23 20:59:18 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 23 Jul 2002 21:59:18 +0200 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: <20020723183302.98153.qmail@web40102.mail.yahoo.com> Message-ID: <030901c23283$6e2be430$e000a8c0@thomasnotebook> > > > > > > They do, but the referenced "inner-thing" needs it's own reference > > count to > > > know how many "bytes-views" are sharing it. When a bytes-view gets > > cleaned > > > up, it decrefs the reference count of the inner-thing it is referring > > to, > > > and if the reference count goes to zero, the bytes-view calls the > > > destructor for the inner-thing. > > > > > Hm, I thought the 'inner-thing' is a python object (with it's own > > refcount) itself. Isn't the 'inner-thing' the bytes object owning > > the allocated memory? And the 'outer-things' (the views) simply > > viewing slices of this memory? > > > > The outer-thing is definitely the "bytes object", since that's what people > will work with directly. It has to be a true Python object in all its > glory. > > The inner-thing _could_ be a Python object (and Guido suggested that maybe > it should be), but that's an implementation detail. I don't know why > anyone would want to work with the inner-thing directly. However, one good > use case and I'll be sold on the idea. > > I'll definitely add some verbage to clarify this in the next revision. > I've quickly read the pep again. I see no mentioning of an 'inner object' and an 'outer object' there, so I would recommend you try to explain this (if you want to stay with this decision). 
OTOH, your 'inner thing' has a refcount, an (optional) destructor which is a kind of closure, instance variables (memory pointer, readonly flag), so there is not too much missing for a full python object. Could the 'inner thing' have the same type as the 'outer thing': the inner thing being a full view of itself, and the outer thing probably a view viewing only a slice of the inner thing? Thomas From greg@cosc.canterbury.ac.nz Tue Jul 23 23:29:22 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Jul 2002 10:29:22 +1200 (NZST) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <20020723183302.98153.qmail@web40102.mail.yahoo.com> Message-ID: <200207232229.g6NMTM609792@oma.cosc.canterbury.ac.nz> Scott Gilbert : > The inner-thing _could_ be a Python object (and Guido suggested that > maybe it should be), but that's an implementation detail. In that case, unless there's some reason for it *not* to be a Python object, you might as well make it one and take advantage of all the Python refcount machinery. If you use Pyrex for the implementation, making Python objects will be dead easy! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From jason-exp-1028157503.aebc46@mastaler.com Wed Jul 24 00:18:58 2002 From: jason-exp-1028157503.aebc46@mastaler.com (jason-exp-1028157503.aebc46@mastaler.com) Date: Tue, 23 Jul 2002 17:18:58 -0600 Subject: [Python-Dev] Re: Where's time.daylight??? References: <15672.18628.831787.897474@anthem.wooz.org> <200207191732.g6JHWJD28040@pcp02138704pcs.reston01.va.comcast.net> <200207191910.g6JJAUJ32606@pcp02138704pcs.reston01.va.comcast.net> <200207200043.g6K0hMJ27043@pcp02138704pcs.reston01.va.comcast.net> Message-ID: martin@v.loewis.de (Martin v. 
Loewis) writes: > I have no OSF/1 (aka whatever) system http://www.testdrive.compaq.com/ -- (http://tmda.net/) From xscottg@yahoo.com Wed Jul 24 09:13:26 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Wed, 24 Jul 2002 01:13:26 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <030901c23283$6e2be430$e000a8c0@thomasnotebook> Message-ID: <20020724081326.995.qmail@web40107.mail.yahoo.com> --- Thomas Heller wrote: > > I've quickly read the pep again. > I see no mentioning of an 'inner object' and an 'outer object' > there, so I would recommend you try to explain this (if you want to stay > with this decision). > This is just the terminology I was using to try and communicate with you. The outer thing is the bytes object (which is generally interesting to users), and the inner thing is an implementation detail. Like I said, I'll add more text on this in the next revision since it seems to be causing confusion. > > OTOH, your 'inner thing' has a refcount, an (optional) destructor > which is a kind of closure, instance variables (memory pointer, > readonly flag), so there is not too much missing for a full > python object. > I still haven't heard a good reason to expose the inner thing to user code yet though. So even if the inner thing is a PyObject, who would know? It's probably better for maintenance to use something everyone is already familiar with, so I'll probably do it for that reason. > > Could the 'inner thing' have the same type as the 'outer thing': > the inner thing being a full view of itself, and the outer thing > probably a view viewing only a slice of the inner thing? > It might be. However, I'm afraid this will lead to some ugly special cases when the view is the inner thing versus when the view is referring to some other thing. It's probably cleaner to make a clear distinction between the two and stick with it throughout. (I'm growing to dislike this "thing" terminology....) 
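[Editor's sketch: the inner-thing/outer-thing split being debated can be modeled at the Python level. The names here (Inner, BytesView, release) are made up for illustration, not the PEP's API: one shared inner owner holds the storage and the destructor, carries its own count of views separate from ordinary Python refcounting, and fires the destructor when the last view releases it.]

```python
class Inner:
    """Owns the storage; calls the destructor when the last view lets go."""
    def __init__(self, storage, destructor=None):
        self.storage = storage
        self.destructor = destructor
        self.viewcount = 0

    def incref(self):
        self.viewcount += 1

    def decref(self):
        self.viewcount -= 1
        if self.viewcount == 0 and self.destructor is not None:
            self.destructor(self.storage)


class BytesView:
    """An 'outer thing': a (start, stop) window onto a shared Inner."""
    def __init__(self, inner, start, stop):
        self.inner, self.start, self.stop = inner, start, stop
        inner.incref()

    def __getitem__(self, i):
        return self.inner.storage[self.start + i]

    def slice(self, i, j):
        # Slicing makes a new view on the SAME inner -- no copy is made.
        return BytesView(self.inner, self.start + i, self.start + j)

    def release(self):
        self.inner.decref()


freed = []
inner = Inner(bytearray(b"0123456789"), destructor=freed.append)
v1 = BytesView(inner, 0, 10)
v2 = v1.slice(2, 6)   # shares the same storage
v1.release()          # inner still alive: v2 holds a count
v2.release()          # last count gone -> destructor runs once
```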
__________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From xscottg@yahoo.com Wed Jul 24 09:22:29 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Wed, 24 Jul 2002 01:22:29 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <3D3DA4A6.6040802@stsci.edu> Message-ID: <20020724082229.94975.qmail@web40105.mail.yahoo.com> --- Todd Miller wrote: > > Letting the inner-thing be a mmap would enable slices of a mmap as views > as opposed to strings. We'd certainly like this for numarray, > especially if it meant pickling efficiency for mmap based arrays. > The first version of the PEP I sent to you directly didn't have this, but the latest version I posted to python-dev mentions it briefly. It seems both you and Guido came up with the same idea regarding mmap. The current strategy is to add a method to the mmap module that would return a bytes object from an mmap object. I would like it to be able to pickle too. (Which probably means the new method in the mmap module will probably return a class derived from bytes, and not the bytes base class.) However, this is sort of orthogonal to the PEP. If the bytes object makes it in, but the mmap enhancements get left out, a third party extension could implement the mmap_to_bytes function and still make use of the efficient pickling by deriving from the bytes object. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From xscottg@yahoo.com Wed Jul 24 10:05:12 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Wed, 24 Jul 2002 02:05:12 -0700 (PDT) Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: <20020724090512.2485.qmail@web40110.mail.yahoo.com> --- Tim Peters wrote: > > "A Meticulous Analysis of Mergesort Programs" > Jyrki Katajainen, Jesper Larsson Traff > Thanks for the cool reference. I read a bit of it last night. 
I ought to know by now that there really isn't much new under the sun... > > The real reason it's uninteresting to me is that it has no clear > applicability to the cases this sort aims at: exploiting significant > pre-existing order of various kinds. > [...] > (unless you can speed a single left-to-right scan! that would be way > cool). > Do a few well-placed prefetch instructions buy you anything? The MMU could be grabbing your next pointer while you're doing your current comparison. And of course you could implement it as a macro that evaporates for whatever platforms you didn't care to implement it on. (I need to look it up, but I'm pretty sure you could do this for both VC++ and gcc on recent x86s.) > > If you think you can write a sort for random Python arrays faster than > the > samplesort hybrid, be my guest: I'd love to see it! You should be aware > that I've been making this challenge for years. > You're remarkably good at taunting me. :-) I've spent a little time on a few of these optimization challenges that get posted. One of these days I'll best you... (not this day though) > > Something to note: I think you have an overly simple view of Python's > lists in mind. > No, I think I understand the model. I just assumed the objects pointed to would be scattered pretty randomly through memory. So statistically they'll step on the same cache lines as your list once in a while, but that it would average out to being less interesting than the adjacent slots in the list. I'm frequently wrong about stuff like this though... Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! 
Health - Feel better, live better http://health.yahoo.com From jmiller@stsci.edu Wed Jul 24 12:21:41 2002 From: jmiller@stsci.edu (Todd Miller) Date: Wed, 24 Jul 2002 07:21:41 -0400 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: <20020724082229.94975.qmail@web40105.mail.yahoo.com> Message-ID: <3D3E8DC5.7040906@stsci.edu> Scott Gilbert wrote: >--- Todd Miller wrote: > >>Letting the inner-thing be a mmap would enable slices of a mmap as views >>as opposed to strings. We'd certainly like this for numarray, >>especially if it meant pickling efficiency for mmap based arrays. >> > >The first version of the PEP I sent to you directly didn't have this, but >the latest version I posted to python-dev mentions it briefly. It seems >both you and Guido came up with the same idea regarding mmap. > Yeah, I saw that in your response. Sorry. FWIW, anything I say here should be regarded as a reflection of STSCI's current technical goals as channeled by me, and not necessarily "my ideas". Exploiting mmapping has been a pretty long-standing goal here at STSCI. > > >The current strategy is to add a method to the mmap module that would >return a bytes object from an mmap object. I would like it to be able to >pickle too. (Which probably means the new method in the mmap module will >probably return a class derived from bytes, and not the bytes base class.) > This runs pretty wide of my current mental ruts, but it sounds like conservative design, so great. > >However, this is sort of orthogonal to the PEP. If the bytes object makes >it in, but the mmap enhancements get left out, a third party extension >could implement the mmap_to_bytes function and still make use of the >efficient pickling by deriving from the bytes object. > I understand. That sounds excellent. > > >Cheers, > -Scott > > >__________________________________________________ >Do You Yahoo!? >Yahoo! 
Health - Feel better, live better >http://health.yahoo.com > Back to numarray, Todd From thomas.heller@ion-tof.com Wed Jul 24 12:38:00 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 24 Jul 2002 13:38:00 +0200 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: <20020723063611.26677.qmail@web40102.mail.yahoo.com> Message-ID: <048301c23306$90992090$e000a8c0@thomasnotebook> Let me ask some questions and about platforms with 32-bit integers and 64-bit longs: > 2. Alignment of the allocated byte array is whatever is promised by the > platform implementation of malloc. On these platforms, does malloc() accept an unsigned long argument for the requested size? > [...] > 8. The bytes object keeps track of the length of its data with a Python > LONG_LONG type. > [...] > From Python scripting, the bytes object will be subscriptable with longs > so the 32 bit int limit can be avoided. How is indexing done in C? Can you index these byte arrays by longs? > 9. The bytes object can be constructed at the Python scripting level by > passing an int/long to the bytes constructor with the number of bytes to > allocate. For example: > > b = bytes(100000) # alloc 100K bytes > > The constructor can also take another bytes object. This will be useful > for the implementation of unpickling, and in converting a read-write bytes > object into a read-only one. An optional second argument will be used to > designate creation of a readonly bytes object. > > 10. From the C API, the bytes object can be allocated using any of the > following signatures: > > PyObject* PyBytes_FromLength(LONG_LONG len, int readonly); > PyObject* PyBytes_FromPointer(void* ptr, LONG_LONG len, int readonly > void (*dest)(void *ptr, void *user), void* user); Hm, if 'bytes' is a new style class, these functions should require a 'PyObject *type' parameter as well. OTOH, new style classes are usually created by calling their *type*, so you should describe the signature of the byte type's tp_call. 
(It may be possible to supply variations of the above functions for convenience as well.) > The array module supports the creation of an array of bytes, but it does > not provide a C API for supplying pointers and destructors to extension > supplied memory. This makes it unusable for constructing objects out of > shared memory, or memory that has special alignment or locking for things > like DMA transfers. Also, the array object does not currently pickle. > Finally since the array object allows its contents to grow, via the extend > method, the pointer can be changed if the GIL is not held while using it. ...or if any code is executed which may change the array object, even if the GIL is held! Thomas From xscottg@yahoo.com Wed Jul 24 17:01:58 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Wed, 24 Jul 2002 09:01:58 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <048301c23306$90992090$e000a8c0@thomasnotebook> Message-ID: <20020724160158.34860.qmail@web40112.mail.yahoo.com> --- Thomas Heller wrote: > Let me ask some questions and about platforms with 32-bit > integers and 64-bit longs: > > > 2. Alignment of the allocated byte array is whatever is promised by > > the platform implementation of malloc. > > On these platforms, does malloc() accept an unsigned long argument > for the requested size? > At the moment, the only 64 bit platform that I have easy access to is Tru64/Alpha. That version of malloc takes a size_t which is a 64 bit quantity. I believe most semi-sane platforms will use a size_t as argument for malloc, and I believe most semi-sane platforms will have a size_t that is the same number of bits as a pointer for that platform. > > [...] > > 8. The bytes object keeps track of the length of its data with a > > Python LONG_LONG type. > > [...] > > From Python scripting, the bytes object will be subscriptable with > > longs so the 32 bit int limit can be avoided. 
> > How is indexing done in C?
Indexing is done by grabbing the pointer and length via a call like:

    int PyObject_AsLargeReadBuffer(PyObject* bo, unsigned char** ptr, LONG_LONG* len);

Note that the name could be different depending on whether it ends up in abstract.h or bytesobject.h. > Can you index these byte arrays by longs? You could index it via a long, but using a LONG_LONG is safer. My understanding is that on Win64 a long will only be 32 bits even though void* is 64 bits. So for that platform, LONG_LONG will be a typedef for __int64 which is 64 bits. None of this matters for 32 bit platforms. All 32 bit platforms that I know of have sizeof(int) == sizeof(long) == sizeof(void*) == 4. So even if you wanted to subscript with a long or LONG_LONG, the pointer could only point to something about 2 Gigs (31 bits) in size. > > > > 10. From the C API, the bytes object can be allocated using any of > > the following signatures:
> >
> >     PyObject* PyBytes_FromLength(LONG_LONG len, int readonly);
> >     PyObject* PyBytes_FromPointer(void* ptr, LONG_LONG len, int readonly,
> >                                   void (*dest)(void *ptr, void *user), void* user);
> >
> > Hm, if 'bytes' is a new style class, these functions should > require a 'PyObject *type' parameter as well. OTOH, new style > classes are usually created by calling their *type*, so you > should describe the signature of the byte type's tp_call. > (It may be possible to supply variations of the above functions > for convenience as well.) > I consider these to be the minimum convenience functions that are necessary for the functionality I'd like to see. I'll follow the conventions for creating a new style class for PyBytesObject to the letter, and any other variations of the above convenience functions can be added as needed. (It's easier to add stuff than take it away...) Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! 
Health - Feel better, live better http://health.yahoo.com From tim@zope.com Wed Jul 24 18:49:10 2002 From: tim@zope.com (Tim Peters) Date: Wed, 24 Jul 2002 13:49:10 -0400 Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <20020724160158.34860.qmail@web40112.mail.yahoo.com> Message-ID: [Scott Gilbert] > At the moment, the only 64 bit platform that I have easy access to is > Tru64/Alpha. That version of malloc takes a size_t which is a 64 bit > quantity. > > I believe most semi-sane platforms will use a size_t as argument for > malloc, That much is required by the C standard, so you can rely on it. > and I believe most semi-sane platforms will have a size_t that is > the same number of bits as a pointer for that platform. 
The std is silent on this; it's true on 64-bit Linux and Win64, so "good enough". >> Can you index these byte arrays by longs? > You could index it via a long, but using a LONG_LONG is safer. My > understanding is that on Win64 a long will only be 32 bits even though > void* is 64 bits. Right. > So for that platform, LONG_LONG will be a typedef for __int64 which is 64 > bits. Also on Win32: LONG_LONG is a 64-bit integral type on Win32 and Win64. > None of this matters for 32 bit platforms. ? Win32 has always supported "large files" and "large mmaps" (where large means 64-bit capacity), and most 32-bit flavors of Unix do too. It's an x-platform mess, though. > All 32 bit platforms that I know of have sizeof(int) == sizeof(long) == > sizeof(void*) == 4. Same here. > So even if you wanted to subscript with a long or LONG_LONG, the pointer > could only point to something about 2 Gigs (31 bits) in size. That depends on how it's implemented; on a 32-bit box, supporting a LONG_LONG subscript may require some real pain, but isn't impossible. For example, Python manages to support 64-bit "subscripts" to f.seek() on the major 32-bit boxes right now. From tim.one@comcast.net Thu Jul 25 00:01:32 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 24 Jul 2002 19:01:32 -0400 Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: FYI, I've been poking at this in the background. The ~sort regression is vastly reduced, via removing special-casing and adding more general adaptivity (if you read the timsort.txt file, the special case for run lengths within a factor of 2 of each other went away, replaced by a more intelligent mix of one-pair-at-a-time versus galloping modes). *sort lost about 1% as a result (one-pair-at-a-time is maximally effective for *sort, but in a random mix every now and again the "switch to the less efficient (for it) galloping mode" heuristic triggers by blind luck). 
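[Editorial aside: the "galloping" mode Tim mentions can be sketched apart from the merge itself: instead of advancing one element at a time, the merge probes the other run at exponentially growing offsets and then binary-searches the last bracketed range. A toy version of that search follows; `gallop_left` is a name invented here for illustration, and this is not CPython's implementation (which lives in listobject.c):

```python
from bisect import bisect_left

def gallop_left(key, a):
    """Find where key would be inserted in sorted list a, probing at
    offsets 1, 2, 4, 8, ... before binary-searching the final bracket."""
    hi = 1
    while hi < len(a) and a[hi] < key:
        hi *= 2
    lo = hi // 2  # a[lo] < key is known (or lo == 0), a[hi] >= key or hi is past the end
    return bisect_left(a, key, lo, min(hi, len(a)))

a = list(range(0, 1000, 2))            # 0, 2, 4, ..., 998
assert gallop_left(500, a) == 250      # same answer as a plain binary search
assert gallop_left(-5, a) == 0
assert gallop_left(10**6, a) == len(a)
```

When one run's elements mostly win, this finds long stretches in O(log n) comparisons instead of one comparison per element, which is why it helps cases like ~sort so much.]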
There's also a significant systematic regression in timsort's +sort case, although it remains faster (and much more general) than samplesort's special-casing of it; also a mix of small regressions and speedups in 3sort. These are because, to simplify experimenting, I threw out the "copy only the shorter run" gimmick, always copying the left run instead. That hurts +sort systematically, as instead of copying just the 10 oddball elements at the end, it copies the very long run of N-10 elements instead (and as many as N-1 temp pointers can be needed, up from N/2). That's all repairable, it's just a PITA to do it.

C:\Code\python\PCbuild>python -O sortperf.py 15 20 1
samplesort
 i    2**i   *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768    0.18   0.01   0.02   0.11   0.01   0.04   0.01   0.11
16   65536    0.24   0.02   0.02   0.25   0.02   0.08   0.02   0.24
17  131072    0.53   0.05   0.04   0.49   0.05   0.18   0.04   0.52
18  262144    1.16   0.09   0.09   1.06   0.12   0.37   0.09   1.14
19  524288    2.53   0.18   0.17   2.30   0.24   0.75   0.17   2.47
20 1048576    5.48   0.37   0.35   5.18   0.45   1.52   0.35   5.35
timsort
 i    2**i   *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768    0.17   0.01   0.02   0.01   0.01   0.05   0.01   0.02
16   65536    0.24   0.02   0.02   0.02   0.02   0.09   0.02   0.04
17  131072    0.54   0.05   0.04   0.05   0.05   0.19   0.04   0.09
18  262144    1.17   0.09   0.09   0.10   0.10   0.38   0.09   0.18
19  524288    2.56   0.18   0.17   0.20   0.20   0.79   0.17   0.36
20 1048576    5.54   0.37   0.35   0.37   0.41   1.62   0.35   0.73

In short, there's no real "speed argument" against this anymore (as I said in the first msg of this thread, the ~sort regression was serious -- it's an important case; turns out galloping is very effective at speeding it too, provided that dumbass premature special-casing doesn't stop galloping from trying). 
From xscottg@yahoo.com Thu Jul 25 00:22:54 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Wed, 24 Jul 2002 16:22:54 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: Message-ID: <20020724232254.86946.qmail@web40101.mail.yahoo.com> --- Tim Peters wrote: > > > So for that platform, LONG_LONG will be a typedef for __int64 which is > > 64 bits. > > Also on Win32: LONG_LONG is a 64-bit integral type on Win32 and Win64. > Yep. I was trying to contrast that on most platforms LONG_LONG is an alias for "long long", but on Windows (32 or 64) it's going to be an __int64. > > > So even if you wanted to subscript with a long or LONG_LONG, the > > pointer could only point to something about 2 Gigs (31 bits) in size. > > That depends on how it's implemented; on a 32-bit box, supporting a > LONG_LONG subscript may require some real pain, but isn't impossible. > For > example, Python manages to support 64-bit "subscripts" to f.seek() on the > major 32-bit boxes right now. > I should have been more clear. I was referring specifically to working with pointers:

    datum = *(pointer + offset);

or:

    datum = pointer[offset];

Just so there is no confusion, you aren't suggesting that the bytes PEP should provide a mechanism to support chunks of memory larger than 4 Gigs on 32-bit platforms, right? I think the bytes object could be a part of the solution to that problem, at least I know how I would do that under Win32, but I'd rather not kluge up the interface to the bytes object to support it directly. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! 
Health - Feel better, live better http://health.yahoo.com From guido@python.org Thu Jul 25 01:04:51 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 24 Jul 2002 20:04:51 -0400 Subject: [Python-Dev] Powerpoint slide for keynotes available Message-ID: <200207250004.g6P04pP20522@pcp02138704pcs.reston01.va.comcast.net> I've put the powerpoint slides for my keynotes at EuroPython and OSCON on the web. If someone can donate PDF that would be great (the HTML generated by Powerpoint sucks too much to be worth it IMO). http://www.python.org/doc/essays/ppt/ (scroll to end) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Thu Jul 25 05:37:00 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 25 Jul 2002 00:37:00 -0400 Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <20020724232254.86946.qmail@web40101.mail.yahoo.com> Message-ID: [Scott Gilbert] > ... > I should have been more clear. I was referring specifically to working > with pointers: > > datum = *(pointer + offset); > or: > datum = pointer[offset]; Na, my fault -- I fit the email in between other things, and hadn't read the whole thread up to that point. It was clear enough in context. > Just so there is no confusion, you aren't suggesting that the bytes PEP > should provide a mechanism to support chunks of memory larger than 4 Gigs > on 32 bit platforms right? It depends on how insane you are. It sure as heck doesn't *sound* like this is the bytes object's problem to solve, but then if people want their data sorted they shouldn't let it get out of order to begin with either. > I think the bytes object could be a part of the solution to that problem, > at least I know how I would do that under Win32, but I'd rather not kluge > up the interface to the bytes object to support it directly. I agree. 
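[Editorial aside: the size relationships being debated here are easy to check from Python on any particular box via ctypes. The equality below holds on the mainstream platforms named in the thread (32- and 64-bit Linux and Windows), though as Tim notes the C standard does not actually require it:

```python
import ctypes

# malloc takes a size_t; on the platforms discussed in this thread,
# size_t is as wide as a pointer, so one allocation can in principle
# span the whole address space.
assert ctypes.sizeof(ctypes.c_size_t) == ctypes.sizeof(ctypes.c_void_p)

# "long" need not match pointer width (on Win64 it is 4 bytes while
# pointers are 8), which is why the thread reaches for LONG_LONG or
# size_t rather than long for subscripts.
assert ctypes.sizeof(ctypes.c_long) <= ctypes.sizeof(ctypes.c_longlong)
```

ctypes postdates this thread, but it makes the per-platform check a one-liner instead of a compile-and-run exercise in C.]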
From thomas.heller@ion-tof.com Thu Jul 25 08:45:44 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Thu, 25 Jul 2002 09:45:44 +0200 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: Message-ID: <011201c233af$4883dd50$e000a8c0@thomasnotebook> > [Scott Gilbert] > > At the moment, the only 64 bit platform that I have easy access to is > > Tru64/Alpha. That version of malloc takes a size_t which is a 64 bit > > quantity. > > > > I believe most semi-sane platforms will use a size_t as argument for > > malloc, > [Tim] > That much is required by the C standard, so you can rely on it. > > > and I believe most semi-sane platforms will have a size_t that is > > the same number of bits as a pointer for that platform. > > The std is silent on this; it's true on 64-bit Linux and Win64, so "good > enough". > > >> Can you index these byte arrays by longs? > > > You could index it via a long, but using a LONG_LONG is safer. My > > understanding is that on Win64 a long will only be 32 bits even though > > void* is 64 bits. > > Right. So isn't the conclusion that sizeof(size_t) == sizeof(void *) on any platform, and so the index should be of type size_t instead of int, long, or LONG_LONG (aka __int64 in some places)? Thomas From thomas.heller@ion-tof.com Thu Jul 25 09:07:43 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Thu, 25 Jul 2002 10:07:43 +0200 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: <20020723063611.26677.qmail@web40102.mail.yahoo.com> Message-ID: <014b01c233b2$5a9bf240$e000a8c0@thomasnotebook> What if we would 'fix' the buffer interface? 
Extend the PyBufferProcs structure by new fields:

    typedef size_t (*getlargereadbufferproc)(PyObject *, void **);
    typedef size_t (*getlargewritebufferproc)(PyObject *, void **);

    typedef struct {
        getreadbufferproc bf_getreadbuffer;
        getwritebufferproc bf_getwritebuffer;
        getsegcountproc bf_getsegcount;
        getcharbufferproc bf_getcharbuffer;
        /* new fields */
        getlargereadbufferproc bf_getlargereadbufferproc;
        getlargewritebufferproc bf_getlargewritebufferproc;
    } PyBufferProcs;

The new fields are present if the Py_TPFLAGS_HAVE_GETLARGEBUFFER flag is set in the object's type. Py_TPFLAGS_HAVE_GETLARGEBUFFER implies the Py_TPFLAGS_HAVE_GETCHARBUFFER flag. These functions have the same semantics Scott describes: they must only be implemented by types that return addresses which stay valid as long as the Python 'source' object is alive. Python strings, unicode strings, mmap objects, and maybe other types would expose the large buffer interface, but the array type would *not*. We could also change the name from 'large buffer interface' to something more sensible; currently I don't have a better name. Thomas From oren-py-d@hishome.net Thu Jul 25 11:01:58 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Thu, 25 Jul 2002 06:01:58 -0400 Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <011201c233af$4883dd50$e000a8c0@thomasnotebook> References: <011201c233af$4883dd50$e000a8c0@thomasnotebook> Message-ID: <20020725100157.GA34465@hishome.net> On Thu, Jul 25, 2002 at 09:45:44AM +0200, Thomas Heller wrote: > > >> Can you index these byte arrays by longs? > > > > > You could index it via a long, but using a LONG_LONG is safer. My > > > understanding is that on Win64 a long will only be 32 bits even though > > > void* is 64 bits. > > > > Right. > > So isn't the conclusion that sizeof(size_t) == sizeof(void *) on 
The obvious type to index byte arrays would be ptrdiff_t. If (char*)-(char*)==ptrdiff_t then (char*)+ptrdiff_t==(char*) Oren From tim@zope.com Thu Jul 25 16:23:05 2002 From: tim@zope.com (Tim Peters) Date: Thu, 25 Jul 2002 11:23:05 -0400 Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <011201c233af$4883dd50$e000a8c0@thomasnotebook> Message-ID: [Thomas Heller] > So isn't the conclusion that sizeof(size_t) == sizeof(void *) on > any platform, Last I knew, there were dozens of platforms besides Linux and Windows. Like I said, no relationship is defined here. C99 standardizes a uintptr_t typedef for an unsigned integer type with "enough bits" so that (void*)(uintptr_t)p == p for any legit pointer p of type void*, but only standardizes its name, not its existence (a conforming implementation isn't required to supply a typedef with this name). Such a type *is* required to compile Python, though, and pyport.h defines our own Py_uintptr_t (as a synonym for the platform uintptr_t if it exists, else to the smallest integer type it can find that looks big enough, else a compile-time #error). > and so the index should be of type size_t instead of > int, long, or LONG_LONG (aka __int64 in some places)? Try to spell out exactly what it is you think this index should be capable of representing; e.g., what's your most extreme use case? From tim.one@comcast.net Thu Jul 25 16:44:22 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 25 Jul 2002 11:44:22 -0400 Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <20020725100157.GA34465@hishome.net> Message-ID: [Oren Tirosh] > The obvious type to index byte arrays would be ptrdiff_t. 
> > If (char*)-(char*)==ptrdiff_t then (char*)+ptrdiff_t==(char*) Alas, the standard only says that ptrdiff_t *is* the type of the result of pointer subtraction, not that it *suffices* for that purpose; it explicitly warns that the true result of subtracting two pointers may not be representable in that type (in which case the behavior is undefined). In a similar way, C says the result of adding int to int *is* int, but doesn't guarantee the result type (int) is sufficient to represent the true result (and, indeed, in the int case it often isn't). It may be safer to stick with size_t, since size_t isn't as obscure (lightly used and/or misunderstood) as ptrdiff_t. From jeremy@alum.mit.edu Thu Jul 25 12:04:39 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 25 Jul 2002 07:04:39 -0400 Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: References: <20020725100157.GA34465@hishome.net> Message-ID: <15679.56135.465947.542871@slothrop.zope.com> We could have an #if test on PTRDIFF_MIN and PTRDIFF_MAX and refuse to compile if they don't have reasonable values. Jeremy From yozh@mx1.ru Thu Jul 25 17:03:38 2002 From: yozh@mx1.ru (Stepan Koltsov) Date: Thu, 25 Jul 2002 20:03:38 +0400 Subject: [Python-Dev] PEP 295 - Interpretation of multiline string constants Message-ID: <20020725160337.GA8999@banana.mx1.ru> --/9DWx/yDrRhgMJTb Content-Type: text/plain; charset=koi8-r Content-Disposition: inline Hi, all. I wrote a PEP; its number is 295, and it is attached. It should be posted somewhere to be discussed, so here it is. Please look at it and say what you think. 
-- mailto: Stepan Koltsov

--/9DWx/yDrRhgMJTb
Content-Type: text/plain; charset=koi8-r
Content-Disposition: attachment; filename="pep-0295.txt"

PEP: 295
Title: Interpretation of multiline string constants
Version: $Revision: 1.1 $
Last-Modified: $Date: 2002/07/22 20:45:07 $
Author: yozh@mx1.ru (Stepan Koltsov)
Status: Draft
Type: Standards Track
Created: 22-Jul-2002
Python-Version: 3.0
Post-History:

Abstract

    This PEP describes an interpretation of multiline string constants
    for Python.  It suggests stripping spaces after newlines and
    stripping a newline if it is the first character after an opening
    quotation.

Rationale

    This PEP proposes an interpretation of multiline string constants
    in Python.  Currently, the value of a string constant is all the
    text between quotations, possibly with escape sequences
    substituted, e.g.:

        def f():
            """
            la-la-la
            limona, banana
            """

        def g():
            return "This is \
        string"

        print repr(f.__doc__)
        print repr(g())

    prints:

        '\n\tla-la-la\n\tlimona, banana\n\t'
        'This is \tstring'

    This PEP suggests two things: first, ignore the first character
    after the opening quotation if it is a newline; second, ignore in
    string constants all spaces and tabs up to the first
    non-whitespace character, but no more than the current
    indentation.

    After applying this, the previous program will print:

        'la-la-la\nlimona, banana\n'
        'This is string'

    To get this result, the previous programs could be rewritten for
    current Python as follows (note, this gives the same result with
    the new string meaning):

        def f():
            """\
            la-la-la
            limona, banana
            """

        def g():
            "This is \
        string"

    Or stripping can be done with library routines at runtime (as
    pydoc does), but this decreases program readability.

Implementation

    I'll say nothing about CPython, Jython or Python.NET.  In original
    Python, there is no info about the current indentation (in spaces)
    at compile time, so space and tab stripping should be done at
    parse time.  Currently no flags can be passed to the parser in
    program text (like from __future__ import xxx).
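[Editorial aside: the rule the PEP proposes can be prototyped as exactly the kind of runtime library routine its Rationale mentions as a fallback. `strip_multiline` and its `indent` argument are names invented here for illustration; indent counts whitespace characters of indentation, one per tab in the docstring example:

```python
def strip_multiline(s, indent):
    # Drop a newline immediately after the opening quotation, then strip
    # leading spaces/tabs from each remaining line, but no more than
    # `indent` characters, per the PEP's proposed rule.
    lines = s.split("\n")
    if lines and lines[0] == "":
        lines = lines[1:]
    out = []
    for line in lines:
        i = 0
        while i < indent and i < len(line) and line[i] in " \t":
            i += 1
        out.append(line[i:])
    return "\n".join(out)

# The docstring example from the Rationale, indented by one tab:
doc = "\n\tla-la-la\n\tlimona, banana\n\t"
assert strip_multiline(doc, 1) == "la-la-la\nlimona, banana\n"
```

The compile-time version would differ only in where the indentation count comes from, which is precisely the information the Implementation section says the parser lacks.]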
    I suggest enabling or disabling this feature at Python compile
    time depending on the CPP flag Py_PARSE_MULTILINE_STRINGS.

Alternatives

    The new interpretation of string constants can be implemented with
    flags 'i' and 'o' on string constants, like:

        i"""
        SELECT * FROM car WHERE model = 'i525'
        """

    is in new style,

        o"""SELECT * FROM employee
        WHERE birth < 1982
        """

    is in old style, and

        """
        SELECT employee.name, car.name, car.price
        FROM employee, car
        WHERE employee.salary * 36 > car.price
        """

    is in new style after Python-x.y.z and in old style otherwise.
    Also this feature can be disabled if the string is raw, i.e. if
    the 'r' flag is specified.

Copyright

    This document has been placed in the Public Domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:

--/9DWx/yDrRhgMJTb-- From thomas.heller@ion-tof.com Thu Jul 25 17:22:15 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Thu, 25 Jul 2002 18:22:15 +0200 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: Message-ID: <04c501c233f7$70a0a4b0$e000a8c0@thomasnotebook> From: "Tim Peters" > [Thomas Heller] > > So isn't the conclusion that sizeof(size_t) == sizeof(void *) on > > any platform, > > Last I knew, there were dozens of platforms besides Linux and Windows > . Like I said, no relationship is defined here. C99 standardizes a > uintptr_t typedef for an unsigned integer type with "enough bits" so that > > (void*)(uintptr_t)p == p > > for any legit pointer p of type void*, but only standardizes its name, not > its existence (a conforming implementation isn't required to supply a > typedef with this name). Such a type *is* required to compile Python, > though, and pyport.h defines our own Py_uintptr_t (as a synonym for the > platform uintptr_t if it exists, else to the smallest integer type it can > find that looks big enough, else a compile-time #error). 
> > and so the index should be of type size_t instead of > > int, long, or LONG_LONG (aka __int64 in some places)? > > Try to spell out exactly what it is you think this index should be capable > of representing; e.g., what's your most extreme use case? > *I* have no use for this at the moment. I was just trying to understand the (let's call it) large byte-array support in Scott's proposal on 64-bit platforms, and how to program portably on 64-bit and 32-bit platforms. Assuming we have a large enough byte array

    unsigned char *ptr;

and want to use it in C, for example to get a certain byte:

    unsigned char mybyte = ptr[my_index];

What should the type of my_index be? IIRC, Scott proposed LONG_LONG, but wouldn't this be a pain on 32-bit platforms? Thomas From guido@python.org Thu Jul 25 17:32:24 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 25 Jul 2002 12:32:24 -0400 Subject: [Python-Dev] Powerpoint slide for keynotes available References: <200207250004.g6P04pP20522@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <009b01c233f8$de6bade0$7f00a8c0@pacbell.net> I wrote: > If someone can donate PDF that would be great (the HTML > generated by Powerpoint sucks too much to be worth it IMO). > > http://www.python.org/doc/essays/ppt/ > > (scroll to end) I've received about 5 offers of PDF. The first one is now on the web. Mark Hadfield won the race. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Jul 25 17:41:02 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 25 Jul 2002 12:41:02 -0400 Subject: [Python-Dev] PEP 295 - Interpretation of multiline string constants References: <20020725160337.GA8999@banana.mx1.ru> Message-ID: <00e301c233fa$121fa6e0$7f00a8c0@pacbell.net> > I wrote a PEP, its number is 295, it is in attachment. > It should be posted somewhere to be discussed so it is here. > Please, look at it and say what you think. This is an incompatible change. 
Your PEP does not address how to deal with this at all. I will be forced to reject it unless you come up with a transition strategy (in fact, I don't even want to consider your proposal unless you deal with this). > --Guido van Rossum (home page: http://www.python.org/~guido/) From xscottg@yahoo.com Thu Jul 25 18:00:03 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Thu, 25 Jul 2002 10:00:03 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <04c501c233f7$70a0a4b0$e000a8c0@thomasnotebook> Message-ID: <20020725170003.93924.qmail@web40107.mail.yahoo.com> --- Thomas Heller wrote: > > *I* have no use for this at the moment. > I was just trying to understand the (let's call it) large > byte-array support in Scott's proposal on 64-bit platforms, > and how to program portably on 64-bit and 32-bit platforms. > > Assuming we have a large enough byte array > unsigned char *ptr; > and want to use it in C, for example get a certain byte: > > unsigned char *mybyte = ptr[my_index]; > > What should the type of my_index be? IIRC, Scott proposed LONG_LONG, > but wouldn't this be a paint on 32-bit platforms? > Ok, now that I understand where you're coming from. If nobody has an objection or can point to a supported platform where it won't work, I'll switch it to size_t. __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From yozh@mx1.ru Thu Jul 25 18:06:40 2002 From: yozh@mx1.ru (Stepan Koltsov) Date: Thu, 25 Jul 2002 21:06:40 +0400 Subject: [Python-Dev] PEP 295 - Interpretation of multiline string constants In-Reply-To: <00e301c233fa$121fa6e0$7f00a8c0@pacbell.net> References: <20020725160337.GA8999@banana.mx1.ru> <00e301c233fa$121fa6e0$7f00a8c0@pacbell.net> Message-ID: <20020725170640.GA10350@banana.mx1.ru> On Thu, Jul 25, 2002 at 12:41:02PM -0400, Guido van Rossum wrote: > > I wrote a PEP, its number is 295, it is in attachment. 
> > It should be posted somewhere to be discussed so it is here. > > Please, look at it and say what you think. > > This is an incompatible change. Your PEP does not address > how to deal with this at all. I will be forced to reject it unless > you come up with a transition strategy (in fact, I don't even want > to consider your proposal unless you deal with this). For most strings this change will not change program result (for example number of spaces doesn't matter in SQL queries). For others I suggested (in section 'Alternatives') flags 'i' and 'o' for string constants. -- mailto: Stepan Koltsov From fredrik@pythonware.com Thu Jul 25 18:27:07 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 25 Jul 2002 19:27:07 +0200 Subject: [Python-Dev] PEP 295 - Interpretation of multiline string constants References: <20020725160337.GA8999@banana.mx1.ru> <00e301c233fa$121fa6e0$7f00a8c0@pacbell.net> <20020725170640.GA10350@banana.mx1.ru> Message-ID: <009f01c23400$8259c200$ced241d5@hagrid> Stepan Koltsov wrote: > > This is an incompatible change. Your PEP does not address > > how to deal with this at all. I will be forced to reject it unless > > you come up with a transition strategy (in fact, I don't even want > > to consider your proposal unless you deal with this). > > For most strings this change will not change program result and how on earth do you know that? > (for example number of spaces doesn't matter in SQL queries). so why do all your examples use SQL queries? > For others I suggested (in section 'Alternatives') flags 'i' and 'o' > for string constants. if you want to interpret multiline strings in a different way, why cannot you just do like everyone else, and use a function? mystring = SQL(""" blablabla """) (as a bonus, that approach makes it trivial to embed files, images, xml structures, etc...) a big -1 from here. 
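[Fredrik's wrapper-function approach is easy to make concrete. The `SQL()` helper below is an illustrative sketch, not an existing library function; it strips the indentation shared by the literal's lines, which also answers Stepan's "looks bad when printed" complaint.]

```python
def SQL(text):
    # Hypothetical helper in the spirit of Fredrik's suggestion: remove
    # the indentation shared by the non-blank lines of a triple-quoted
    # literal, so the embedded query reads (and prints) flush-left.
    lines = text.split("\n")
    body = [ln for ln in lines if ln.strip()]
    indent = min(len(ln) - len(ln.lstrip(" ")) for ln in body) if body else 0
    dedented = [ln[indent:] if ln.strip() else "" for ln in lines]
    return "\n".join(dedented).strip() + "\n"

query = SQL("""
    SELECT name, price
    FROM car
    WHERE price < 10000
""")
assert query == "SELECT name, price\nFROM car\nWHERE price < 10000\n"
```

(Later Python versions grew `textwrap.dedent` in the standard library for essentially this job.)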
From guido@python.org Thu Jul 25 18:51:01 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 25 Jul 2002 13:51:01 -0400 Subject: [Python-Dev] PEP 295 - Interpretation of multiline string constants References: <20020725160337.GA8999@banana.mx1.ru> <00e301c233fa$121fa6e0$7f00a8c0@pacbell.net> <20020725170640.GA10350@banana.mx1.ru> Message-ID: <004a01c23403$d9494880$7f00a8c0@pacbell.net> > > > I wrote a PEP, its number is 295, it is in attachment. > > > It should be posted somewhere to be discussed so it is here. > > > Please, look at it and say what you think. > > > > This is an incompatible change. Your PEP does not address > > how to deal with this at all. I will be forced to reject it unless > > you come up with a transition strategy (in fact, I don't even want > > to consider your proposal unless you deal with this). > > For most strings this change will not change program result (for > example number of spaces doesn't matter in SQL queries). For others > I suggested (in section 'Alternatives') flags 'i' and 'o' for string > constants. You are proposing a language change. Because of the grave consequences of such changes you have to explain why you cannot obtain the desired results with the existing language. You have completely failed to provide a motivation for your PEP so far. If you want your PEP to be considered you must provide a motivation first. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From yozh@mx1.ru Thu Jul 25 18:55:28 2002 From: yozh@mx1.ru (Stepan Koltsov) Date: Thu, 25 Jul 2002 21:55:28 +0400 Subject: [Python-Dev] PEP 295 - Interpretation of multiline string constants In-Reply-To: <009f01c23400$8259c200$ced241d5@hagrid> References: <20020725160337.GA8999@banana.mx1.ru> <00e301c233fa$121fa6e0$7f00a8c0@pacbell.net> <20020725170640.GA10350@banana.mx1.ru> <009f01c23400$8259c200$ced241d5@hagrid> Message-ID: <20020725175528.GA11100@banana.mx1.ru> On Thu, Jul 25, 2002 at 07:27:07PM +0200, Fredrik Lundh wrote: > > > This is an incompatible change. Your PEP does not address > > > how to deal with this at all. I will be forced to reject it unless > > > you come up with a transition strategy (in fact, I don't even want > > > to consider your proposal unless you deal with this). > > > > For most strings this change will not change program result > > and how on earth do you know that? I've seen output of `grep -rwC '"""' Python/Lib/` and `egrep -rwC '= *"""' Python/Lib/`. Most strings are docstrings ;-) > > (for example number of spaces doesn't matter in SQL queries). > > so why do all your examples use SQL queries? Because I saw this defect of Python first when I wrote SQL queries. f(): q = """my query""" % vars if debug: print q # looks bad > > For others I suggested (in section 'Alternatives') flags 'i' and 'o' > > for string constants. > > if you want to interpret multiline strings in a different way, why > cannot you just do like everyone else, and use a function? > > mystring = SQL(""" > blablabla > """) Functions don't know current indentation. > (as a bonus, that approach makes it trivial to embed files, images, > xml structures, etc...) > > a big -1 from here. 
:-( -- mailto: Stepan Koltsov From mcherm@destiny.com Thu Jul 25 19:01:42 2002 From: mcherm@destiny.com (Michael Chermside) Date: Thu, 25 Jul 2002 14:01:42 -0400 Subject: [Python-Dev] Re: PEP 295 - Interpretation of multiline string constants Message-ID: <3D403D06.6010802@destiny.com> Stephan Koltsov writes: > I wrote a PEP, its number is 295, it is in attachment. [... PEP on stripping newline and preceeding spaces multi-line string literals ...] I see ___ motivations for the proposals in this PEP, and propose alternative solutions for each. NONE of my alternative solutions requires ANY modification to the Python language. -------- Motivation 1 -- Lining up line 1 of multi-line quotes: Senario: - Use of string with things "lined up" neatly >>> def someFunction(): ... aMultiLineString = """Foo X 1.0 ... Bar Y 2.5 ... Baz Z 15.0 ... Spam Q 38.9 ... """ Notice how line 1 doesn't line up neatly with lines 2-4 because of the indenting as well as the text assigning it to a variable. This is annoying, and makes it awkward to read. Solution: - Use a backslash to escape an initial newline >>> def someFunction(): ... aMultiLineString = """\ ... Foo X 1.0 ... Bar Y 2.5 ... Baz Z 15.0 ... Spam Q 38.9 ... """ Notice that now everything lines up neatly. And we don't need to modify Python at all for this to work. -------- Motivation 2 - Maintaining Indentation Senario: - Outdenting misleads the eye >>> class SomeClass: ... def visitFromWaiter(self): ... if self.seated: ... self.silverware = ['fork','spoon'] ... self.menu = """Spam ... Spam and Eggs ... Spam on Rye ... """ ... self.napkin = DirtyNapkin() Notice how the indentation makes it quite clear when we are inside a class, a method, or a flow-control statement by merely watching the left-hand margin. But this is crudely interrupted by the multi-line string. Solution: - Process the multi-line string through a function >>> class SomeClass: ... def visitFromWaiter(self): ... if self.seated: ... 
self.silverware = ['fork','spoon'] ... self.menu = stripIndent( """\ ... Spam ... Spam and Eggs ... Spam on Rye ... """ ) ... self.napkin = DirtyNapkin() where stripIndent() has been defined as: >>> def stripIndent( s ): ... indent = len(s) - len(s.lstrip()) ... sLines = s.split('\n') ... resultLines = [ line[indent:] for line in sLines ] ... return ''.join( resultLines ) Notice how it is now NICELY indented, at the expense of a tiny little 4-line function. Of course, there are faster and safer ways to write stripIndent() (I, personally, would use a version that checked that each line started with identical indentation and raised an exception otherwise), but this version illustrates the idea while being very, very readable. ---- In conclusion, I propose you use simpler methods available WITHIN the language for solving this problem, rather than proposing a PEP to modify the language itself. -- Michael Chermside From xscottg@yahoo.com Thu Jul 25 18:59:50 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Thu, 25 Jul 2002 10:59:50 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <014b01c233b2$5a9bf240$e000a8c0@thomasnotebook> Message-ID: <20020725175950.8766.qmail@web40103.mail.yahoo.com> --- Thomas Heller wrote: > What if we would 'fix' the buffer interface? > This gets us part of the way there, but still has shortcomings. For one I, and people more significant than me, would still need a type that implemented the bytes object behavior. Everything but efficient pickling _could_ be done with third party extensions, but ignoring pickling (which I don't want to do), then we'd still have several significant third parties reinventing the same wheel. To me at least, this feels like a battery that should be included. > Extend the PyBufferProcs structure by new fields: > > typedef size_t (*getlargereadbufferproc)(PyObject *, void **); > typedef size_t (*getlargewritebufferproc)(PyObject *, void **); > How would you designate failure/exceptions? 
size_t is unsigned everywhere I can find it, so it can't return a negative number on failure. I guess the void** could be filled in with NULL. > > typedef struct { > getreadbufferproc bf_getreadbuffer; > getwritebufferproc bf_getwritebuffer; > getsegcountproc bf_getsegcount; > getcharbufferproc bf_getcharbuffer; > /* new fields */ > getlargereadbufferproc bf_getlargereadbufferproc; > getlargewritebufferproc bf_getlargewritebufferproc; > } PyBufferProcs; > > > The new fields are present if the Py_TPFLAGS_HAVE_GETLARGEBUFFER flag > is set in the object's type. Py_TPFLAGS_HAVE_GETLARGEBUFFER implies > the Py_TPFLAGS_HAVE_GETCHARBUFFER flag. > > These functions have the same semantics Scott describes: they must > only be implemented by types only return addresses which are valid as > long as the Python 'source' object is alive. > > Python strings, unicode strings, mmap objects, and maybe other types > would expose the large buffer interface, but the array type would > *not*. We could also change the name from 'large buffer interface' > to something more sensible, currently I don't have a better name. > I've been trying to keep the proposal as unintrusive as possible while still implementing the functionality needed. Adding more flags/members to PyObjects and modifying string, unicode, mmap, ... feels like a more intrusive change to me. I'm open to the idea, but I'm not ready to retract the current proposal. Then there is still the problem of needing something like a bytes object as mentioned above. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! 
Health - Feel better, live better http://health.yahoo.com From thomas.heller@ion-tof.com Thu Jul 25 20:47:56 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Thu, 25 Jul 2002 21:47:56 +0200 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: <20020725175950.8766.qmail@web40103.mail.yahoo.com> Message-ID: <05d501c23414$2c15c650$e000a8c0@thomasnotebook> From: "Scott Gilbert" > --- Thomas Heller wrote: > > What if we would 'fix' the buffer interface? > > > > This gets us part of the way there, but still has shortcomings. For one I, > and people more significant than me, would still need a type that > implemented the bytes object behavior. Sure, the extension of the buffer interface is only part of the picture. The bytes type is still needed as well. The extension I proposed is motivated by these thoughts: It would enable some of Python's builtin objects to expose the interface extension by supplying two trivial functions for each in the extended tp_as_buffer slot. The new functions expose a 'safe buffer interface', where there are guarantees about the lifetime of the pointer. So your bytes object can be a view of these builtin objects as well. It dismisses the segment count of the normal buffer interface. > Everything but efficient pickling > _could_ be done with third party extensions, but ignoring pickling (which I > don't want to do), then we'd still have several significant third parties > reinventing the same wheel. To me at least, this feels like a battery that > should be included. > I don't think my proposal prevents this. > > > Extend the PyBufferProcs structure by new fields: > > > > typedef size_t (*getlargereadbufferproc)(PyObject *, void **); > > typedef size_t (*getlargewritebufferproc)(PyObject *, void **); > > > > How would you designate failure/exceptions? size_t is unsigned everywhere > I can find it, so it can't return a negative number on failure. I guess > the void** could be filled in with NULL. 
> Details, not yet fleshed out completely. Store NULL in the void **, use ptrdiff_t instead of size_t, or something else. Or return ((size_t)-1) on failure. Or return -1 on failure, and fill out an size_t pointer: typedef int (*getlargereadwritebufferproc(PyObject *, size_t *, void **); > > Python strings, unicode strings, mmap objects, and maybe other types > > would expose the large buffer interface, but the array type would > > *not*. We could also change the name from 'large buffer interface' > > to something more sensible, currently I don't have a better name. Maybe it should be renamed 'safe buffer interface extension' instead of 'large buffer interface' (it could be large as well)? > > I've been trying to keep the proposal as unintrusive as possible while > still implementing the functionality needed. Adding more flags/members to > PyObjects and modifying string, unicode, mmap, ... feels like a more > intrusive change to me. I'm open to the idea, but I'm not ready to retract > the current proposal. Then there is still the problem of needing something > like a bytes object as mentioned above. The advantage (IMO) is that it defines a new protocol to get the pointer to the internal byte array on objects instead of requiring that these objects are instances of a special type or subtype thereof. > > __________________________________________________ > Do You Yahoo!? No, I google. ;-) Thomas From tim.one@comcast.net Thu Jul 25 20:51:44 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 25 Jul 2002 15:51:44 -0400 Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <20020725175950.8766.qmail@web40103.mail.yahoo.com> Message-ID: [Scott Gilbert] > ... > How would you designate failure/exceptions? size_t is unsigned everywhere > I can find it, Right, and the std requires that size_t resolve to an unsigned type, so that's reliable. > so it can't return a negative number on failure. The usual dodge is to return (and test against) (size_t)-1 in that case. 
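[The `(size_t)-1` dodge is easy to demonstrate from Python with `ctypes`; `get_buffer_size` below is a hypothetical stand-in for a C routine using this convention, not a real API.]

```python
import ctypes

# size_t is unsigned, so -1 converts to the largest representable value,
# which is then reserved as the failure sentinel -- the "(size_t)-1" dodge.
SIZE_T_MAX = ctypes.c_size_t(-1).value
assert SIZE_T_MAX == 2 ** (8 * ctypes.sizeof(ctypes.c_size_t)) - 1

def get_buffer_size(obj):
    # Hypothetical C-style callee: returns a byte count, or (size_t)-1
    # on failure (a real C routine would also set a Python exception, so
    # the caller can tell an error from a legitimate huge size).
    try:
        return ctypes.c_size_t(len(obj)).value
    except TypeError:
        return SIZE_T_MAX              # the (size_t)-1 sentinel

assert get_buffer_size(b"abcdef") == 6
assert get_buffer_size(42) == SIZE_T_MAX   # caller must check the sentinel
```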
If the caller sees that the result is (size_t)-1, then it also needs to call PyErr_Occurred() to see whether it's a normal, or error, return value (and if it is an error case, the routine had to have set a Python exception, so that PyErr_Occurred() returns true then). > I guess the void** could be filled in with NULL. Sounds easier to me . From tdelaney@avaya.com Thu Jul 25 23:27:23 2002 From: tdelaney@avaya.com (Delaney, Timothy) Date: Fri, 26 Jul 2002 08:27:23 +1000 Subject: [Python-Dev] Re: PEP 295 - Interpretation of multiline string constants Message-ID: > From: Michael Chermside [mailto:mcherm@destiny.com] > > In conclusion, I propose you use simpler methods available WITHIN the > language for solving this problem, rather than proposing a > PEP to modify > the language itself. In fact, the simplest mechanism is to declare all multi-line string literals at module scope. Presumably all such literals are supposed to be constants (docstrings are a special exception, but there are already rules for those in terms of how they should be displayed). This is a highly incompatible change with very high risk of breaking code. This is not a -1 or some such - this is a "cannot even be considered unless you can make it backwards compatible with all uses of multiline strings" which is of course impossible (since the whole purpose of the PEP is to modify such strings). When I first read this PEP I thought it was something that had been suggested to someone, and it was being proposed in order to be rejeted. It's obvious from later posts that that is not the case, and Stepan is having trouble understanding why such a PEP would be rejected out of hand. You might find support for a library function which performed the transformation that you desire (if there's a good enough use case for it). Personally, I don't think there is - too many times that one particular transformation will be "almost, but not quite what I want" in which case I need to roll my own anyway. 
Tim Delaney From ping@zesty.ca Thu Jul 25 22:03:51 2002 From: ping@zesty.ca (Ka-Ping Yee) Date: Thu, 25 Jul 2002 14:03:51 -0700 (PDT) Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: On Wed, 24 Jul 2002, Tim Peters wrote: > In short, there's no real "speed argument" against this anymore (as I said > in the first msg of this thread, the ~sort regression was serious -- it's an > important case; turns out galloping is very effective at speeding it too, > provided that dumbass premature special-casing doesn't stop galloping from > trying ). This is fantastic work, Tim. I'm all for switching over to timsort as the one standard sort method. -- ?!ng "Most things are, in fact, slippery slopes. And if you start backing off from one thing because it's a slippery slope, who knows where you'll stop?" -- Sean M. Burke From python-dev@zesty.ca Thu Jul 25 23:33:38 2002 From: python-dev@zesty.ca (Ka-Ping Yee) Date: Thu, 25 Jul 2002 15:33:38 -0700 (PDT) Subject: [Python-Dev] Re: PEP 295 - Interpretation of multiline string constants In-Reply-To: Message-ID: On Fri, 26 Jul 2002, Delaney, Timothy wrote: > You might find support for a library function which performed the > transformation that you desire (if there's a good enough use case for it). inspect.getdoc(object) provides this, for docstrings. There's no function in the library to do this in general to any string, though. -- ?!ng "Mathematics isn't about what's true. It's about what can be concluded from what." From tim.one@comcast.net Fri Jul 26 02:05:54 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 25 Jul 2002 21:05:54 -0400 Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: [Tim] > ... > There's also a significant systematic regression in timsort's +sort case, > ... also a mix of small regressions and speedups in 3sort. > These are because, to simplify experimenting, ...(and as many as > N-1 temp pointers can be needed, up from N/2). That's all repairable, > it's just a PITA to do it. 
It's repaired, and those glitches went away: > timsort > i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort > 15 32768 0.17 0.01 0.02 0.01 0.01 0.05 0.01 0.02 > 16 65536 0.24 0.02 0.02 0.02 0.02 0.09 0.02 0.04 > 17 131072 0.54 0.05 0.04 0.05 0.05 0.19 0.04 0.09 > 18 262144 1.17 0.09 0.09 0.10 0.10 0.38 0.09 0.18 > 19 524288 2.56 0.18 0.17 0.20 0.20 0.79 0.17 0.36 > 20 1048576 5.54 0.37 0.35 0.37 0.41 1.62 0.35 0.73 Now at 15 32768 0.17 0.01 0.01 0.01 0.02 0.09 0.01 0.03 16 65536 0.24 0.02 0.02 0.02 0.02 0.09 0.02 0.04 17 131072 0.53 0.05 0.04 0.05 0.05 0.18 0.04 0.09 18 262144 1.17 0.09 0.09 0.10 0.09 0.38 0.09 0.18 19 524288 2.56 0.18 0.18 0.19 0.19 0.78 0.17 0.36 20 1048576 5.53 0.37 0.35 0.36 0.37 1.60 0.35 0.74 In other news, an elf revealed that Perl is moving to an adaptive stable mergesort(!!!harmonic convergence!!!), and sent some cleaned-up source code. The comments reference a non-existent paper, but if I change the title and the year I find it here: Optimistic sorting and information theoretic complexity. Peter McIlroy. SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms), pp 467-474, Austin, Texas, 25-27 January 1993. Jeremy got that for me, and it's an extremely relevant paper. What I've been calling galloping he called "exponential search", and the paper has some great analysis, pretty much thoroughly characterizing the set of permutations for which this kind approach is helpful, and even optimal. It's a large set . Amazingly, citeseer finds only one reference to this paper, also from 1993, and despite all the work done on adaptive sorting since then. So it's either essentially unknown in the research community, was shot full of holes (but then people would have delighted in citing it just to rub that in <0.5 wink>), or was quickly superceded by a better result (but then ditto!). I'll leave that a mystery. I haven't had time yet to study the Perl code. 
The timsort algorithm is clearly more frugal with memory: worst-case N/2 temp pointers needed, and, e.g., in +sort it only needs (at most) 10 temp pointers (independent of N). That may or may not be good, though, depending on whether the Perl algorithm makes more effective use of the memory hierarchy; offhand I don't think it does. OTOH, timsort has 4 flavors of galloping and 2 flavors of binary search and 2 merge routines, because the memory-saving gimmick can require merging "from the left" or "from the right", depending on which run is smaller. Doubling the number of helper routines is what "PITA" meant in the quote at the start . One more bit of news: cross-box performance of this stuff is baffling. Nobody else has tried timsort yet (unless someone who asked for the code tried an earlier version), but there are Many Mysteries just looking at the numbers for /sort under current CVS Python. Recall that /sort is the case where the data is already sorted: it does N-1 compares in one scan, and that's all. For an array with 2**20 distinct floats that takes 0.35 seconds on my Win98SE 866MHz Pentium box, compiled w/ MSVC6. On my Win2K 866MHz Pentium box, compiled w/ MSVC6, it takes 0.58(!) seconds, and indeed all the sort tests take incredibly much longer on the Win2K box. On Fred's faster Pentium box (I forget exactly how fast, >900MHz and <1GHz), using gcc, the sort tests take a lot less time than on my Win2K box, but my Win98SE box is still faster. Another Mystery (still with the current samplesort): on Win98SE, !sort is always a bit faster than *sort. On Win2K and on Fred's box, it's always a bit slower. I'm leaving that a mystery too. I haven't tried timsort on another box yet, and given that my home machine may be supernaturally fast, I'm never going to . 
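[For readers curious what the "galloping" Tim describes — McIlroy's exponential search — looks like, here is a minimal sketch. It is illustrative only, not the code in CPython's listsort.]

```python
import bisect

def gallop_left(key, a):
    # Galloping / exponential search: probe a sorted list at offsets
    # 1, 2, 4, 8, ... to bracket the insertion point, then finish with a
    # binary search inside the bracket.  Total cost is O(log k), where k
    # is the final index -- cheap when the answer is near the front,
    # which is exactly what partially ordered data produces over and over.
    n = len(a)
    if n == 0 or a[0] >= key:
        return 0
    offset = 1
    while offset < n and a[offset] < key:
        offset *= 2                          # gallop: double the step
    lo, hi = offset // 2, min(offset, n)     # a[lo] < key <= a[hi] region
    return bisect.bisect_left(a, key, lo, hi)

assert gallop_left(8, [1, 3, 5, 7, 9, 11]) == 4
```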
From xscottg@yahoo.com Fri Jul 26 02:33:30 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Thu, 25 Jul 2002 18:33:30 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <05d501c23414$2c15c650$e000a8c0@thomasnotebook> Message-ID: <20020726013330.31053.qmail@web40111.mail.yahoo.com> --- Thomas Heller wrote: > From: "Scott Gilbert" > > --- Thomas Heller wrote: > > > What if we would 'fix' the buffer interface? > > > > > For one I, and people more significant than me, would still need a > > type that implemented the bytes object behavior. > > Sure, the extension of the buffer interface is only part of the > picture. The bytes type is still needed as well. > > The extension I proposed is motivated by these thoughts: > > It would enable some of Python's builtin objects to > expose the interface extension by supplying two > trivial functions for each in the extended tp_as_buffer slot. > > The new functions expose a 'safe buffer interface', where > there are guarantees about the lifetime of the pointer. So > your bytes object can be a view of these builtin objects > as well. > > It dismisses the segment count of the normal buffer interface. > [...] > > > > I've been trying to keep the proposal as unintrusive as possible while > > still implementing the functionality needed. Adding more flags/members > > to PyObjects and modifying string, unicode, mmap, ... feels like a more > > intrusive change to me. I'm open to the idea, but I'm not ready to > > retract the current proposal. Then there is still the problem of > > needing something like a bytes object as mentioned above. > > The advantage (IMO) is that it defines a new protocol to get the > pointer to the internal byte array on objects instead of > requiring that these objects are instances of a special type > or subtype thereof. > I like your idea for adding the flags and methods to create a "safe buffer interface". 
As you note, string, unicode, mmap, and possibly other things could implement these methods and return a (possibly large) pointer that could be manipulated after the GIL is released. Of course the pickleable bytes object falls into that category too. It seems to me that we have two independant proposals. Do you see any reason why they shouldn't be two separate PEPs? I don't see any reason to piggyback them into one. They're related in topic, but neither seems to rely on the other in any way. __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From guido@python.org Fri Jul 26 04:16:36 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 25 Jul 2002 23:16:36 -0400 Subject: [Python-Dev] Re: PEP 295 - Interpretation of multiline stringconstants References: Message-ID: <00a901c23452$daae16c0$7f00a8c0@pacbell.net> My mails to Stepan Koltsov have been bouncing (after the first one apparently went through). Assuming he's not subscribed to python-dev, he may not be aware of our responses. What to do? Simple reject it in absentia? --Guido van Rossum (home page: http://www.python.org/~guido/) From Rick Farrer" This is a multi-part message in MIME format. ------=_NextPart_000_0007_01C2342A.123F3C00 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Please remove me from the mailing list. rf@avisionone.com Thanks, Rick ------=_NextPart_000_0007_01C2342A.123F3C00 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printablePlease remove me from the mailing=20 list.
------=_NextPart_000_0007_01C2342A.123F3C00-- From cce@clarkevans.com Fri Jul 26 05:37:10 2002 From: cce@clarkevans.com (Clark C . Evans) Date: Fri, 26 Jul 2002 00:37:10 -0400 Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <20020723063611.26677.qmail@web40102.mail.yahoo.com>; from xscottg@yahoo.com on Mon, Jul 22, 2002 at 11:36:11PM -0700 References: <20020723063611.26677.qmail@web40102.mail.yahoo.com> Message-ID: <20020726003709.C17944@doublegemini.com> | Abstract | | This PEP proposes the creation of a new standard type and builtin | constructor called 'bytes'. The bytes object is an efficiently | stored array of bytes with some additional characteristics that | set it apart from several implementations that are similar. This is great. Python currently lacks two "standard" programming objects which most languages have: (a) timestamp, and (b) binary. This addresses the second. This will greatly help YAML data interoperability among other programming languages such as Java, Ruby, etc. Best, Clark Yo! Check out YAML Serialization for the masses! http://yaml.org -- Clark C. Evans Axista, Inc. http://www.axista.com 800.926.5525 XCOLLA Collaborative Project Management Software From sholden@holdenweb.com Fri Jul 26 14:15:33 2002 From: sholden@holdenweb.com (Steve Holden) Date: Fri, 26 Jul 2002 09:15:33 -0400 Subject: [Python-Dev] Re: PEP 295 - Interpretation of multiline stringconstants References: <00a901c23452$daae16c0$7f00a8c0@pacbell.net> Message-ID: <127c01c234a6$867957f0$6300000a@holdenweb.com> ----- Original Message ----- From: "Guido van Rossum" To: Sent: Thursday, July 25, 2002 11:16 PM Subject: Re: [Python-Dev] Re: PEP 295 - Interpretation of multiline stringconstants > My mails to Stepan Koltsov have been bouncing (after the first one > apparently went through). Assuming he's not subscribed to python-dev, > he may not be aware of our responses. What to do? Simple reject it > in absentia? 
> Well, at least that way he'll see it's been rejected from the PEP listing. You can always direct him to the Mailman archives when his mail comes back on line. regards ----------------------------------------------------------------------- Steve Holden http://www.holdenweb.com/ Python Web Programming http://pydish.holdenweb.com/pwp/ ----------------------------------------------------------------------- From mal@lemburg.com Fri Jul 26 08:35:07 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 26 Jul 2002 09:35:07 +0200 Subject: [Python-Dev] Sorting References: Message-ID: <3D40FBAB.8090909@lemburg.com> Tim Peters wrote: > One more bit of news: cross-box performance of this stuff is baffling. > Nobody else has tried timsort yet (unless someone who asked for the code > tried an earlier version), but there are Many Mysteries just looking at the > numbers for /sort under current CVS Python. Recall that /sort is the case > where the data is already sorted: it does N-1 compares in one scan, and > that's all. For an array with 2**20 distinct floats that takes 0.35 seconds > on my Win98SE 866MHz Pentium box, compiled w/ MSVC6. On my Win2K 866MHz > Pentium box, compiled w/ MSVC6, it takes 0.58(!) seconds, and indeed all the > sort tests take incredibly much longer on the Win2K box. On Fred's faster > Pentium box (I forget exactly how fast, >900MHz and <1GHz), using gcc, the > sort tests take a lot less time than on my Win2K box, but my Win98SE box is > still faster. > > Another Mystery (still with the current samplesort): on Win98SE, !sort is > always a bit faster than *sort. On Win2K and on Fred's box, it's always a > bit slower. I'm leaving that a mystery too. I haven't tried timsort on > another box yet, and given that my home machine may be supernaturally fast, > I'm never going to . I can give it a go on my AMD boxes if you send me the code. 
They tend to show surprising results as you know :-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From thomas.heller@ion-tof.com Fri Jul 26 15:28:50 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Fri, 26 Jul 2002 16:28:50 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface Message-ID: <082b01c234b0$c33564e0$e000a8c0@thomasnotebook> Here is the draft PEP for the ideas posted here. Regards, Thomas -------- PEP: xxx Title: The Safe Buffer Interface Version: $Revision: $ Last-Modified: $Date: 2002/07/26 14:19:38 $ Author: theller@python.net (Thomas Heller) Status: Draft Type: Standards Track Created: 26-Jul-2002 Python-Version: 2.3 Post-History: 26-Jul-2002 Abstract This PEP proposes an extension to the buffer interface called the 'safe buffer interface'. The safe buffer interface fixes the flaws of the 'old' buffer interface as defined in Python versions up to and including 2.2: The lifetime of the retrieved pointer is clearly defined. The buffer size is returned as a 'size_t' data type, which allows access to 'large' buffers on platforms where sizeof(int) != sizeof(void *). Specification The 'safe' buffer interface exposes new functions which return the size and the pointer to the internal memory block of any python object which chooses to implement this interface. The size and pointer returned must be valid as long as the object is alive (has a positive reference count). So, only objects which never reallocate or resize the memory block are allowed to implement this interface. The safe buffer interface ommits the memory segment model which is present in the old buffer interface - only a single memory block can be exposed. 
Implementation Define a new flag in Include/object.h: #define Py_TPFLAGS_HAVE_GETSAFEBUFFER /* PyBufferProcs contains bf_getsafereadbuffer and bf_getsafewritebuffer */ #define Py_TPFLAGS_HAVE_GETSAFEBUFFER (1L<<15) This flag would be included in Py_TPFLAGS_DEFAULT: #define Py_TPFLAGS_DEFAULT ( \ .... Py_TPFLAGS_HAVE_GETCHARBUFFER | \ .... 0) Extend the PyBufferProcs structure by new fields in Include/object.h: typedef size_t (*getlargereadbufferproc)(PyObject *, void **); typedef size_t (*getlargewritebufferproc)(PyObject *, void **); typedef struct { getreadbufferproc bf_getreadbuffer; getwritebufferproc bf_getwritebuffer; getsegcountproc bf_getsegcount; getcharbufferproc bf_getcharbuffer; /* safe buffer interface functions */ getsafereadbufferproc bf_getsafereadbufferproc; getsafewritebufferproc bf_getsafewritebufferproc; } PyBufferProcs; The new fields are present if the Py_TPFLAGS_HAVE_GETLARGEBUFFER flag is set in the object's type. XXX Py_TPFLAGS_HAVE_GETLARGEBUFFER implies the Py_TPFLAGS_HAVE_GETCHARBUFFER flag. The getsafereadbufferproc and getsafewritebufferproc functions return the size in bytes of the memory block on success, and fill in the passed void * pointer on success. If these functions fail - either because an error occurs or no memory block is exposed - they must set the void * pointer to NULL and raise an exception. The return value is undefined in these cases and should not be used. Backward Compatibility There are no backward compatibility problems. Reference Implementation Will be uploaded to the sourceforge patch manager by the author. Additional Notes/Comments It may be a good idea to expose the following convenience functions: int PyObject_AsSafeReadBuffer(PyObject *obj, void **buffer, size_t *buffer_len); int PyObject_AsSafeWriteBuffer(PyObject *obj, void **buffer, size_t *buffer_len); These functions return 0 on success, set buffer to the memory location and buffer_len to the length of the memory block in bytes. 
On failure, they return -1 and set an exception. Python strings, unicode strings, mmap objects, and maybe other types would expose the safe buffer interface, but the array type would *not*, because its memory block may be reallocated during its lifetime. References [1] The buffer interface http://mail.python.org/pipermail/python-dev/2000-October/009974.html [2] The Buffer Problem http://www.python.org/peps/pep-0296.html Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: From ville.vainio@swisslog.com Fri Jul 26 08:11:41 2002 From: ville.vainio@swisslog.com (Ville Vainio) Date: Fri, 26 Jul 2002 10:11:41 +0300 Subject: [Python-Dev] Multiline string constants, include in the standard library? References: <20020725194802.22949.82629.Mailman@mail.python.org> Message-ID: <3D40F62D.7000106@swisslog.com> > where stripIndent() has been defined as: > > >>> def stripIndent( s ): > ... indent = len(s) - len(s.lstrip()) > ... sLines = s.split('\n') > ... resultLines = [ line[indent:] for line in sLines ] > ... return ''.join( resultLines ) Something like this should really be available somewhere in the standard library (string module [yeah, predeprecation, I know], string method). Everybody needs this kind of functionality, and probably more often than many of the other string methods (title, swapcase come to mind). -- Ville From xscottg@yahoo.com Fri Jul 26 16:01:09 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Fri, 26 Jul 2002 08:01:09 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <082b01c234b0$c33564e0$e000a8c0@thomasnotebook> Message-ID: <20020726150109.4104.qmail@web40111.mail.yahoo.com> --- Thomas Heller wrote: > Here is the draft PEP for the ideas posted here. > [...] I like it. 
:-) > > typedef size_t (*getlargereadbufferproc)(PyObject *, void **); > typedef size_t (*getlargewritebufferproc)(PyObject *, void **); > I'm sure this is a cut-and-pasto for typedef size_t (*getsafereadbufferproc)(PyObject *, void **); typedef size_t (*getsafewritebufferproc)(PyObject *, void **); __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From thomas.heller@ion-tof.com Fri Jul 26 16:06:55 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Fri, 26 Jul 2002 17:06:55 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020726150109.4104.qmail@web40111.mail.yahoo.com> Message-ID: <089d01c234b6$15385220$e000a8c0@thomasnotebook> From: "Scott Gilbert" > > Here is the draft PEP for the ideas posted here. > > > [...] > > I like it. :-) :-) > > > typedef size_t (*getlargereadbufferproc)(PyObject *, void **); > > typedef size_t (*getlargewritebufferproc)(PyObject *, void **); > > I'm sure this is a cut-and-pasto for > > typedef size_t (*getsafereadbufferproc)(PyObject *, void **); > typedef size_t (*getsafewritebufferproc)(PyObject *, void **); > Exactly. Everything is named safebuffer instead of largebuffer. Thanks, Thomas From mwh@python.net Fri Jul 26 10:44:45 2002 From: mwh@python.net (Michael Hudson) Date: 26 Jul 2002 10:44:45 +0100 Subject: [Python-Dev] Sorting In-Reply-To: Tim Peters's message of "Thu, 25 Jul 2002 21:05:54 -0400" References: Message-ID: <2meldq3jsi.fsf@starship.python.net> Tim Peters writes: > One more bit of news: cross-box performance of this stuff is baffling. > Nobody else has tried timsort yet (unless someone who asked for the code > tried an earlier version), but there are Many Mysteries just looking at the > numbers for /sort under current CVS Python. If you put the code somewhere, I'll try it on my PPC iBook (not today, as it's at home, but soon). 
I'd thank you for working on this, but you're clearly enjoying it an unhealthy amount already . Cheers, M. -- ZAPHOD: You know what I'm thinking? FORD: No. ZAPHOD: Neither do I. Frightening isn't it? -- The Hitch-Hikers Guide to the Galaxy, Episode 11 From jacobs@penguin.theopalgroup.com Fri Jul 26 16:18:58 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Fri, 26 Jul 2002 11:18:58 -0400 (EDT) Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: On Thu, 25 Jul 2002, Tim Peters wrote: > One more bit of news: cross-box performance of this stuff is baffling. I'll run tests on the P4 Xeon, Alpha (21164A, 21264), AMD Elan 520, and maybe a few Sparcs, and whatever else I can get my hands on. Just let me know where I can snag the code + test script. -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From pinard@iro.umontreal.ca Fri Jul 26 16:05:39 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 26 Jul 2002 11:05:39 -0400 Subject: [Python-Dev] Re: Multiline string constants, include in the standard library? In-Reply-To: <3D40F62D.7000106@swisslog.com> References: <20020725194802.22949.82629.Mailman@mail.python.org> <3D40F62D.7000106@swisslog.com> Message-ID: [Ville Vainio] > > where stripIndent() has been defined as: > > > > >>> def stripIndent( s ): > > ... indent = len(s) - len(s.lstrip()) > > ... sLines = s.split('\n') > > ... resultLines = [ line[indent:] for line in sLines ] > > ... return ''.join( resultLines ) > Something like this should really be available somewhere in the standard > library (string module [yeah, predeprecation, I know], string > method). Everybody needs this kind of functionality, and probably more often > than many of the other string methods (title, swapcase come to mind). Strange. I did a lot of Python programming, and never needed this. 
In fact, I like my doc-strings and other triple-quoted strings flushed left. So, I can see them in the code exactly as they will appear on the screen. If I used artificial margins in Python so my doc-strings appeared to be indented more than the surrounding, and wrote my code this way, it would appear artificially constricted on the left once printed. It's not worth it. For me, best is to use """\ always as the opening triple-quote, and write flushed left until the closing """. As most long strings end with a new line, the closing """ is usually flushed left just as well. My opinion is that it is nice this way. Don't touch the thing! :-) -- François Pinard http://www.iro.umontreal.ca/~pinard From oren-py-d@hishome.net Fri Jul 26 09:15:07 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Fri, 26 Jul 2002 11:15:07 +0300 Subject: [Python-Dev] Iteration - my summary Message-ID: <20020726111507.A28836@hishome.net> There has been some lively discussion about the iteration protocols lately. My impression of the opinions on the list so far is this: It could have been semantically cleaner. There is a blurred boundary between the iterable-container and iterator protocols. Perhaps next should have been called __next__. Perhaps iterators should not have been required to implement an __iter__ method returning self. With the benefit of hindsight the protocols could have been designed better. But there is nothing fundamentally broken about iteration. Nothing that justifies any serious change that would break backward compatibility and require a transition plan. A remaining sore spot is re-iterability. Iterators being their own iterators is ok by itself. StopIteration being a sink state is ok by itself. When they are combined they result in hard-to-trace silent errors because an exhausted iterator is indistinguishable from an empty container. This happens in real code, not in some contrived examples. 
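A minimal illustration of the silent failure Oren describes, using a generator (the names here are illustrative, not from the proposal):

```python
def total(items):
    # Any consumer that just iterates cannot tell an exhausted
    # iterator from an empty container.
    return sum(items)

g = (i * i for i in range(4))
print(total(g))  # 14
print(total(g))  # 0 -- silently wrong, no exception raised
```

The second call returns 0 with no error, which is exactly the hard-to-trace behavior being complained about.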
It is clear to me that this issue needs to be addressed in some way, but without a complete redesign of the iteration protocols. My proposal of raising an exception on calling .next() after StopIteration has been rejected by Guido. Here's another approach: Proposal: new built-in function reiter() def reiter(obj): """reiter(obj) -> iterator Get an iterator from an object. If the object is already an iterator a TypeError exception will be raised. For all Python built-in types it is guaranteed that if this function succeeds the next call to reiter() will return a new iterator that produces the same items unless the object is modified. Non-builtin iterable objects which are not iterators SHOULD support multiple iteration returning the same items.""" it = iter(obj) if it is obj: raise TypeError('Object is not re-iterable') return it Example: def cartprod(a,b): """ Generate the cartesian product of two sources. """ for x in a: for y in reiter(b): yield x,y This function should raise an exception if object b is a generator or some other non re-iterable object. List comprehensions should use the C API equivalent of reiter for sources other than the first. This solution is less than perfect. It requires explicit attention by the programmer and is less comprehensive than the other solutions proposed but I think it's better than nothing. A related issue is iteration of files. It's an exception to the guarantee made in the docstring above. My impression is that people generally agree that file objects are more iterator-like than container-like because they are stateful cursors. However, making files into iterators is not as simple as adding a next method that calls readline and raises StopIteration on EOF. This implementation would lose the performance benefit from the readahead buffering done in the xreadlines object. 
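The proposed reiter() is short enough to try directly; restating it from the proposal above and exercising both branches:

```python
def reiter(obj):
    """Return an iterator for obj, rejecting objects that are already iterators."""
    it = iter(obj)
    if it is obj:
        # Iterators return themselves from iter(), so this catches
        # generators and other one-shot cursor objects.
        raise TypeError('Object is not re-iterable')
    return it

print(list(reiter([1, 2, 3])))   # [1, 2, 3] -- a list is re-iterable
try:
    reiter(x for x in range(3))  # a generator is its own iterator
except TypeError as exc:
    print(exc)                   # Object is not re-iterable
```

Note the `it is obj` identity test is the whole trick: the iterator protocol requires iterators to return themselves from `__iter__`, so identity distinguishes them from containers.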
The way I see file object iteration is that the file object and xreadlines object abuse the iterable-container<->iterator relationship to produce a cursor-without-readahead-buffer<->cursor-with-readahead-buffer relationship. I don't like objects pretending to be something they're not. I can finish my xreadlines caching patch that makes a file into an iterator with an embedded xreadlines object. Perhaps it's not the most elegant solution but I don't see any real problems with it. I am also thinking about implementing line buffering inside the file object that can finally get rid of the whole fgets/getc_unlocked multiplatform mess and make xreadlines unnecessary. The problem here is that readahead is not exactly a transparent operation. More on this later. Oren From yozh@mx1.ru Fri Jul 26 17:05:59 2002 From: yozh@mx1.ru (Stepan Koltsov) Date: Fri, 26 Jul 2002 20:05:59 +0400 Subject: [Python-Dev] Re: PEP 295 - Interpretation of multiline string constants In-Reply-To: <00a901c23452$daae16c0$7f00a8c0@pacbell.net> References: <00a901c23452$daae16c0$7f00a8c0@pacbell.net> Message-ID: <20020726160559.GA24120@banana.mx1.ru> On Thu, Jul 25, 2002 at 11:16:36PM -0400, Guido van Rossum wrote: > My mails to Stepan Koltsov have been bouncing (after the first one > apparently went through). Assuming he's not subscribed to python-dev, > he may not be aware of our responses. What to do? Simple reject it > in absentia? I don't understand, what happens with my DNS, but I am subscriber of this maillist and I read it sometimes. So... What you (and others) think about just adding flag 'i' to string constants (that will strip indentation etc.)? This doesn't affect existing code, but it will be useful (at least for me ;-) Motivation was posted here by Michael Chermside, but I don't like his solutions. 
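The stripIndent() helper quoted earlier in the multiline-string thread has a standard-library counterpart in later Pythons: textwrap.dedent(), which removes the longest common leading whitespace from all lines. A quick sketch of the behavior being asked for:

```python
import textwrap

doc = """\
    Usage:
        prog [options]
    """
# dedent() strips the common 4-space margin but keeps the line breaks
# (the quoted stripIndent() sketch joins with '' and so loses them).
print(textwrap.dedent(doc))
```

Relative indentation inside the string is preserved; only the shared margin is removed.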
-- mailto: Stepan Koltsov From tim.one@comcast.net Fri Jul 26 17:02:34 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 26 Jul 2002 12:02:34 -0400 Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: Apart from fine-tuning and rewriting the doc file, I think the mergesort is done. I'm confident that if any bugs remain, I haven't seen them . A patch against current CVS listobject.c is here: http://www.python.org/sf/587076 Simple instructions for timing exactly the same data I've posted times against are in the patch description (you already have sortperf.py -- it's in Lib/test). This patch doesn't replace samplesort, it adds a new .msort() method, to make comparative timings easier. It also adds an .hsort() method for weak heapsort, because I forgot to delete that code after I gave up on it . X-platform samplesort timings are interesting as well as samplesort versus mergesort timings. Timings against "real life" sort jobs are especially interesting. Attaching results to the bug report sounds like a good idea to me, so we get a coherent record in one place. Thanks in advance! From mal@lemburg.com Fri Jul 26 17:58:49 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 26 Jul 2002 18:58:49 +0200 Subject: [Python-Dev] Sorting References: Message-ID: <3D417FC9.6030308@lemburg.com> Tim Peters wrote: > Apart from fine-tuning and rewriting the doc file, I think the mergesort is > done. I'm confident that if any bugs remain, I haven't seen them . A > patch against current CVS listobject.c is here: > > http://www.python.org/sf/587076 > > Simple instructions for timing exactly the same data I've posted times > against are in the patch description (you already have sortperf.py -- it's > in Lib/test). This patch doesn't replace samplesort, it adds a new .msort() > method, to make comparative timings easier. It also adds an .hsort() method > for weak heapsort, because I forgot to delete that code after I gave up on > it . 
> > X-platform samplesort timings are interesting as well as samplesort versus > mergesort timings. Timings against "real life" sort jobs are especially > interesting. Attaching results to the bug report sounds like a good idea to > me, so we get a coherent record in one place. > > Thanks in advance! Here's the result for AMD Athlon 1.2GHz/Linux/gcc:

Python/Tim-Python> ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   0.40   5.46

without patch:

Python/Tim-Python> ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   0.38   6.22

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... 
Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From tim.one@comcast.net Fri Jul 26 18:22:04 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 26 Jul 2002 13:22:04 -0400 Subject: [Python-Dev] Sorting In-Reply-To: <3D417FC9.6030308@lemburg.com> Message-ID: [MAL] > Here's the result for AMD Athlon 1.2GHz/Linux/gcc: > > Python/Tim-Python> ./python -O Lib/test/sortperf.py 15 20 1 > i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort > 15 32768 0.07 0.00 0.01 0.09 0.01 0.03 0.01 0.08 > 16 65536 0.18 0.02 0.02 0.19 0.03 0.07 0.02 0.20 > 17 131072 0.43 0.05 0.04 0.46 0.05 0.18 0.05 0.48 > 18 262144 0.99 0.09 0.10 1.04 0.13 0.40 0.09 1.11 > 19 524288 2.23 0.19 0.21 2.32 0.24 0.83 0.20 2.46 > 20 1048576 4.96 0.40 0.40 5.41 0.47 1.72 0.40 5.46 > > without patch: > > Python/Tim-Python> ./python -O Lib/test/sortperf.py 15 20 1 > i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort > 15 32768 0.08 0.01 0.01 0.09 0.01 0.03 0.00 0.09 > 16 65536 0.20 0.02 0.01 0.20 0.03 0.07 0.02 0.20 > 17 131072 0.46 0.06 0.02 0.45 0.05 0.20 0.04 0.49 > 18 262144 0.99 0.09 0.10 1.09 0.11 0.40 0.12 1.12 > 19 524288 2.33 0.20 0.20 2.30 0.24 0.83 0.19 2.47 > 20 1048576 4.89 0.40 0.41 5.37 0.48 1.71 0.38 6.22 I assume you didn't read the instructions in the patch description: http://www.python.org/sf/587076 The patch doesn't change anything about how list.sort() works, so what you've shown us is the timing variance on your box across two identical runs. To time the new routine, you need to (temporarily) change L.sort() to L.msort() in sortperf.py's doit() function. It's a one-character change, but an important one . From tim.one@comcast.net Fri Jul 26 18:50:30 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 26 Jul 2002 13:50:30 -0400 Subject: [Python-Dev] Sorting In-Reply-To: <3D418884.2090509@lemburg.com> Message-ID: [MAL] > Dang. Why don't you distribute a ZIP file which can be dumped > onto the standard Python installation ? 
A zip file containing what? And which "standard Python installation"? If someone is on Python-Dev but can't deal with a one-file patch against CVS, I'm not sure what to conclude, except that I don't want to deal with them at this point . > Here's the .msort() version: > > Python/Tim-Python> ./python -O sortperf.py 15 20 1 > i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort > 15 32768 0.08 0.01 0.01 0.01 0.01 0.03 0.00 0.02 > 16 65536 0.17 0.02 0.02 0.02 0.02 0.07 0.02 0.06 > 17 131072 0.41 0.05 0.04 0.05 0.04 0.16 0.04 0.09 > 18 262144 0.95 0.10 0.10 0.10 0.10 0.33 0.10 0.20 > 19 524288 2.17 0.20 0.21 0.20 0.21 0.66 0.20 0.44 > 20 1048576 4.85 0.42 0.40 0.41 0.41 1.37 0.41 0.84 Thanks! That's more like it. So far I've got the only known box were ~sort is slower under msort (two other sets of timings were attached to the patch; I'll paste yours in too, merging in the smaller numbers from your first report). From mal@lemburg.com Fri Jul 26 18:36:04 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 26 Jul 2002 19:36:04 +0200 Subject: [Python-Dev] Sorting References: Message-ID: <3D418884.2090509@lemburg.com> Tim Peters wrote: > [MAL] > >>Here's the result for AMD Athlon 1.2GHz/Linux/gcc: >> >>without patch: >> >>Python/Tim-Python> ./python -O Lib/test/sortperf.py 15 20 1 >> i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort >>15 32768 0.08 0.01 0.01 0.09 0.01 0.03 0.00 0.09 >>16 65536 0.20 0.02 0.01 0.20 0.03 0.07 0.02 0.20 >>17 131072 0.46 0.06 0.02 0.45 0.05 0.20 0.04 0.49 >>18 262144 0.99 0.09 0.10 1.09 0.11 0.40 0.12 1.12 >>19 524288 2.33 0.20 0.20 2.30 0.24 0.83 0.19 2.47 >>20 1048576 4.89 0.40 0.41 5.37 0.48 1.71 0.38 6.22 > > > I assume you didn't read the instructions in the patch description: > > http://www.python.org/sf/587076 > > The patch doesn't change anything about how list.sort() works, so what > you've shown us is the timing variance on your box across two identical > runs. 
To time the new routine, you need to (temporarily) change L.sort() to > L.msort() in sortperf.py's doit() function. It's a one-character change, > but an important one . Dang. Why don't you distribute a ZIP file which can be dumped onto the standard Python installation ? Here's the .msort() version: Python/Tim-Python> ./python -O sortperf.py 15 20 1 i 2**i *sort \sort /sort 3sort +sort ~sort =sort !sort 15 32768 0.08 0.01 0.01 0.01 0.01 0.03 0.00 0.02 16 65536 0.17 0.02 0.02 0.02 0.02 0.07 0.02 0.06 17 131072 0.41 0.05 0.04 0.05 0.04 0.16 0.04 0.09 18 262144 0.95 0.10 0.10 0.10 0.10 0.33 0.10 0.20 19 524288 2.17 0.20 0.21 0.20 0.21 0.66 0.20 0.44 20 1048576 4.85 0.42 0.40 0.41 0.41 1.37 0.41 0.84 -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From thomas.heller@ion-tof.com Fri Jul 26 19:17:10 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Fri, 26 Jul 2002 20:17:10 +0200 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: <20020723063611.26677.qmail@web40102.mail.yahoo.com> Message-ID: <0a7101c234d0$a8c463c0$e000a8c0@thomasnotebook> [sorry if you see this twice, didn't seem to get through the first time] If the safe buffer PEP would be accepted and implemented, here's my proposal for the bytes object. The bytes object uses the safe buffer interface to gain access to the byte array it exposes. The bytes type would probably accept the following arguments: PyObject *type - the (bytes) type or subtype to create PyObject *obj - the object exposing the safe buffer interface size_t offset - starting offset of obj's memory block size_t length - number of bytes to use (0 for all) and maybe a flag requesting read or read/write access. 
A convention could be that if a NULL is passed for obj, then the bytes object itself allocates a memory block of length length. Of course the bytes object itself would also expose the safe buffer interface. And slicing, but not repetition. Isn't the above sufficient (provided that we somehow add the pickle stuff into this picture)? Thomas From thomas.heller@ion-tof.com Fri Jul 26 18:46:33 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Fri, 26 Jul 2002 19:46:33 +0200 Subject: [Python-Dev] PEP 296 - The Buffer Problem References: <20020723063611.26677.qmail@web40102.mail.yahoo.com> Message-ID: <098c01c234cc$621a78f0$e000a8c0@thomasnotebook> If the safe buffer PEP would be accepted and implemented, here's my proposal for the bytes object. The bytes object uses the safe buffer interface to gain access to the byte array it exposes. The bytes type would probably accept the following arguments: PyObject *type - the (bytes) type or subtype to create PyObject *obj - the object exposing the safe buffer interface size_t offset - starting offset of obj's memory block size_t length - number of bytes to use (0 for all) and maybe a flag requesting read or read/write access. A convention could be that if a NULL is passed for obj, then the bytes object itself allocates a memory block of length length. Of course the bytes object itself would also expose the safe buffer interface. And slicing, but not repetition. Isn't the above sufficient (provided that we somehow add the pickle stuff into this picture)? Thomas From guido@python.org Fri Jul 26 21:48:30 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 26 Jul 2002 16:48:30 -0400 Subject: [Python-Dev] Re: PEP 295 - Interpretation of multiline string constants In-Reply-To: Your message of "Fri, 26 Jul 2002 20:05:59 +0400." 
<20020726160559.GA24120@banana.mx1.ru> References: <00a901c23452$daae16c0$7f00a8c0@pacbell.net> <20020726160559.GA24120@banana.mx1.ru> Message-ID: <200207262048.g6QKmU123924@pcp02138704pcs.reston01.va.comcast.net> > So... What you (and others) think about just adding flag 'i' to string > constants (that will strip indentation etc.)? This doesn't affect > existing code, but it will be useful (at least for me ;-) Motivation > was posted here by Michael Chermside, but I don't like his solutions. And I don't like your proposal. Sorry, but I really don't think the syntax should be changed for something that's so trivial to code if you need it. --Guido van Rossum (home page: http://www.python.org/~guido/) From nhodgson@bigpond.net.au Sat Jul 27 01:51:39 2002 From: nhodgson@bigpond.net.au (Neil Hodgson) Date: Sat, 27 Jul 2002 10:51:39 +1000 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <082b01c234b0$c33564e0$e000a8c0@thomasnotebook> Message-ID: <028501c23507$c46ebcb0$3da48490@neil> Thomas Heller: > The size and pointer returned must be valid as long as the object > is alive (has a positive reference count). So, only objects which > never reallocate or resize the memory block are allowed to > implement this interface. I'd prefer an interface that allows for reallocation but has an explicit locked state during which the buffer must stay still. My motivation comes from the data structures implemented in Scintilla (an editor component), which could be exposed through this buffer interface to other code. The most important type in Scintilla (as in many editors) is a split (or gapped) buffer. Upon receiving a lock call, it could collapse the gap and return a stable pointer to its contents and then revert to its normal behaviour on receiving an unlock. 
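Neil's split (gapped) buffer can be sketched in a few lines to show why a stable pointer only exists while the gap is collapsed; this toy class is illustrative, not Scintilla's actual implementation:

```python
class GapBuffer:
    """Text storage with a movable gap at the edit point (illustrative only)."""

    def __init__(self, text=""):
        self.pre = list(text)   # characters before the gap
        self.post = []          # characters after the gap, in reverse order

    def move_gap(self, pos):
        # Slide characters across the gap until it sits at pos.
        while len(self.pre) > pos:
            self.post.append(self.pre.pop())
        while len(self.pre) < pos:
            self.pre.append(self.post.pop())

    def insert(self, s):
        # Inserting at the gap is cheap; no tail shuffling needed.
        self.pre.extend(s)

    def collapse(self):
        # The "lock" step: squeeze the gap out so the contents form
        # one contiguous block that a caller could safely point into.
        while self.post:
            self.pre.append(self.post.pop())
        return "".join(self.pre)

buf = GapBuffer("hello world")
buf.move_gap(5)
buf.insert(",")
print(buf.collapse())  # hello, world
```

The collapse step is what a lock call would trigger; any later move_gap or insert invalidates the contiguity again, which is why an unlock notification matters.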
Neil From xscottg@yahoo.com Sat Jul 27 03:26:38 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Fri, 26 Jul 2002 19:26:38 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <098c01c234cc$621a78f0$e000a8c0@thomasnotebook> Message-ID: <20020727022638.86727.qmail@web40101.mail.yahoo.com> --- Thomas Heller wrote: > If the safe buffer PEP would be accepted and implemented, > here's my proposal for the bytes object. > > The bytes object uses the safe buffer interface to gain > access to the byte array it exposes. > > The bytes type would probably accept the following arguments: > > PyObject *type - the (bytes) type or subtype to create > PyObject *obj - the object exposing the safe buffer interface > size_t offset - starting offset of obj's memory block > size_t length - number of bytes to use (0 for all) > > and maybe a flag requesting read or read/write access. > > A convention could be that if a NULL is passed for obj, > then the bytes object itself allocates a memory block > of length length. > > Of course the bytes object itself would also expose the safe > buffer interface. And slicing, but not repetition. > > Isn't the above sufficient (provided that we somehow > add the pickle stuff into this picture)? > It's probably sufficient but more than necessary. In particular, supporting the safe buffer protocol makes sense to me (if that gets accepted), but I'm not eager to immediately support the obj pointer as you describe above. We've gotten side-tracked a bit when describing the "view behavior" for the slicing operations on a bytes object. It was not my intent that the bytes object typically be used to create views into other Python objects. That whole discussion was an attempt to describe the slicing behavior. From my perspective, describing the whole inner-thing and outer-thing stuff was to explain the implementation. Think of the bytes object as a mutable string with some additional restrictions, and that's what I have in mind. 
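The mutable byte string with copy-free, write-through slicing that Scott describes, plus a guard against reallocation while a view is outstanding, is essentially what later Pythons provide as bytearray and memoryview; a sketch in those modern terms:

```python
data = bytearray(b"0123456789")

view = memoryview(data)[2:6]   # a writable view, no copy made
view[0:2] = b"AB"              # writes through to the underlying bytes
print(bytes(data))             # b'01AB456789'

# While a view is exported, the exporter may not reallocate its block:
try:
    data.extend(b"xy")         # resizing would leave the view dangling
except BufferError as exc:
    print(exc)
```

The BufferError on resize is the runtime analogue of the "safe buffer" rule that the pointer must stay valid while someone holds it.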
The mmap example is sort of a retrofit since mmap should probably have been implemented via something like bytes in the first place (to get the bytes style slicing among other things), not because I think there are a lot of objects that you would want to wrap up in bytes views. The existing buffer object is ok for creating views, and truthfully I don't know how often it is really used for that. What I (and I think others) need is more like a pickleable-mutable-reliable-byte-string. I'm not eager to grow bytes into a superset object. Even if I'm wrong about the need for this, at the very least, the additional functionality can be added later. I really just want to push through a simple, usable, bytes object for the time being. We can easily add, we can't easily take away. __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From xscottg@yahoo.com Sat Jul 27 03:40:12 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Fri, 26 Jul 2002 19:40:12 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <028501c23507$c46ebcb0$3da48490@neil> Message-ID: <20020727024012.85905.qmail@web40107.mail.yahoo.com> --- Neil Hodgson wrote: > Thomas Heller: > > > The size and pointer returned must be valid as long as the object > > is alive (has a positive reference count). So, only objects which > > never reallocate or resize the memory block are allowed to > > implement this interface. > > I'd prefer an interface that allows for reallocation but has an explicit > locked state during which the buffer must stay still. My motivation comes > from the data structures implemented in Scintilla (an editor component), > which could be exposed through this buffer interface to other code. The > most important type in Scintilla (as in many editors) is a split (or > gapped) buffer. 
Upon receiving a lock call, it could collapse the gap and > return a stable pointer to its contents and then revert to its normal > behaviour on receiving an unlock. > A couple of questions come to mind: First, could this be implemented by a gapped_buffer object that implements the locking functionality you want, but that returns simple buffers to work with when the object is locked. In other words, do we need to add this extra functionality up in the core protocol when it can be implemented specifically the way Scintilla (cool editor by the way) wants it to be in the Scintilla specific extension. Second, if you are using mutexes to do this stuff, you'll have to be very careful about deadlock. I imagine: thread 1: grab the object lock grab the object pointer release the GIL do some work acquire the GIL # deadlock thread 2: acquire the GIL try to resize the object # requires no outstanding locks Thread 2 needs to make sure no objects are holding the object lock when it does the resize, but thread 1 can't acquire the GIL until thread 2 gives it up. Both are stuck. If you choose not to implement the locks with true mutexes, then you're probably going to end up polling and that's bad too. Is there a way out of this? This is part of the reason I didn't want to put a lock state into the bytes object. __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From ask@perl.org Sat Jul 27 06:40:33 2002 From: ask@perl.org (Ask Bjoern Hansen) Date: Fri, 26 Jul 2002 22:40:33 -0700 (PDT) Subject: [Python-Dev] python.org/switch/ Message-ID: <20020726223911.T70962-100000@onion.valueclick.com> As presented on the Perl Lightning talks here at OSCON: Switch movies. You guys will dig Nathan's (nat.mov and nat.mpg). 
http://www.perl.org/tpc/2002/movies/switch/ ;-) - ask -- ask bjoern hansen, http://askbjoernhansen.com/ !try; do(); From tim.one@comcast.net Sat Jul 27 09:02:48 2002 From: tim.one@comcast.net (Tim Peters) Date: Sat, 27 Jul 2002 04:02:48 -0400 Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: http://www.python.org/sf/587076 has collected timings on 5 boxes so far. I also noted that msort() gets a 32% speedup on my box when sorting a 1.33-million line snapshot of the Python-Dev archive. This is a puzzler to account for, since you wouldn't think there's significant pre-existing lexicographic order in a file like that. McIlroy noted similar results from experiments on text, PostScript and C source files in his adaptive mergesort (which is why I tried sorting Python-Dev to begin with), but didn't offer a hypothesis. Performance across platforms is a hoot so far, with Neal's box even seeing a ~6% speedup on *sort. Skip's Pentium III acts most like my Pentium III, which shouldn't be surprising. Ours are the only reports where !sort is faster than *sort for samplesort, and also where ~sort under samplesort is faster than ~sort under timsort. ~sort (only 4 distinct values, repeated N/4 times) remains the most puzzling of the tests by far. Relative to its performance under samplesort, sf userid ~sort speedup under timsort (negative means slower) --------- --------------------------------------------------- montanaro -23% tim_one - 6% jacobs99 +18% lemburg +25% nascheme +30% Maybe it's a big win for AMD boxes, and a mixed bag for Intel boxes. Or maybe it's a win for newer boxes, and a loss for older boxes. Or maybe it's a bigger win the higher the clock rate (it hurt the most on the slowest box, and helped the most on the fastest). 
Since it ends up doing a sequence of perfectly balanced merges from start
to finish, I thought perhaps it has to do with OS and/or chip intelligence
in read-ahead cache optimizations -- but *sort also ends up doing a
sequence of perfectly balanced merges, and doesn't behave at all like
~sort across boxes.  ~sort does exercise the galloping code much more than
the other tests (*sort rarely gets into galloping mode; ~sort never gets
out of galloping mode), so maybe it really has most to do with cache
design.

Whatever, it's starting to look like a no-brainer -- except for the
extremely mixed ~sort results, the numbers so far are great.

From mal@lemburg.com Sat Jul 27 09:54:35 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 27 Jul 2002 10:54:35 +0200
Subject: [Python-Dev] Sorting
References:
Message-ID: <3D425FCB.2010104@lemburg.com>

This is a multi-part message in MIME format.
--------------080204010802000409070906
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit

Tim Peters wrote:
> [MAL]
>> Dang. Why don't you distribute a ZIP file which can be dumped
>> onto the standard Python installation ?
>
> A zip file containing what?  And which "standard Python installation"?
> If someone is on Python-Dev but can't deal with a one-file patch
> against CVS, I'm not sure what to conclude, except that I don't want
> to deal with them at this point.

Point taken ;-)

I meant something like this: Here's a ZIP file.  To install, take your
standard Python CVS download, unzip it on top of it, then run

    echo "With .sort()"
    ./python -O sortperf.py 15 20 1
    echo "With .msort()"
    ./python -O timsortperf.py 15 20 1

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ --------------080204010802000409070906 Content-Type: application/zip; name="tim.zip" Content-Transfer-Encoding: base64 Content-Disposition: inline; filename="tim.zip" UEsDBBQAAAAIABSc+izfCZtEKlMAAOEeAQAUABUAT2JqZWN0cy9saXN0b2JqZWN0LmNVVAkA A7iHQT3BqUE9VXgEAPQBZADMPGtzGzeSn8lfASu1NkmRMinbsmPa2lJk2quLIvskba1yjouF GYIkzOFgMg9SzMb726+7gZnBPEhJyV7dqlIxOQAajX53o4dPO+xcRjFTzlfhxkwuA08shR/z WCqfdZ42m99J3/WSiWB7nzbxXPkH8z18OJ2IKbu6fnc6/tvo5N3o8iqf+CaKJzB6MD9ufie8 SNgjm+hpvAlEBIONxtMOe69CFsnfxDjGzb4T/kROm80I93eZ9ONmqBJ/kgQ4pwXfmd9u/rPZ SPxIznwxYfTIkXHE3rL+sDxwCE9b9qO2z46P2Yths4mbXyJslgSvmw3WYWdT5rM3TP8dvjjq slgxzpaJF0ugClNTM8ZeHVQWHPafv9q64Oh5ZcHg6Nmr59sWvBgclhcMng36Lw+3LHje//6o tGAA+Lx4ueUMzw5fHpXP8OrZq1dH/fozHB4dDp6XznD0ctCHFbVnOOx//3LwoniGF8+OXr3s fz+oO8Pg6OXLl4cDc4aDg8LCw06n9WL/WUe2a1biIA7RClp1PZcRUysR9rjnKZfHImJBqAIV okRzD2HEc8E8lHqUqi5b8oX0ZyxUakkgpiCTfDKRZsEsVOt4fsAAtChARg2BzZbSm3SZk4D2 RLRe+CqZzXGjmVzhTr7gYS+WS8H4EvH4DYTREXO+krATQoRTecqf0epI/JoI36Xj8SAAjYha bZBdwjoIRZQNskCp0Nv0AhECykuZAthEsViyUBCasLa1nkt3jqhyeirjTZeJg9lBl3E3VBE8 9zw29fhKhfoEAP0f0p+oddRlaxnP8dv3tznOjkCCBRzO4iYeBySYwyes12PcnxCEtXgCR49i CZBnKkZ6AuoRiwIO2E9DPsutDLDHAasDbPP1RpqKK+HrzWNkaeTOwTABxh4YISSvjOEwvyYS wIJIwPmQgkBKZGwExCcgAHqlFgJpt4T/cR+XJRHQP1ZaaJ42GxPFwKQ0wFocH79lz4b4mUzK vv72jQEBQd5a/mEbvoYiTkKftVpkS2hmm+2zQZu9eaO/DpvsG9hIsIHAenZxObo6+59Ra8VD kGAwfl3mS+BQ1G7A3y9N3L5h/n5pNow1HPtiPcbPYMJsG2iWDvVkOWWtfOYbsHatf7U0hHa/ zZ6ShKtpC7dtt9u0pvFp85NYjqtYZYAQOkxE253j1YCZgMvF38/Ph+ZZRph+u9n8tPmofUgH PqJPGV+INZlsAolWWz9Pp6lgmJ3WdzagqEN9IH0WBvj/k7AdheH4Bz4582MRgkKegrS2kBEp JwglYFOzocEAlgSik54+Rw2XgeE/nQt3QXqO2jf11JoEATc3IJ7WrGWP3rKUuHQmws8godG8 UEBaFW4IP0BIBYBMCmL84ZRoYpOhyx4bal0jiwwFcJkmdWGP/KA5md6mdFJB71g5YxSPlE00 FblYM8E6GJxMi8RPJ+fnH08NCYjCGhlrnY3VjqPjxo0lCKqI7fVd1u+mhO1Yz9uGVxnVaMRI P/4Dz8c2Ga8vT05/BMi2PtqcQtkCFcQAwpD3CpUnnwJLUSLxfI/MDBIKhHlfuesNcgrnT1ut 
opi3ASLrHTNzIMLLxDc5OmBsxa0Iw2GdHn0Q8RkQqYB9lwIc+ecPUZQpiXrHfv+dSXb89q6j 6C1omUE/kw+UjvwhHPQqDsFljN+Haqk/tnBKY4/cMM1kCjwoOJ6QgxnfIxQ19lfCIADYjG7d 8RnOhgEVdlm6R/2J7uYICt9n+aUiLIbioZjBV3AqRdJnj4kH3SYr/tWsAstKco7sqgPqTXB4 WDvYCYZ38Hh88250ejl630q3GT5Ifv8g43fuCxQ0fK4wzWI8jzA0x0hgqwyUcEV7egc/wRdL mGxoCvOJgB1cafAcNm3kzUTLlPRtJUWxkH40KG0aCW+qlXA9FyF4z5xhK62VeCjaKWMkuW7D zFXRlj6MW7i5ZSLfsrOLa7DeNzasMgc+Gk+XM2HP5b6vYzMGlluYJBCDJ4qiahlAZ0CrnKJg CJpGOTRuU6PLCtjuD1InZyBVqVD0JDXHJ4pThADj+ovO/vLB4+Ku9sTCACzCMADkvzwA0Szp Ai0bMoRAGH+W+4MvMNt8+aKF6eyCZGlFh6MRWocTVxjm2NRKF9vPiTT7ZRG0TNKZH4kwrvEB FfGzbc2/xb+Zr6QEVdXrphjkRqCI+QklMSXM///QhQUYlbZrzUjGi9JxIGb8SUDiMYmoJmJM w0rJSRM1ZTwxyVY5wq21BBjA/N2/Drk+nxYgiGiu/nZ6cjG+Onk/Gv8w+nB2gYPNSgj2yFYY wOudwpzGAWBrHk5A9VCeT+ch4CEh37mW0VKEkOo0GGWwoXiCeWBElR4GcV7MXA4RotaOSC3F HKJhSrlCMUlcEeml8Tzk0RzTPpjpA4QO2JNNh0H2NzPZNCxxIb+MIbuiNBCXyeVSTCQ8gxRx IjwBgwcUazdQ46xAb0gaStmEUTwT1xZMdU4H0Lws0NTR6/vL0cieUQwmMcXpHcfBeBoK0bKD xS0sGF28IwYUPQFxOwCzGld43WXvz85HrDM1igkJ9SyyBaBJZ4aNLkUQjlCMczxSNLQvfpSd PvPNbSvgRmFqTAmLaQu32/t8cHDwpWCv+/r0xVk0I7N3/SFD0DYTmNzftzc+NhsXwMB/WXKQ yfQnokmRQSCLML/fts7TMMc/F3xVx4aiNuMJyofQx7wDTJ0jtwJr4mII60tM3Oa5Qas6cKSg 8DCQwqVkE9URJoQiSrw4y712c3tVx+yUvUh29tfaqDnlNHudx7kmoCgEBBlEg1M9LCMyMwUe /53yhQGXncxK5PspwulgHulby5vGIhFt21jQEdydM6GLygeMXagYqzE81iZmyTdsmQB/RFaU AwMNdiNS8DzCEs9UxO68WLLL4IKchVRD0jalLNkrS7D396URbKxIwI4J+t9GZOfnyClDR9vE UMRVSG1sgjU0sJxaxt9pOkH0Y6LisbFh8J0xoJKu7eUWEx0OHtdYR9xTA06Vv8wlgHECkRty EUyrqzQlorTAOZUhRtiwhcfROGPEockEUbcOJXR2OboeU+SmEW6TzmNoslVq9lJR2CYFDdSV nCC4w9n16KeMJH2twQb2qfJdHp/4k3fCaz2GYVytZ+hkrLLcEHUXCjvw//Kn8d9GuR4b7DgZ AjWYbzvZDsA5WRBzjeB2FfwELDdVT6ztxmoG0QtklVRPRQPOHBGvhTCKs5VWxtTvJFZmY8YZ iP9S0m/BeQz+2l7n8t9s4tLXhSysMLXetK8sy643LedpZNc94c/iecmyc5N602JuRR0AoArC VX7MIYAsA7HjVuEVfXvZ+PBtxsddFoqCl9Kdn6plwEPxg1JeS3jdiuDBxrKtE2eKqoBEo//O bBMCPC5GCOQ8qfKXTiiFEKVgWTvKbZ5S6upTiRTFAlSxgJAf/j++SJQnb7xs+QviYpeKttAp 8qQrthAKMm/zaS5n85pyuE9lHhN6EL2wLG3YRp91jptxlZ4d26TOJ9oSroHhrgANx2kefX9L 
3wtA6XkFqpldAOsHuoxcyKHskEGv6uk9DR5+wWwVmZFpECFFSkQgrKjUKrJk6GSMaZQTcUDR mmEwSVPyb7WVY18XjmuLsFeGvfdgbTWR5X+gImseWLJVJjeioLenrXcJp0vOaJdFc5zMoqWS k0XCVUmtnBCW20d8r8IlT9URLzfywhNjWHqCKA6SQo2W8DEEJD1vYUnql72/HBz2+9EvEOmm 1Siz1HGK+ZzPl6KWfOn9m1PN9PGkDXO9kMs022fOgwQ8vbIoCXb9lc0u//BvkW9LsMt7OXfv 5TxMl/Yt/O/WqO8SH9tFnN3ZmKgTz7znQxfb2ddhQURrRLOmdO6nhtQ3VrTK/A7z/wTPKxyn cnNOspq827d4QWNf9djXknB8TSfpGrYlFF+JURanOiaHDqiUaLLnuvsxY+bKcQ9kBg9wYtWa N0SePwiXJ+BMPt980ZEdQ1UPhZuEkVxhGUj6dCVPyq4CkaYtym/qOhOkhZQHsrXQOWCgojiA gJFC2Rwud2O5kvGGJX4sPVrMp3h1kuWKAAnrYtg8gdf5gIjypcs9Fs15IA5MPQxoL2h1ul/a OALJk90HwsOQb7rsCZxl43riCVFCMWquSNdjXA6JWCjJsgUbwkUXuinphSAvLYOxKYQ5OE5r Ed8D9rrXonjckmCzW9cIc+k2wegCiBKQ/opS4ykm3x53KePWdKC7bfgyoWmncwyekCha7s3g ggbPlQpMqIX9WNtN6KpdvsewtCsLJwoOYmX8g5+Zm7Twh1M5AnKMpONpAuFKIDtVJ/f4Z/n6 KxoavodtJkRbx+S4lDDTGUDUSf7RoFle06Gc0c/qS+loLu5FP9pl2uRZWcuqUJwy+3wrXrDf w+/B356WMbr52u7xCK29fNVqt9MrXOX9h8eN5gqE27ZxgtYSAjQNp2dHjCnkFAO0gUYpKGnA 0u/F6B+WeesWwLSH2Q19cakx1LjHJO2jQBV4R/rJehNzS5MtyYEalSbBI9O9yKLWRR61LsBw a7O9v2/ufD4vvqTyPslbXDSMpx2Cgms7TzUg2w2k0Boazv7kSxFmww5k3oKu4zP7Vq54KWcx SlsF7j+J2ZRLz+hTzh+zT1nicZW+lWL3JNV9sIHIYpJltjU9J/g43eWRlUiaS4B3o/N0OFXh 6pVipcacM9GmeK+3wFzWcFOnq/W0L/OvN9AcPE6lAkBtEYVaMldZmQZ1Cx0jLHT8ADKhDVc1 plsXY7pFGtPdmFBhrYlM0ZzJita5DUlJSECza5kAD5QOla5mTOhRywUD1DoVVqjZ48eWESje a1k3OvmUdoVelVSpX4w1iy0d98vfam7y/0QyV6qzlNzO7nROu6HtIbP0yc/Xh855f4Jvig3V NgQ/7YDVYXUzC4vLdUgEltUBizcMefCZTqoLOHFsaO4q6hsI8jh9oAGXr8wNrxul3gfybZXY Ok91SmKa3tqXhNU0Vf6R82xpfOjoRreO/8B2B6qsQtAFgrTRorOtdyA786CUT1AsGHOwy67C 2NifMQg9BxABQZzRt7yWlXCUUw3rtr6mEK6F62sWJBmKqfRBsbqtZ5daHbpMpfe3hSJcSvWt RGcpdV7Dx2LStSWj2VG+rCp7oSlrvOJekoVCOyuc/7dtT5VIu96ecFJluT8wEWwph89OVFNb KHw3CX0e/mYr73G5Wq2b390rReXhaGBExcxamYvjSm6dHws+XUBmaGFlnuywm9K3rr+KKFq+ ISze3dvVkrz6dRLOxp94GInrJPCA/LCoy/bkx9d6j70uewz8eAyJT+UMeZeKObS8w9zztI1m F9or+5ojh60bXgoaWN4t1RhxG8M24Fu0Q7tjP2enb4HJnvB3+ROY5cCUodX4mDdL2O0F1L4L 2WHueMDIRXOFRk6GbiJj3ZfiYNIPSXi80XYuF2Gn2h2RN9Mh3MJRO20n24ZHmCyXVK2dlQsi 
O1GlfJ7ewpBxQpWN14wfaKJC2EBjesLNzc1rdob1Dn+hqx5rvmH4akOMCaAjIBiPdDmD+yyJ 6PUOXGgltm2qL+C6CnYTJSLq+qklg1O83DecyuLuR07lrqjqYvUaq4h1L6ch7+k0HNIIy0c0 G04qTEWZMFeeoaCAgGtSMTmRaqmLGw/rWDQNi7jXPgnnAx34FokzlnyrO6czBCpIvLQZArsD wCmQeNK9rfMksrKpSo23yIwyL67Ma0Xj99xmipNypMyQhzjxb83yqe93rWji19qbibKlUXh/ TdaGPlXP1KLnYH7BCifkW0GLIHsiHXWE7hUBImUX2o8MyJJl1oN1plCjpFelBZbS0gfEMVsN vTEW9zC8TaPHRToASynMSK1Om2UU2U4L58F0cO6mwR/xzqACD3LNQABSrQf459/la9iFnHON Y67tr859jilL9sjaLxW2T6rlUvlUOklCwXT521jdO2JCtgeI6EKw9lmVtuua1zOosIaSXW5k rgapxR7o+8WpGqc74tIcrVXZsta3RBtRslxUFvVQuGrrh7avj+wUs1CF3XZPukpbdS/FSgDT sc0Vt9IvS5omAkY2p6uJ7imWBGgnWuLW9RK8oGizuaSmGLvHN9QAs7uRLNzxlC2ZnbnuhzAN VgD98WMARx6q15tjGGlqKTD0BkdKphor0wATDwn/4Bda1IB/4AvVnPf39biB980cGbw4+zWR 7iLCkKj0Gje90Iq3FxGSQnf5g68hhYYggRqDub+g1rGJjNwE8hO8kiGvcy2X7JOgmdT7jLVC aleREYJOfFc34LFrvhB4yRISwV1I0SjzxNctw16UBIEnxQQ3c6vLWYv7G1qDZonpN80Npm2A jXWViIIr7A2MuT/h4aQOUM6OSmNNu4uVavSshBFBMljleEAYhnJFRLkksYrAvlDDodaOAQK5 Be5tuqBo9BkUbXNg94XTKyORZ3fr39pysrG/6EOIUiAd0tuQRaM3tDuJsa1Hr6xJC7fSANCg zcfn12ncdEoNYoYqEGHUcQe7CGOwAlEWnDzrQdRJr9ia9i+6C4tDyC1B2KYcK8Qt+ERUy960 xVOQxyJ7TKHnYWoe9FjlKMa4p9bktpjWbvRXDS6LVbSd73fZ7Y5hMDob7ZkKPaBUwTOU7TI9 lVAq5MPkg4ZprXQr3sYNnflp4RDmtstGLRRp5anOMhdvkBp7dewxnbIm6dvy3g4dEjE5ic4V gA/LjXnmQdaE/MaEb/juw9kFhsqoHcj8aAlEwjcHyKjom801vekt4Bng5Xk9x1Nrn0UcLRHa JFSoGHtd0SwNDYjMLHF69yDEa1EdvjvS5yHeFaMl1fblTMcvCAhCGNjSE9jY+pLMG6GVbVY2 gHhNq8IFAPmB4CKMDDSlSmwq1oCOYXyk3+fHm+somU5h4GProtM5hLSKxxzc/kr3M2vTKfTb U+IWEnR0IRYUuorGNyTCjIaQtCYevv9/YN+wpqODfj+l+KeTy+uz67OPu0iv/Zt1cGKDo0Pk WCF+9JY+njMnOq3SRA/QV00o4STKIJCc1prQCEXT+rBLpsBX6aEoNzW4wzIQXio3QjaOBWyY YN4hIdy/Q0DoGgx5AA+19jbmtxN87ISOzFS+xMolTi4QXLs4b5MRXlez6NWVNaTp7lxF+HJK FCVL7XxSG+kq/FWAfx2+uNXMQsTJA6x5EOipAZch7rikZK6H+eQE90wdJd47I6Z4vxCa63zc KCC/Q44Nb9YRWJyh5mxQIcA5T9RvgJl+zS4EckYRmwPGfDYLxQytqgA5c4nWeoPBX5CgyoJF 79ckGH1qkmE5N1aJO9+wN2vpL47LElWUoOdGsk5ufhpdfsgkijgJvPWTpSOIAKZbP8IfcQBp 4t4aVXQpQuoa0FLF2cIHBe/FqudkquvOE3/BVpJX1JfemQWXTedW08yTm44FIiathjN/kCt6 
wUhDg9GLFH6KmJZCjr/MkQRGqH4EshFKttbBsX4k82B+ASMG/+aAjJJyAEtyBBFKi6S5aL9g 47b+5QyUJNAbatbPLC9hh8j8SLmAZyzLSnmrVA7QLLi4KCdwpgC4MhPwyDAe32wgEgdBqOCz Pi2CCgX4X21goljTDeS92ESDQS+mKFPpEpDQnUNU7kJmYn7BZCpvARlLcVCsFNJhRcD1dpmV YpwkG/Qfu8f9Eh9T8HGi5fqiKIKpqA1ekOhdXZ+c/liwZob/KgnJQGNo56IMnOBvXszmDDgo l6gdUjfQNNMOIezRWQPFKGYly5cHuAT1QleRccEFe8o8v3XRBhdY0YoO/oZMhlgXXzw56tPP uqRmwvyeC0IqBtK0D6w+eo6OxfQ9rcGmJpF+YcWRM5LBzAQTDOxZ0TIKR05fv1MuNkiZd+tA 6aLEsVyjUcEMjo6jxS13Y2qoYvFaFSifU/qob4Xsgt0gvX4mHaUIFYPrD3jltIf5614aIVtR RshlRKJKW2IkAktGGN/tLfaQUJGgUgvFfnI6ZTdvfjYK6rM9jIAW7T0G6oSNWJGJ30N6D5Dc DGUNRGww3RH1JWH7vbiNM6sGWCO8n4kWWZ0OMhfrxGfvz69bN132c5v6UXRHBp4Qn3VTlWvr ogXTt2xw4iFj+Nsm/9velza3cSRpfyZ+RYsTSwMkQJOUZMmkpQmOTHm0q2sszYw3NFpGg2iK EIEGBgCvHWt/+5vPk1lXd4OiZGuO3VfhMNHddWZlZeVd2LUcKYEVjkKHqH0s45gRoEQCfMMG 54lqKLGLYw+8BFdZSH6dmRCKUz3P8sHMZAZ25IkEoW30jsfdG8iaEBmNGpWL4buzydm8KuMq 7LEuSjRlqcky1Rkqpkpys5RFEcmLPb0Ayi9mcjCTBzjJkcRoAhcZrhx+nAwdVRLCzZHxkw5u JAf5gJTQhtAGl2GVhVOWtgTYV5MzgQecTkC2b3XY9ZNjQ0slt4jByDwj283oeLLFkgc4JIAr 3D3HTjwDDCdni6kAXOGiDJFGtmZCKBltpqRbiQkwDsXhhcX4Vvgfkr7OssGZI/kdIlss4AUk uVYhkDwSBI3CH528BPUcv0IftzqnrxJxl6RW5A3iomyAHBKYjO9vFTvK6V5zRokR3Qnxv1lz ienwfLKAgBhUGX7519ZiPIDIQBsJHYEqqGDr73LrQKVhrXSoyuBPZ8rec81KqwiZsWacrUeo DJN3aaCyglIWFlmfzL1oBL8+KkegHnZtr3Au0KVgrit0WjoXuOaysXbVoqIlYNQVXBGc4ixG nc34K7yx7OvMTdEK4FSdF7Ijue3O8yNsS6HLpIiMdCy0QrDHOLBKqxqWYjmo6KI2yjaEgM2y nowBKaa21UJCCsfByMJN1ekJE53yq7m4KRSmG+rZ5PIzhX58x7IQ+uZrncDQA8Wydp1MkNZM jsMl0Akx3logwGfk4BNVdwsFsc3CQFsWVy6v9NTBQQc5yrGfhVBHLrlNeqSsuItw1PrqAiqS xCJ4/46xfdDwVzhRryyBWIXQsfarkfD4moJNGsVZxDRw9vVRfqYWPJVJRTAA4SZzgIRjEF+k onJtz1796RESC66oJ/BXMhHw52fzM/gqgOKzCai5MFgLxVTMn3p0zRDJNYK/2NT8xYC47WlP cWCdi6t7M/V6avE0223FYr/K7kE4VILoWC87wcB2nQgfUnziIaMmStvgTq36kePjRHPl/DMT eZ5akR4BHuRz+IzLGgb9qrQMgjkZDuTUZyK+C+Xsd4RTFpacnU6y8mthO8uOcthj6b8YRMcm ccEje5ss9XZHpXLpCDHB5HdnUmMy9u7iJvmTSZ8YHyUbeE5W9BShoqHRnCxsQdaNWe/yuaYQ HKlIQlSaK1tb2n4LYoqfb6JGINuzKKYI74zfEakUw6AK5GAows01VluXQm20fRgWCBs/eOWk 
6NrjqrnR+00kzLhIHuR9Ra5OFBYhrpws9OkDBMxi2yOPBdj30ZXTERWAgrKzk9lA2aKwtLrV cwd1Iz2AtjW8U21XxWGmuSh7k+Pe7dCaDOyxLpsM7HxYXHSt/dPsf4TOv2sbhnSUDQ7STbaq 7WEZrMxq0ux/CvcEjjIfCYlFy2A+Ma/sdFgOTCbunwkaLYgm5lTDZeltuy99JBYUsu+UQYp2 g6syHyvO64Z47eTpCDlskxj/ngdOVhNdcsfmIwjFFInH08l8PuyPTLzM51fj6WKysJ01mA3P neZHaHH+rlD7WsINe6mZO8EDQ2j8joh4o1L+B43z9s7X99yz0wyGxUFDtzM76uAlcUHtoJR/ J8IrZ/vnQqEqlOkqA6nGdIXU90bFeSHn0Qwdo9LR2SLl2f0QtaE/FYJZk/4cwreQidcTmZyQ DEMBEXBmuSmblKS9fGmK0IVIQwOTCFSI9zv+5Oz4eISNzJyQ3M/xJpzOJkdFMZhDhI4YDxLu xl3cNSUPqbQgx1iQo60SjqziuMNBDUtH6GXDFDMS8XzwXgi+8JjtYkht40QFmFFxTKpKfcXw 3cniVkfTtnp1zk+ePrXc+c+dzgMlEnF3s5dSVpr6KXsJWVEIlPak+WoE7gU1lyQrChd604Ad dAfSOjgQSzfB+bVlgh2vCHUQn3hQY/hIoQP1ArK0XtnK2PBIRBWryEoUU2qNgITEs+FACJTt PlvSTtrcT/XW/bBMqSqVdl1OuQACZDh7qVRSWSp5xwpcDdMQcX2izvUILieqyR3JmeXkbhCP BfidXJlEO7KYH/a/C0fjIJ4JwosMObqVVmN6Vku5qnK6LQXZIZrojPcqXLKho8mgICN4Qa5j 7ogh40DD2lzk07lX8J4WV073U59gwCQbaokcAnOGXxHbcdTNzoFKUvaIqhzke8a2IX7DQ13E uZRjYKzYRHAXZ5GdEsWMBxUX+sV/iEBv2p6fMLiz0naHHOgX+dUmOZxksD+hmZN84JDcjUue J90IREHJ8IRSMeoqjF60tykkzo+EyIL3FBohZDKkwB0VZfsn5KFNASks+BTiuPK4O6rvV/C+ GiKRsNUDjK+mRo7nMHeD9RZKPk9U/a7xrtfGXVBPCFbYbT7QVCctz85Emnzl+alXUO89Bxr8 TRNwF5rQeDGPbfWeyTErfXst0wTi0nxHbfYaKjjJ2tCs0HavH2G7z1ZJWlfj2L21QK888WHA 4Iyr0atyRG3duPNpXpJUCl/cY6sihUygYWJtJeEPYWmUPfoGwv7JUMtRC6XfvxNmVpHZyI32 rFoSI6xeoe+YU6qyjVEiKpHyJWN3opiw65vZz+zrZ9W/UFmIgEnylKbVDjzqJqv/GeyMjlAq bXVNaDGXDxlq0DvryphByNLzEOH7V7qPdHTjIi9Nrkk0IokhWz0aoJxg3yKl7FFMeZ0cpIDU BWw6JXlwx9nGXAynOld+20hPUPTiXDSGQ60ojuHyRZxXAeA9GvbH4OZEIFImbKi6+PmQXlIU Y+Wwnxwfq95cDw1kT/THdLaqBVYzipdqpJRNIZRHP8CrumrDe57Nz5jbSIjbKSTyR398/eLx 49/tvzpAEstEoxt9uuPUUCC/rnn4bMumunO769KQQvZb2tMdLtP21jfdULpYHPHtzt2tbmsF 2eOlxM639+XPzv37d+XPN7dvf4uXt+9v4VG+SX/y+s72Ngvfvrd1e4sftu/evYMv2/e+tU9b d7f1270793e+vcse7m1/880Oe9mSYvfwc+fuzr1v7rFT+XX7zjfbKLq99e39na27t++hxM79 2zt3tnbw+86929/eu3d/G6OQEnfv3L9z5073Y5Pf+UbnuXX7zvbdb7fubmkFZzJ6rkWPcQ4w Spa3B9ze6fWHiz1r4R5aAOpGGsGKpH1jteAXVAJWdX5L1H0LZCZySr+6CpFRRN3MCOACqe74 
+0X5Izg8KXENpadJ5403SLytaRVVhwjP3hPk6CC9cLFAZlqO/CoitasSP1DkoN83p5beZ/+j T8tzxPCaR/dc+p3uQu9BCUh5cXhGkVL2z4dQj0xzGBVLvbjgc7vGkYnBK1HefZAqFoR+mSTY Vbq+rXT8MRVgQiMGKXHRVuabIi0CkkYm+PZtcCDWUhpFRfRzZ5Zl6dZqUSZ5a2eLEVSssbFh J97fzHu81h1VWX05705N4Vh6aV3KX8DTXbhaO3koREyAY8HyLQLgSjhwVT+pguc6jUbren4i m1y/8Nw9bfymjzM+ZTQU1pVMQ9CkwAwfjH0CZjldbm9bh56/LJUlBkqIRAP4feBiPRfGhyRB xwb1y0ZErO3k96ui8znF4UHwUe6SdkMNqMDmE+URVKnufEMecmjtuChTg3E0biHb27iGoN08 mg7LJxp7z8OoZwc60VcOG18JGxkprOwAtCRzXrr04NQJK5NApsqY6ohTz6bz4mww6amKRZfO 62YsDQQ40KtsdlZCvlJ1C7wwSuXnboE1pZ5m9E449cXJWH0virnewZGpO0dJxbKWPDc1h154 IQ3PwQEAckDb5NqYOeRKxKN/bTyK5uubCfmfQVlExW1SY9gUIOEvnulYKz4LGMD6HhvclExs Qri5IZIUWPh10v57qvhtXPyznn3z7dY338rq3uM3hPIhKws//hv2YM+FWiyYb2w0gWOu/tHH 9/rIrAoL5rZ0gRbkzEPCDurF/O0jCaO86W071cOPRoKJjMhODUegnQdzsMBKhwvekyBw8/9k COaCfJzl51KMfqE8S1TZj35HdPB3a2TVaCPIGEQJwmheIWGP2KGl/Un5WPNiQiZ1GBAV3R74 hcfJS6fU8Ay78tYauKv5Sxz5o5/aLz1GzKC2p+jWcNqmx60S5T8XJKBQFoDvJn/hXkB7QE4Y 1hg3m12lkbynB9oQ0LiuOtlQO6dGFbU4b/qyS0UOxSiFzDqEjXWoW7ReXjoRA9IxGvNZMo1N SL06fv7Z+gm++1b44QPPVOhrB56oPLeNGsZM00tn0DkF5khJyTOxZTn5aBtR/bVzdlDNBoU4 V0oOcKdayKD91s2+dAeBvanunJVk86xoAoOQhYQD/13F2h/589jSdN2IqKgpIn3Q4FxWH7ML aiRf1oRWvf/HmR+iOSQ7bG0tizgDHdhcTxPdcu4qpmi7sRx8N0w65LMzjnpiZj7y+Jf4yfOF +sorQcM/eswL8fVFvFkUWRn03Qc/g4S/lP1h5Mv4zJbLwJgFhfR1C2PE/vGQ7jCKwypNm0tq 5nLXau+9Hkihb095JqVr5US5FBEKBiLb37JanL7y2FL17aZBhkCIXtvkHZcQfQmArpFHNyqv x3ArWSm5vRfeIh4nNEm40oxYMo5aQYIEpcLOF8ayOuqreRoHIAzCA21ubpqY0dsO6ctL+orV DiIqAxY2UmGLvjZP1QujH1QkwMIL+63rs8/UT6ZbrKCi4zi9+iNoQzhcMzWHmtR1o2oYqyl6 uVWMEXJcitcldSNXGhur+fQ6kdWobEUD5TKURjw/FcfW+7q6WgBWL1/au7az7SZ13fkYfI20 O1VmOdY+qRLU1J4Q+x3LK7S2lWMqz8ZumxPunmTH6BMlfHKkIdbOV9YFHWJ4lTTSkBeQD8zb 2gXEpxvbGi25a3V06JoWTAFjW8iTl0BcskBWsgpBqZATT0xOez0XrBoR43heAW51pLGhRG3Z mHo9dGfRPjcc4wcbhtblWqCszJs65Wju3jcmsAEQaGhYhO2u12Ah9ec/7R+GL2FBxt43taAC mazBFJpXqO5uafEXOBgng0HmFZ5qu40MQGpG/o574+GDrjnpldJnPnDt0zAMrkYLOZ+2k2I0 VV9Kv4nNt8SZ87zNX5+G8PHHEFW6sc6ckn+4oLXDsAtka+69ReZn/WNZMrskYXV6NitWszbE 
2dBFx9EiFyquxk0hPlorGgzia2absTsTbpZzLk1k2LZjNx4AaISDlh4++DHTwDKupkNsnLt0 dBkpxoGETHjnIfhEmFgemltJdUfQ5wiqI361Q052gBF853MUhPu6z1EygJkZ2Kr9f5d0n7Rg I2nQWs1ktejX1evthdHirRtvlfGg6Rn8zVHhmY3ZG17nsm5zEoaCKZrQzF51cnqe8Qxrmme8 Km4xHoT1Mdb763WVbhX1YwIHOzP2wneR24HpIWoVlHQgOBM14vUzImsuXgoCXcdZso5uGf0q 9nozR8ASkUGnIuT0QZjMg2g2pkmYmW8nnEnezN6ma2oN2rmko0irc7yC4aw+eltByZiO2okP 8rAarK6rVNMbJ3uCcGhBrsWJ0Y13JY764cL2lpFOznd9FpyqWt4fLpzdIE7w5CuRmGnN+xoF 46J2sR+842bqxWMASD/PuqOOBwFMW7AtyIgRlc1BDjqV9ro4s9FeZZGTewxTguahtAlS6wOS qBxRdmhOt63Y7U74y9BIcFmGz39/+M7qgQHzVRynA4eIq10T5uYLeTI8Xf1uNXKClokOGA1k KA1vI6TvFLZvOCvgyb5J5wsaPs2nEPuyqzb84uqrWVyVvuaQTZ374frIsyPR1jyJdC5Ro1nQ ClQ8K0cx653sD2nBjpqSHcJkO9TgPyMmtqU+tJq3UESs4/3j1QPKBlNHMKN/JX0pEVTMae8i ruAYmmYqfGfzBRLujQsazS7cbavUKCovppKbWkpjIuN/q08Guc+UqgTmV9+w4K1kK6ug4vX6 PncI/VZN1aAurP4M8j6yDIyYGdQS4QVn3l7lLXkb43+qgksqbrDkbKkwU2PNVEX00QFNGkc0 WzagMJ5oPk0iE8azsbHAZZW4ZVaNcJr1I3HUIXvLTX8Dh8+Whi3bIuMqBLfk0Kb25RhVkzY8 cSddZ75UoSPYkzOLUG/xHij6GAwYBQ3PTtjY99Hcqg8igyMIM1sd4QpvSjyak0IvEKY/FLSN W295JE5w6OqPHf7Apc8tcxtyjQ2KG7T20Bp7aG091KYQ7VwI+VqPGgmxKltu2rg32TyKqO+S b9vu24g2nc0WrihXUyBYP0aoMkolN1dijYxTJzvViQtBXJQh0g3GObqirobBEHJwYCoGiYvS kYo+jDzNj6F9nflkBqF2y18S7QxR58MJYsPo1ZX3hySsbR2JgKQo52czdbAL1J5UtOX18riu WqBjyji6rsAfNgqCrzri0skjsn8y29yhoMWnWj41N1i0VD69h7c+VqyGRj3j1dU9bukRQrSB Us9wLYZaGnekjPG2Mr51KdvbtsDstM2Q+shIwcb2ntsbkAJRfWOjTNisuEkeC+kx4k+WSvrg X9pL2q4nDAK6ZZTiqaa5VtU+ogjhNDF0Fhu6XhHNLZBIlmwy23MhYvroNQOt3OsQguO+NNF1 rt3sxTUfXwuXHk4t95typLYUro968w6Bf9NDnkqW5SrYjorLo2LqIuHnraTHKtvsH+r9ZG2Z Yl5edTbftlqr+ao6wydQ0NQYZRSRSkuY0dVcCJMMtvQx1A/hz95aPUFwPlsrzbpBf2WEptIx 7J0jxgXiKpFfgqdoCWajNEsi/V1mLb4dRpc9QaGsqRiUCvnkYUO1dkn/rxlCqqlS8uGZDwXg RsNSb21ulkHxpFQ2f3PaA61WMe0BngUqU1Xfqd+3TGK9Da0bIyOG5RlopYwINIgS15uSvi/T UfQFXnIv/txtodkoOETBcrqXwc8fBWx6OKdPgxGRynbTactYjoSOtohsTklHRU/ZO3WFlJ3H JFKa5hJ7MHi9QFK2zcXlgiy1jHTivFqd3SEleIaMwNaI5HEcEclzlxjoHyzcsvQfQ4a8uzwf mEJ4Oo0oIGCG5Ln4X0nbLfPMlIzSCQ/fMeW51IKVDG+kKWvUqKX+2vbEUIYqbYfEd29Qyy1+ 
r5fphJ0SwfLuw1alJbMNN+gEYewTXrdcABQijinDZeP8UocBkyVHia7XBGmAd9DmIJOAS/Nu nD1qfGdVY6qYv0Ev0SxWohkrMOWfPmojzDe74VTYvGwTr2EGWqERcpHeER/qa++mcMuMwqNO 08Rd3ZhI+94eupm0GhqHPOpTr/A6A9nzk+NjYWXmISJc3q0p2VHdkU06LPxK+hgfPNUhv42W WtU19ZXuZY2r3PMIcN1KG1y23UpvvdV4nIubLjPIDV7pQlfktdoK9OIVqKPDUmy4ETJUl+uX raseV+fFRxcYCvdAIKJZhRm7RffvTnXd86zn0MBTFKGE3/kGKaLaz+8ye2GAKC3ek1riN027 XTfgnIe/p+zwElXXTRe7h8xB7ix2nakPrBxdMxrK9Du9cuSrUOjvEUegBjo24M5JHsY+gDGM q3JwYWQuE5HwVB56TlsQTzlc1zYOkGZcJv56NPfxmS1PgMZvA1aGZRlvKLbju42J6+i4Qiu1 l2Dv+K1zNHKsr2uP2NvZU7F6onNrmC2r25HPuS5lBQ8slwAdpOJzrdN1bJXz10IPzhBcXA7n 6h0luLlbvu224Ig2/yizdy0bRiX8L+VXHpheWfkVOc57zNxlggvbZjyC03lZ8q8xcjMjCYY6 Jan5XxNARGaCd0xRwk6v03rBbdMsG/SKYNToSc6oANiJlScZHrcuZkORNZG4hm7b5oqseH2F nEpmOl510TUE2W9XecfwEq5EWeT/DWwJR76ed5LD6jPOqgdJzb/XWaWjtwPr78uURDxJIyh+ XZ7k73F2xTyL504ffA57WkEGz55+Se5UJRRS5gamJeUXkyX652Zafg1m9IswJWGN/9mYknRk X5opMbwbv43XdC8mpcZpeHZkGeMSkPJTmJJ0tjdmSjS1Xn45HJ+N4/Rk5YKuvNRNPYPu9RUu EJfzdaoqiR70HZZKqpUFLz47xpckjNJEsOqsJ5Xw79mT54dMYHX46umTR8gVNT0ZytGJxFb2 4eXB8++fPP8BNRSVUOJ/5PDa/Gb7PqJwXSq6alsynts73ez+XfoXDgfDo7MR04agKQ010PF2 s3cTyzoEtwwffqCZp+IcQWnmrXSA0hFB+gSKkzIXQENfdBYcN6NkfvE4u9CzMOi0mkAI43Q5 EuCQ5lQlwXCqY40VzP2ZZQcksNFCpTvnwbEKhwu46SxWnZM0EidZ8PgYuSv+2zR/RwwxX6fR BNHPcxfAL5Nj0GOOoFbhxPI++FtClom3XIh56WPM1wM0VdnGDFtxrjRcwpDPEB2GiPbZAjsb uUZmV4y+RrQGMWoHjTF4cfsbGaUmM9veuc8g0PkRqo1lUGeML7PUf9Ln0yjUUkpT+dXK1jj3 nk8axexvw7K2Irocll0L4D45m1nuAksp+A2GpSy0ZluspZ0KUUZHTLlbaa2CZRWcvr0TcAzg v4DHGAr9sP/06YuXFk/KfTuDMR02S0FlVX/q4c0kKcKXdzFSRrwvmGEUwFc/aSS3pEaTpmlL RLdQdryMekMDzGt3dKaHEjpvUPPRqRK6vvrcbNj3OSvY0QO1cZKPIb06pk8ZM3U+OfKhxN40 xKBrIVHvzxRbmVxFcHyIBcUFDnSs1ixZuHnSwvvHE8usAZcmad2tTnW7YxFevd5/fXD4+uDZ S16yke3c/YaEmzkHxrzUUnOXcTvjpkrmTcKdgrgwy0K4/tZyHhBJ3uamQLRNH4E2hCVexkG5 qBoBaqKERWZ9lX9FOQ4X2LsgZiCXzE/JmoLMgtSd2yGS8qjDFGCHqSAdpB4E7vSMZRseWXBe 99EAJiDNp/mRXtOp1EnD0d0dmNayDXXflhU5ONzZoph3Vbg7OjhahNT/KPj+m6HuF2iydaSD wQwI2pethrgLILRm09ekK6OixOuI6hBJNIKXLdDzpC2nKWM9cycAD3wCV9AB4Mq8o/Io7xex 
88t1u+E6wg2bfLex/daXNK8ZqsldSgu3Mi7VgeVo5eCg4sdhxBStOk2Z1mlRaB5VyyZN7Xlx CeF6uNDw9uHx0NPsQeGWzcx78fJxjLUj7K0TQGUuTR/jJvwCv2naGrjEPdrLe5YhsYSaQ538 KowF5QFsj8mslhueGHAIm0Y7Ig/r8ZUiqThtbNN47q+8k4mN572HPp935rcM38OnAH/9pNx7 2wcPGve/FbIrWZWhejwrCr9C3IKafySbXJQhH0WYhjuOnR2L1umBMTslUvhB+cHTn3s3Bpse PBaSpxWDxzXSOvHARRZA55x9jNFVRybHLfP9NkIdNaRcBfDLoQwZRyF6qwJScMZ26aC0qYU6 vxD+CvMD2t498xlNDZvrK/gAfOWzOU80QM5HPJF/deaqLeadOjsCj0PgqkpL7VNs8YiXeAq8 302gT0rABtWQQu1dsagDzbRBBfzPrgMfk658pwAxAEQG9i0VxL5nXkfkEpECtzRzBn0ayLrw XkjNoOOuSJ6MNBEuyYnNSaNo0J86J5BlABm6KCxxJO8auNCsHMbIsJIjLSmS6AV/bkFjH4WO rvwzDldnuO7iXX260U6CQCqKpViAinutyh1TTXcEJYJOcobjYp5nB8/az151s+cHB993RLbT vwJyedsJ/f1W8GF35S+U59N1dXU1mSmXWZm8PDDjsbl6muvuZZn+kjJ9YGLs9XKB/BzDsqeJ 3zK0Tte4fmz1ViIgHWzgO9we+0TpZ3rPthw9dFdTG/K0Dy++aZ5kIOZ3qQsxv19n39BYwsHZ btG7GWyzHJ+Nuqn+t3FfjCbXUHCOi1skT1/37XU/cVppYH/iE25QzBd2nNn9CbzGBnvn3dlw 5C41h0RzzhjQcsLkJXQCC3tzbQ2wxf/7VLrmTusqq2C/Etg7FE5wDfMs8+ptPrw3IJxI0fmE TVSMj6ZXuhOQBg4NLNsymCocbOERN3WkNDfnnQV9rQWIvLC+1yv7jsz0NRSvZTFVr7CSypkl sY3k2uh4ROD9hjYFihn7IuWVXtiw4BPcN9dY+ne10qrt+t7FOc9yKH6k7wvcPKKu2LpG8BaB j1ZHdfdnpekcTVREoJfmNJ7Dgb2En6tXOOvQBQR990OoRraSzDFTjSlTJhLdiIoh4YEp/U6d bpkP1Zi0xy4mrQHsdFw1sPApjwajjuFcF2s7WpqV6uq4MjYdDbY0OcoqRKECTkFtA49HloeR 5dHI+vWR5dHI8o+PLL/ZyLIQqvZCVxXn8oX5TsJnO1pPpV+RAIurUYADCJsVuRkah2Fp6fxn WlyJnAuNiypHauuAWaWlPsM4IsTqF35I8XgMAcsrUMaAbC6qA9iUGImIVbqLeW1KglseGU4/ GdWMTGBZtf3TOpHIlErIPxKKDdcPaEV4KKkfPr3xUhO1mjDdo3INkytkJksBRVsoNx5gVfYr cMpi3PwsQEG74yDV/0RI9WNI9auQun671iCVB0jlAVL5UkhlUSxM0+ZCLHYjOUDVDy1rh+pf Owhlbz9WjbB2jV5rqLT8wLEDTJvzN9j863BB5HW4JWtsUJ9sUP5l2aCT4b8CG1TVF+R1FUJ/ 7wsxS7WrD69jlgIH1EfjfQvhq7BQJCrLMJrzczwUZxaxUVP/EDev1LPM9dGYrV5Pdzjj5Wx/ V3b3/2e2fn1mKwH7P46laWC2bGT9MLJ/CBv4f4/Z4o5WfqvMkfGpjmEBqVKcQoPc16cfZc9I eHoxRxUeYpYDF3DKub/9Ebbj85mwCp79ciaMRFD5sLL/GfDre/hdx7VV4ddP4ecYEoKv/3Hw fTZnViEg/zScWb/CmfXaWIxOtDyfyaMtLiZqaMkXZoNxJo+h3m6zsb1UM9pN3PqbWZy8rqtn PEX1NnugGngb41G4Xd3DEvZmierUv4RGHoncope8qncrefFAj3Q4He3obb7Rm9t07fC6FLPy 
xKyAN/LIy9KVUxsQ3vTjN1rKeq4yRdGgqryRSxM2mQ0aYg0FHv0hbApYRsYMDZ375u3ZoAef EXW3ERrLwGqywfPoOg4XxMEccHZ/G5M86/VqxUB1xZaXj+vacTEyZzO6afIg2djO3iFKKIeV FTwHL5qdB/sTb01OwOt1uw5ACbx2AK+VBMwp2FngQyu08FZJ9oYiT6/Hnq7jGJGNiyZvxjf1 Lc0eBv9b3L0VchLmSRIbjSNXuGgsOVJRqKeuy4jedvP+FC1AQlBjnjcI6eFwqFClKGS1Mq+c wa4wYFZm1Y/z69xkUv6CWaJ2fFpM8zc4YN8Gub12XhgpbphdhUhXZqLkCuYHRGHTTB0CbX1y 81xNPsFvhzMZD8u2UpNOnHnHI6RTeEfdeh31eB6WSSeFWZjnVlJYJLnGwmaYuszHMDwE/wVk k6BjRT5cmDOzN3F3+Zczcpm0URZUWDnz0Ex0lRCwe1b0CkrLw/kJjpNsHZW2N2nI7d1m6C5+ 7ThDtSwXSuxs+vcPo/f4dK0Pxy85GI4mo1E+nRdNFkU7SY2Ew+fSyURlFpFrEAdLNaeU1JGB 0nzh/DPm6x+EimiDMNx7O1Ool5Z05cALeFbAn20UUqO0ZDFOhyD4pI90ZH4wK9l1DWfNLbci 9j65pMjdzh7fIBmQpWtXjMI+TTRMfGssSSLzBpVFixdPc9M5GzVydvOKG/VgsDh6NY46J5xf iB2Cakd/DxyprbVb5usWo3LfcwJ1m8nLK9wmbSzLcIw7D/ojoSkiyhzCDUgdIfZNtyVn8iCf 0ncqdmNavvcimAXGqIXWx3ZN3lP57T7Mi9FxrELildqAYQTWcXoJusY8hwzypaKAECUWiya3 Ps/PC51SnUnLlC3bq9y3riyufvFcD0ZZNYHzpnL3zmc9vvXyan/27vClSIsFLxy3e8ZXf36x y/mvdrM1fw1ltGI2lg/OXk1XkjWsbpzU2k0oQwLY0XHv4aRva7aSPMv3tealXQngihsBf+7O ulDgO+FQnTpoHtRBmm3QVUV6NanKbBjMVJUsiEnAWKgQUr/ntgO9k0v6urisAZWElwKuKCNA 2DIO4ZNci/gU5w8geJk44bCSj7ZMmqq4MBrVC/7/3PJg32LQVP0ef6sSXBaX2nXvKoU9ta4k MufY2JsbZ1PmzzS/JMDHKo76eiljk3OrOkl1lHHdJJ+KQm99dhN5AsFxL0ltNjbwiF+aaJc/ oyn27FWgSp40Cvp2GhfKC6ShHaJ3aKNCZOOW4pzB87p0+vLq8LmcDS7fQVbbGBFRSF1D1tQ3 RBr46cnzRz8ePG5ro8tE1RYcJt9hbUqR1jp6W49NbJitge6DafS3+Ijww7yIfLYGh/TSr8Ru nRT59NA37EVSazpuG/CwVsMgtc14kJxnWzbVexslUhtAIoE3/vu3YYR42eW77AH/dPnKf7fd NNACSIAUvYkv4v3x4E+/e/K6/QSuK74IHmVoYPYeynt5Wsu2Lu91OpyQr/v46ZOXoX5D7f+S 6TK/eWii5Wv//mD/JfG9/aSb/XsHyqO/KBniVvyLkZ5DkCupvxe/Q/7s9r/7dxbLeagwOXwP dki/RGfG4YKgkkJ7+om/9d17eedfEmSHC/cmmuXh+469/WBD9Btkq5MuIsS8uextW8fLzOGT lt8OS3k6hDdS+9JHBnlAXob1RCDnEEkTs3GoSadRcG7+DbqRcg51LikoC6cELcylrMBYOJjL 7L/iRi4jXA/ju0x7UbwcQxS77CRfqtU1olb+39tW2I4pzuE1y4Dj3L1RZ0u7atiDMcjrQZNj RJdZfvajk3wG/kFBfF0MpUtmc2nBjTXbSm3hDEsu/ZHllqwT89grtkjKRKCYXyZhLdtcJtlA +CULZTGIka+sNuFIiSM2l3YWhl1lq8WjJyqPg0d3wpiaEUYKuYWyF7p21cQDy3ta0k/C1zZG 
DbktcyGgwUIOj6/atmPejSZ9JknRlWrFVE0G9+ZxLuN6i+ibomzniinMAwI1w+UMiXvb+glw 2ILA0Al4pjgWzoQuSHcyFB717qyIh8c37q74B8we85HOt5s6N2wdanu/yeYnw+OFptXZttDo XdeELtD27lv7oG9sDPFN1xVG/qRy3/U5kXvJXrgp723d2u5wzH2qX63w5yupLDFVh874Ved8 rxISYFzGOfYGoxZ+/jm7pZUOmUixfe73Blw2f5cPnpS41SUf4aIoem7WeXY0OfWMNPlma+PQ WAljS+LqnlPR0McH2TTipzX2AF6ejDNRv2GNWwBGbMsyLRhqksRXbVpYncfndrIq8HFVWDg3 13jUG9k9PWGdiBPaMTGnNnVhm+bFoh3onmyIZU22fNwo8q1p5s6ceyDDJtgM19EMPSzYCIzX YGpwVUOvx0xegUikvBI3nHX1ipIqGk131d5HetnWXrxIVzsJQP67WZhzjUFPWF0bTLQNN80t WU1vcHSF1aidjqWz1IiUv0lLRnt2r9XECDsa6deezu1u/J/A8jZTg0+R6peaTJooAtPEra9P rxHrP1cUv5Ekfq24zbsaYuk5bPpTZywnHgzcFr+piG5Z+2Pxmle6JJK6IlUhJMBnTrN7yzRO Mdx1xmyOU2blsqsaWpqZnvcYLY5O3JW7qk3vWYYzTYaoH0Y9+dtDcjVW1U8uxxZs40iI2Mt7 x8VFz+mxe4uJvNHGepidN7f4izGVckVXZ7qrQHjrD3KQaKqQkJCw5TO4Ixcmkpu5tLAZ06Fi StiutDQtaF+ArWgq3UyOj6VzT2KmDSDexsXrLtneVNdSEw3hXjRev55wXHr1Uw2+3ezNVFPs W1b9zSz7wabgR+dzMHJKAjhXeM4MlHiR3Do1WeSjcM0Oc6pOoawV6Z50EMdYFR2TG05OSSW8 siGZuypcpqn9vIq8SaLhsHzpZSBGVHoOEhMz35jJAzcyJVODUldTihoW+yz4n7FKXCH46ly/ StUx1lZrOZR98pEfLR0n7TzulJzORAK97FpgE+NxFTWF/Ko7SKqMqqzBlGD/NdZp/0a31Hn7 X/XKmYY+Ix0kutrNLDoUgM/GuE4r3EGLO/L6xSgrhV4MslUcPqvo7EYaGbUxPvT39dycgUpN X0rEBR6/QdDzcfbk1dPXrRYEO+P1XjWxsL8aZ+j07+dQrbkzssabduP4o06sXD5vYLnYpkz5 +wPC4zw6o7euOZ4N5xpOaD/llHDAYPExXL32dDKOor5u1WX7kKzIj36gX3hRYqjU1qQG1Qic 1u/+XNmJ6kDD80XKx0ydpaK6ur+O8AEFcMNEYr3+BflBDvvweXHRLh2iXSzn7afWLGv5di86 sdLf3H7ajC5f70y7LXctknu1bGDEI1+6XG/mdf0VTWUjTVhXmjmls3pQC1xcsx2YWvQj7Oq5 51WH3vs1utIvZcWy4cZGUN4c8WIWz2n/ODw6eaS0E7mo021EXes5Oj48+INX7qCFh6k17+WV YIFw7ZPx00n5rt1GFHhHE8d7Gy6qVYyAAUMUm4RdfLVA7g2Z/sHl0eGfwCkewNIpTDFgs6nA uezsZpdU7iK1grxfjTaFJ6zN4KUd58bgjV1L/wnAzeFo4MPN4Hrd+rCxzrWEmfLfvwAqhpvz KKbk83l0KlAYHooE60xd9UMNslnqhGAgXHa+1w/41GHhV0F2g/8nYTuOKxXVZnnTsSpM5vlw PlwgB43+6mZKCIUhC0vZhYNBckZcJqs7ida011PPwL1M1+HSf7eFc6t1mYrAK9KFFOUg2sKV YgB7bh3lW7IeOpyacrVh4kejIp9VZj2acmok+Z3MMViyAESSGB1GU6qIRkGD0VX3whswMocz wWBjROOzN94uF6qZn0ztQI5HeY67ry9GMakxNUFy/tbP5ItOIy+6eAK2GXKMKlESpI0/GmN+ 
Plqin7xo+nDhNZWjwJrJAl9Ej9Dr487UB7pvMXL/+Pyg46WVVyfCfh6dLXZdUL16ac4tl6WK +WRT7Y1KK4ne1WFZ1J9tv7nqmqg4d6Qz+fJ6prf+BfjJpxhi2rzTlDFzmpcy7XIOJgTX3Fq8 zo0+ZSEVZ6I6NNp4nkIKryLgVYjl6XWk0lpyhFIJ3YprzdPPlHieLqFQinSnFRnVeaDKTo/H Taffh8mq+1V9bldBKjyYR6FyYbxeTToPgTjnWJGo/T17f4H3F/X3R3qVXA0RLF+TIIOOhjm0 ZPpPX+/asXOOZHXS7l7m7VO+0EFc6EFzoYM/RIUeLCn0PG7p1pJCP8RjerhkTD/ELT2stCTy Y342kt0TLyQk4KO8xLlxQhWYQtnlUJTGOo2boLY7/L652fb4c6FRgLg8ALIX1Su6EeZY+bnt dU0tdPCHr58feIVGvHfrFI0DqRAyN7gPlQaeHzQ0gClW6tusnf7IsDLcVcAp5O9wSbc6yka3 UOB/OW5ZiDIFNm3R6vas7cvJ1Hu77sOXrRhoUrayuMAlK8j/27/KXg/HFXc/HjrHQ5G7Utrs LlhYLv8NF+EmahnGDMdKyHJEJonBjfB+xDKZzp5qUu57V3jorirREjUFcHyTIGq3c2qSO13N 8TUt3M28WL1GQdPLo/ExbUaAiHjpLdnzyRkcoZDrr4AoxQxnTmvLBECm5mKdCrPowCbnv9Sh oG14cTCeLq6gPlMFFSK0qB30ftDKQPrxXtvwNdxnqgdg11DMYoVyJnWCTmtWuHRFyAyKHL+9 UXFejJCfb3g0LMqjK3OJBtpaAEIf0QRHJw6DF1Ng36wsLhd6T3l0Y6XiuYgMUkBk8ctFu8Mm eOmjpraZwitqgXxBkykurv9vn5ia+saJNFgsjryn+CI+u2RCTxTjTMLH50Zdks6fOJjXMFCx prRIXKx9fIW6YtMru34osE5rayrRn3vtXu+hwCKXlbKyvYfzvx4q+6HrWXLwvqlX0reOveLn p1z9I3KfnUCRfAG0c19zlM76Q2HPZ5qh9PmPB1CCt6MNpCosH4rUzbw2pFLIAy5SySQpfFT5 SmfhPfPdNHtopbOtrjPmrVe+wc5XKgec7HB1v9PwmrPS4yhQr7g8yc/mmnuvyvhEbE1MjjCX BOeGCw/j+kxtfTHdF0cMoxm0O5HroZtwQ9LoIRN764KpGHDw+vDJ64NnfpdCWsS8laDIrhhB JX8cMs/aEEoGfS3OcEzK9tn2DVQA5ZozadK0o9YF+lADVYZUlwuqp4Xuhx5VAaodpQ6WfpYf wgEmm5P5juPtAltTOVwIHTaKfoEEfZOJKuED+SVoeAOgVq4ZLJeQteHHyFoVCWMopLJVK9Ny u0sLmQNN9RSkN/LHbbzx8+lF5VCUAsHKax2oq8jpBTp5A0P231YdpViVXfMhEtJSW+5+OfiP 4upiMhvMzax7eqGmXYrweMSPbrYGubcppAfDuVUjjOHM1/mZ1NykJd+qVqvqRba6qX4mWsSg 7W8SfKE/UriXk5N8fhKtvdfbN2s5YCWPlRzZhPVUXDor0RjMzav1FX959f3k6PDV6x//tP9j W7fM4WBy1G2tPt3Ux7a21QGXaXtK35AmlQM0mrSiOS59K/pILGcb+qx7SM5PbZIXHeKNMxHW WlXDmm9VH9sUELtZNEQzwNkQLUCOxWpNTidT3578br9hsbfSzEPlUKU51RbR6u3cbMm7Otm0 bVICwxU7tS60uu/FdE+01Xei5lXanSjRxc1+wjCwUAMYpNcIClDcuuYe8pZC5BBmyzpajhKp fW/chfrkuy5Us3tdFyF9cGh9vrx5Z2AKMFGNmsFDDazrT55nL5/uPzpYr9WHTc1XpoHtjUhd SEv7lk0wD3Koz/BTpqbtZlYOarErTsZ8+bbRSVA/PeN9X9/LUcHNqNd/zZVOtVb+tqoou9pd 
kd336LHlw+2grH7pZtmzg9e/P3zRzcKO+tBFVcXOpqr6xVWVie7/+MOrbhbQXhvQ3SOETq0t 1Vb0czSAsBm1vuC59F7vXt53s5Vq77ZDtKriatPY9UvUa0B7N23sv8ZZc/+Gmh6/tSKxr6ki P8QVPda6wRKRmkfLT12t+vyFTjVCS21BfYXq1fWuzxqoHFpq5fGy2vzQrdTWOifL6pwkdXTA rILDRBYTfz7YJbtIclHQ7eBDjNKO0VbUnruDy/PnitpCXWDMv2Kvxq5L+2zZse/kayx+BZtJ y4rsdpTLGK2oPmpRIRhymoais2JaREX1sbkomWjfPwmvK1ctygPYN6oX2rqychZIWejmO/7A tqa1uHsRt95UyToJlaJ+pLz8F8q77NUxTDSddSMAhyVjmT0gbcrJ2yVAsjIOrmnNAF7gQ0JI WVsJKXUIpIhgkvH0l3LVXjsUST471lfYG9OquGLIxgrNZEpSvcnAlnWW8DZ/bSyrrND8rD8/ mg2nKR+6nqV86LpKA85vQC1zKp/yg3LbTKM9VJFIvu+/Onz64vkPbS9IOHZd+eshwp5MnPlB xBnKk2TE9iocIBpwRjETUr3NShqQXq8dCwvsz2lErIzlAQRxCA+NYlmDhvnXHz6tOfXxm6AG Wjhf4PCQwwf/R2mjHEjF0KVL90pYJ+dXGr8b0oN1Je4P+gzN+nFw2XZvHRNdcUYhk00BLluz Ia3pmNZ0UGvRqNRdV4XNCvw+uEFExfUqnaS4wRT+FLRhVW4Wjxxxfbm4f2+Mu2XqtkS/jCGo ZC/Ag4+Uzie2/Ya22BQZARTeeGBr4FUBK05J5JGAIrkutVSx3DRBm2vqgY9I8VqG0Kp6C9fu Nf6YvMLeVo2kaJYXlzvPOE1Skrq9t0FaJUX+BHIRPyuD+w+iH+78CYZ1Hc7fn5RoLP8Xn8dn 05R/KJ1wiQ7UDreAYil7+kZF/7zb70BAOM8OC71N8VCvU3z6Jt/t6xe1x1HvVNCioiFaTaPu 9B66Qs5BIhpIo9ohAZyDu9P00zk8VfbJFAYy10WRBS4m1h6uv8tn/fxdAV/8oSaFo22QNL2b vVdS1UguW5GLwVZUDjMKUF3BUB2JCxfe6eMDTgQuhlJpPemCURK9UBxgynr4G9ElG3wWGTjW 1ztG25jvHdmuHsUNV33R6IrmIDXLeavRdHgEBT1uusZNYLi4xGgwL9mRsat7/FzDdeCvbnbY pUQ9oeHfcdrX0HV8UejZDDXB0HUknqXZ+3s9Rd6zGwFX9j6cFDWS75uQRXgv8B46A3hjX/Zt ZcVVkPV0uc8+tNz/qjDA+sYLgLV0iBDDJHV3kpdh3MuH3UuYkesGT0C5Uw04laj8eg9qZy6P 1WomhBVvdBguEjtDzUs26sF09MYB1R29oo7jcz2ocQMaRBOwyyx+PDhw3w0PotsJKmwLNOZz xIx9nByIEFMhB2GPV9xqSIbCoHHwPJ7MxvmikRHgkq/y/LexkDFow9r+l9V/29zZ2pr/ZbXD G30wyNXgfJYpyUsNUHBFNxxMSLjRiMg46s81U3TdStb8mglE/mVJdPEqvO3HUypJ47m4C+f+ baAmnQWDaAzm4WMyM4fitJFxfMmZ+JEZ3oqLNlBmWXiRWhdYZFrkAfw3u7sWJhpbS5AN5UHF ZWm9YxBTCNlBkwWZvN1UHFo325BhfpU1iNA5ylwaeFXPWnwJkv/rceCfRqahH6gXM5gN6/x6 OV/CsPuWqUKMwfRrkJilNCapFzEhVcLzJUUEs/PHDoXP8ikyvla1X2N9fY3yq0FT4yWMrtf6 1BVFroxpXaIoRluo1xoCE1nKDn9/sP+9LOyT1+01rcFCgsDYK6tq32qteIwNm6rT1TJMw6kX VelQBnoHj6qghB7as+qRpgLsRZgWH03ZJkX5yIJbXe5Tff2uEJAuZvUP82UfnEsa+5wV01mi 
From skip@pobox.com Sat Jul 27 16:32:20 2002 From: skip@pobox.com (Skip Montanaro) Date: Sat, 27 Jul 2002 10:32:20 -0500 Subject: [Python-Dev] Sorting In-Reply-To: References: Message-ID: <15682.48388.755636.474915@12-248-11-90.client.attbi.com>

Tim> Skip's Pentium III acts most like my Pentium III, which shouldn't
Tim> be surprising. ...

Tim> sf userid    ~sort speedup under timsort (negative means slower)
Tim> ---------    ---------------------------------------------------
Tim> montanaro    -23%
Tim> tim_one      - 6%
Tim> jacobs99     +18%
Tim> lemburg      +25%
Tim> nascheme     +30%

I should point out that my PIII is in a laptop. I don't know if it's a so-called mobile Pentium or not.
/proc/cpuinfo reports:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 1
cpu MHz         : 451.030
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 897.84

It also has separate 16KB L1 I and D caches. From what I was able to glean from a quick glance at a Katmai vs. Coppermine article, the Coppermine's L2 cache is full-speed, on-chip, with a 256-bit wide connection and 8-way set associative cache. Does any of that help explain why my results are similar to Tim's? Skip From tim.one@comcast.net Sat Jul 27 21:22:47 2002 From: tim.one@comcast.net (Tim Peters) Date: Sat, 27 Jul 2002 16:22:47 -0400 Subject: [Python-Dev] Sorting In-Reply-To: Message-ID: [Tim] > ... > I also noted that msort() gets a 32% speedup on my box when sorting a > 1.33-million line snapshot of the Python-Dev archive. This is a puzzler > to account for, since you wouldn't think there's significant pre-existing > lexicographic order in a file like that. McIlroy noted similar results > from experiments on text, PostScript and C source files in his adaptive > mergesort (which is why I tried sorting Python-Dev to begin with), but > didn't offer a hypothesis. Just a note to clarify what "the puzzle" here is. msort() may or may not be faster than sort() on random input on a given box due to platform quirks, but that isn't relevant in this case. What McIlroy noted is that the total # of compares done in these cases was significantly less than log2(N!). That just can't happen (except with vanishingly small probability) if the input data is randomly ordered, and under any comparison-based sorting method. The only hypothesis I have is that, for a stable sort, all the instances of a given element are, by definition of stability, already "in sorted order".
So, e.g., "\n" is a popular line in text files, and all the occurrences of "\n" are already sorted. msort can exploit that -- and seemingly does. This doesn't necessarily contradict that ~sort happens to run slower on my box under msort, because ~sort is such an extreme case. OK, if I remove all but the first occurrence of each unique line, the # of lines drops to about 600,000. The speedup msort enjoys also drops, to 6.8%. So exploiting duplicates does appear to account for the bulk of it, but not all of it. If, after removing duplicates, I run that through random.shuffle() before sorting, msort suffers an 8% slowdown(!) relative to samplesort. If I shuffle first but don't remove duplicates, msort enjoys a 10% speedup. So it's clear that msort is getting a significant advantage out of the duplicates, but it's not at all clear what else it's exploiting -- only that there is something else, and that it's significant. How many times has someone posted an alphabetical list of Python's keywords? From guido@python.org Sat Jul 27 22:56:30 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 27 Jul 2002 17:56:30 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/email/test test_email_codecs.py,1.1,1.2 In-Reply-To: Your message of "Mon, 22 Jul 2002 15:38:16 EDT." <15676.24360.88972.449273@anthem.wooz.org> References: <15676.16356.112688.518256@anthem.wooz.org> <15676.24360.88972.449273@anthem.wooz.org> Message-ID: <200207272156.g6RLuU826463@pcp02138704pcs.reston01.va.comcast.net> > It's a bit uglier than that because Lib/test gets magically > added to sys.path during regrtest by virtue of running "python > Lib/test/regrtest.py". Perhaps regrtest.py can specifically remove its own directory from sys.path? (Please don't just remove sys.path[0] or ''; look in sys.argv[0] and deduce from there.)
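A minimal sketch of the kind of deduction being suggested here — the helper name and the example paths are hypothetical, not regrtest's actual code:

```python
import os
import sys

def strip_own_dir(path_entries, argv0):
    # Deduce the script's directory from argv0 rather than assuming it
    # is path_entries[0] or '', then drop every entry that names it.
    script_dir = os.path.dirname(os.path.abspath(argv0))
    return [p for p in path_entries
            if os.path.abspath(p or os.curdir) != script_dir]

# regrtest.py could then do something like:
#     sys.path[:] = strip_own_dir(sys.path, sys.argv[0])
```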
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Jul 27 22:51:50 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 27 Jul 2002 17:51:50 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/email/test test_email_codecs.py,1.1,1.2 In-Reply-To: Your message of "Mon, 22 Jul 2002 13:24:52 EDT." <15676.16356.112688.518256@anthem.wooz.org> References: <15676.16356.112688.518256@anthem.wooz.org> Message-ID: <200207272151.g6RLpoi26443@pcp02138704pcs.reston01.va.comcast.net> > A better fix, IMO, is to recognize that the `test' package has become > a full fledged standard lib package (a Good Thing, IMO), heed our own > admonitions not to do relative imports, and change the various places > in the test suite that "import test_support" (or equiv) to "import > test.test_support" (or equiv). Good idea. > I've twiddled the test suite to do things this way, and all the > (expected Linux) tests pass, so I'd like to commit these changes. You've done this by now, right? Fine. > Unit test writers need to remember to use test.test_support instead of > just test_support. We could do something wacky like remove '' from > sys.path if we really cared about enforcing this. It would also be > good for folks on other systems to make sure I haven't missed a > module. Perhaps it would be a good idea for test_support (and perhaps some other crucial testing support modules) to add something at the top like this?

if __name__ != "test.test_support":
    raise ImportError, "test_support must be imported from the test package"

--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Jul 27 23:17:39 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 27 Jul 2002 18:17:39 -0400 Subject: [Python-Dev] More Sorting In-Reply-To: Your message of "Mon, 22 Jul 2002 23:19:32 EDT."
<3D3CCB44.4F2592ED@metaslash.com> References: <3D3CCB44.4F2592ED@metaslash.com> Message-ID: <200207272217.g6RMHdA00500@pcp02138704pcs.reston01.va.comcast.net> > Sebastien Keim posted a patch (http://python.org/sf/544113) > of a merge sort. I didn't really review it, but it included > test and doc. So if the bisect module is being added to, > perhaps someone should review this patch. It doesn't strike me as a "fundamental" algorithm like bisection or heap sort. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Sun Jul 28 06:48:26 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 28 Jul 2002 01:48:26 -0400 Subject: [Python-Dev] RE: companies data for sorting comparisons In-Reply-To: Message-ID: Kevin Altis kindly forwarded a 1.5MB XML database with about 6600 company records. A record looks like this after running his script to turn them into Python dicts:

{'Address': '395 Page Mill Road\nPalo Alto, CA 94306',
 'Company': 'Agilent Technologies Inc.',
 'Exchange': 'NYSE',
 'NumberOfEmployees': '41,000',
 'Phone': '(650) 752-5000',
 'Profile': 'http://biz.yahoo.com/p/a/a.html',
 'Symbol': 'A',
 'Web': 'http://www.agilent.com'}

It appears to me that the XML file is maintained by hand, in order of ticker symbol. But people make mistakes when alphabetizing by hand, and there are 37 indices i such that

    data[i]['Symbol'] > data[i+1]['Symbol']

So it's "almost sorted" by that measure, with a few dozen glitches -- and msort should be able to exploit this! I think this is an important case of real-life behavior. The proper order of Yahoo profile URLs is also strongly correlated with ticker symbol, while both the company name and web address look weakly correlated, so there's hope that msort can get some benefit on those too. Here are runs sorting on all field names, building a DSU tuple list to sort via

    values = [(x.get(fieldname), x) for x in data]

Each field sort was run 5 times under sort, and under msort.
So 5 times are reported for each sort, in milliseconds, listed from quickest to slowest:

Sorting on field 'Address' -- 6589 records
    via sort:  43.03 43.35 43.37 43.54 44.14
    via msort: 45.15 45.16 45.25 45.26 45.30
Sorting on field 'Company' -- 6635 records
    via sort:  40.41 40.55 40.61 42.36 42.63
    via msort: 30.68 30.80 30.87 30.99 31.10
Sorting on field 'Exchange' -- 6579 records
    via sort:  565.28 565.49 566.70 567.12 567.45
    via msort: 573.29 573.61 574.55 575.34 576.46
Sorting on field 'NumberOfEmployees' -- 6531 records
    via sort:  120.15 120.24 120.26 120.31 122.58
    via msort: 134.25 134.29 134.50 134.74 135.09
Sorting on field 'Phone' -- 6589 records
    via sort:  53.76 53.80 53.81 53.82 56.03
    via msort: 56.05 56.10 56.19 56.21 56.86
Sorting on field 'Profile' -- 6635 records
    via sort:  58.66 58.71 58.84 59.02 59.50
    via msort: 8.74 8.81 8.98 8.99 8.99
Sorting on field 'Symbol' -- 6635 records
    via sort:  39.92 40.11 40.19 40.38 40.62
    via msort: 6.49 6.52 6.53 6.72 6.73
Sorting on field 'Web' -- 6632 records
    via sort:  47.23 47.29 47.36 47.45 47.45
    via msort: 37.12 37.27 37.33 37.42 37.89

So the hopes are realized: msort gets huge benefit from the nearly-sorted Symbol field, also huge benefit from the correlated Profile field, and highly significant benefit from the weakly correlated Company and Web fields. K00L! The Exchange field sort is so bloody slow because there are few distinct Exchange values, and whenever there's a tie on those the tuple comparison routine tries to break it by comparing the dicts. Note that I warned about this kind of thing a week or two ago, in the context of trying to implement priority queues by storing and comparing (priority, object) tuples -- it can be a timing disaster if priorities are ever equal. The other fields (Phone, etc) are in essentially random order, and msort is systematically a bit slower on all of those. Note that these are all string comparisons.
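The decorate-sort-undecorate pattern used to build these runs looks like this; the sample records here are invented stand-ins for Kevin's data:

```python
# Decorate-sort-undecorate (DSU): build (key, record) tuples, sort the
# tuples on the key, then strip the keys back off.
data = [
    {'Symbol': 'IBM', 'Exchange': 'NYSE'},
    {'Symbol': 'A', 'Exchange': 'NYSE'},
    {'Symbol': 'MSFT', 'Exchange': 'NASDAQ'},
]

fieldname = 'Symbol'
values = [(x.get(fieldname), x) for x in data]   # decorate
values.sort()                                    # sort on the field value
data = [x for (key, x) in values]                # undecorate
```

Note that when two keys are equal, tuple comparison falls through to comparing the records themselves — which is exactly the Exchange-field slowdown described above.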
I don't think it's a coincidence that msort went from a major speedup on the Python-Dev task, to a significant slowdown, when I removed all duplicate lines and shuffled the corpus first. Only part of this can be accounted for by # of comparisons. On a given random input, msort() may do fewer or more comparisons than sort(), but doing many trials suggests that sort() has a small edge in # of compares on random data, on the order of 1 or 2%. This is easy to believe, since msort does a few things it *knows* won't repay the cost if the order happens to be random. These are well worth it, since they're what allow msort to get huge wins when the data isn't random. But that's not enough to account for things like the >10% higher runtime in the NumberOfEmployees sort. I can't reproduce this magnitude of systematic slowdown when doing random sorts on floats or ints, so I conclude it has something to do with string compares. Unlike int and float compares, a string compare takes variable time, depending on how long the common prefix is. I'm not aware of specific research on this topic, but it's plausible to me that partitioning may be more effective than merging at reducing the number of comparisons specifically involving "nearly equal" elements. Indeed, the fastest string-sorting methods I know of move heaven and earth to avoid redundant prefix comparisons, and do so by partitioning. Turns out that doesn't account for it, though.
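Tallies of comparisons and common-prefix lengths can be gathered by wrapping the sort keys in a counting class — a hypothetical sketch, not the instrumentation actually used for the numbers quoted in this thread:

```python
class CountingStr:
    """Wrap a string so a sort's comparisons can be tallied.

    The class-level counters and this whole harness are illustrative
    only -- a sketch of how such numbers could be gathered.
    """
    ncompares = 0   # total comparisons performed
    nprefix = 0     # total common-prefix characters examined

    def __init__(self, s):
        self.s = s

    def __lt__(self, other):
        CountingStr.ncompares += 1
        # Tally how many leading characters the two strings share.
        n = 0
        for ca, cb in zip(self.s, other.s):
            if ca != cb:
                break
            n += 1
        CountingStr.nprefix += n
        return self.s < other.s

words = ["banana", "bandana", "apple", "banana"]
result = sorted(words, key=CountingStr)
```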
Here are the total number of comparisons (first number on each line) done for each sort, and the sum across all string compares of the number of common prefix characters (second number on each line):

Sorting on field 'Address' -- 6589 records
    via sort:  76188 132328
    via msort: 76736 131081
Sorting on field 'Company' -- 6635 records
    via sort:  76288 113270
    via msort: 56013 113270
Sorting on field 'Exchange' -- 6579 records
    via sort:  34851 207185
    via msort: 37457 168402
Sorting on field 'NumberOfEmployees' -- 6531 records
    via sort:  76167 116322
    via msort: 76554 112436
Sorting on field 'Phone' -- 6589 records
    via sort:  75972 278188
    via msort: 76741 276012
Sorting on field 'Profile' -- 6635 records
    via sort:  76049 1922016
    via msort: 8750 233452
Sorting on field 'Symbol' -- 6635 records
    via sort:  76073 73243
    via msort: 8724 16424
Sorting on field 'Web' -- 6632 records
    via sort:  76207 863837
    via msort: 58811 666852

Contrary to prediction, msort always got the smaller "# of equal prefix characters" total, even in the Exchange case, where it did nearly 10% more total comparisons. Python's string compare goes fastest if the first two characters aren't the same, so maybe sort() gets a systematic advantage there? Nope. Turns out msort() gets out early 17577 times on that basis when doing NumberOfEmployees, but sort() only gets out early 15984 times. I conclude that msort is at worst only a tiny bit slower when doing NumberOfEmployees, and possibly a little faster. The only measure that doesn't agree with that conclusion is time.clock() -- but time is such a subjective thing I won't let that bother me . From tim.one@comcast.net Sun Jul 28 09:07:12 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 28 Jul 2002 04:07:12 -0400 Subject: [Python-Dev] RE: companies data for sorting comparisons In-Reply-To: Message-ID: [Tim] > ...
> Sorting on field 'NumberOfEmployees' -- 6531 records
>     via sort:  120.15 120.24 120.26 120.31 122.58
>     via msort: 134.25 134.29 134.50 134.74 135.09
> ...
> [where the # of comparisons done is]
> Sorting on field 'NumberOfEmployees' -- 6531 records
>     via sort:  76167 ...
>     via msort: 76554 ...
> ...
> [and various hypotheses for why it's >10% slower anyway don't pan out]
> ...
> I conclude that msort is at worst only a tiny bit slower when doing
> NumberOfEmployees, and possibly a little faster. The only measure that
> doesn't agree with that conclusion is time.clock() -- but time is such a
> subjective thing I won't let that bother me .

It's the dicts again. NumberOfEmployees isn't always unique, and in particular it's missing in 6635-6531 = 104 records, so that

    values = [(x.get(fieldname), x) for x in data]

includes 104 tuples with a None first element. Comparing a pair of those gets resolved by comparing the dicts, and dict comparison ain't cheap. Building the DSU tuple-list via

    values = [(x.get(fieldname), i, x) for i, x in enumerate(data)]

instead leads to

Sorting on field 'NumberOfEmployees' -- 6531 records
    via sort:  47.47 47.50 47.54 47.66 47.75
    via msort: 48.21 48.23 48.43 48.81 48.85

which gives both methods a huge speed boost, and cuts .sort's speed advantage much closer to its small advantage in total # of comparisons. I expect it's just luck of the draw as to which method is going to end up comparing tuples with equal first elements more often, and msort apparently does in this case (and those comparisons are more expensive, because they have to go on to invoke int compare too). A larger lesson: even if Python gets a stable sort and advertises stability (we don't have to guarantee it even if it's there), there may *still* be strong "go fast" reasons to include an object's index in its DSU tuple.
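A sketch of that tuple shape, with invented records; the original index breaks key ties with a cheap int compare, so the comparison never reaches the dicts, and it also forces a stable result:

```python
data = [
    {'Exchange': 'NYSE', 'Symbol': 'IBM'},
    {'Exchange': 'NASDAQ', 'Symbol': 'MSFT'},
    {'Exchange': 'NYSE', 'Symbol': 'A'},
]
fieldname = 'Exchange'

# Decorate with (key, index, record): equal keys fall back to the
# index, never to comparing the records themselves.
values = [(x.get(fieldname), i, x) for i, x in enumerate(data)]
values.sort()
result = [x for (key, i, x) in values]
```

The two 'NYSE' records come out in their original relative order, courtesy of the index.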
tickledly y'rs - tim From fredrik@pythonware.com Sun Jul 28 09:30:32 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sun, 28 Jul 2002 10:30:32 +0200 Subject: [Python-Dev] RE: companies data for sorting comparisons References: Message-ID: <008b01c23611$0bc24460$ced241d5@hagrid> tim wrote: > A larger lesson: even if Python gets a stable sort and advertises stability > (we don't have to guarantee it even if it's there) if we guarantee it, all python implementors must provide one. how hard is it to implement a reasonably good stable sort from scratch? I can think of lots of really stupid ways to do it on top of existing sort code, which might be a reason to provide two different sort methods: sort (fast) and stablesort (guaranteed, but maybe not as fast as sort). in CPython, both names can map to timsort. (shouldn't you be writing a paper on this, btw? or start a sort blog ;-) From nhodgson@bigpond.net.au Sun Jul 28 14:00:06 2002 From: nhodgson@bigpond.net.au (Neil Hodgson) Date: Sun, 28 Jul 2002 23:00:06 +1000 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020727024012.85905.qmail@web40107.mail.yahoo.com> Message-ID: <003701c23636$b25b0a80$3da48490@neil> Scott Gilbert: > First, could this be implemented by a gapped_buffer object that implements > the locking functionality you want, but that returns simple buffers to work > with when the object is locked. In other words, do we need to add this > extra functionality up in the core protocol when it can be implemented > specifically the way Scintilla (cool editor by the way) wants it to be in (Thanks) > the Scintilla specific extension. Would this mean that the explicit locking completely defines the validity of the address or is the address valid until the 'view' buffer object is garbage collected? 
I would like the gapped_buffer to be put back into gapped mode as soon as possible and depending on the lifetime of a view buffer object is not that robust in the face of alternate Python implementations that use non-reference-counted GC implementations (Jython / Python .Net). > Second, if you are using mutexes to do this stuff, you'll have to be very > careful about deadlock. By locking, I want to change state on the buffer from having a gap and allowing resizes to having a static size and address which will remain valid until an unlock. The lock and unlock are not treating the buffer as a mutex (I'd call the operations 'acquire' and 'release' then) although mutexes may be needed for safety in the lock and unlock implementations. It is likely that the lock and unlock would be counted (it can be locked twice and then won't be expandable until it is unlocked twice) and that exceptions would be thrown for length changing operations while locked. If you think my particular use is out of the scope of what you are trying to achieve then that is fine. Neil From neal@metaslash.com Sun Jul 28 15:03:13 2002 From: neal@metaslash.com (Neal Norwitz) Date: Sun, 28 Jul 2002 10:03:13 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/email/test test_email_codecs.py,1.1,1.2 References: <15676.16356.112688.518256@anthem.wooz.org> <200207272151.g6RLpoi26443@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D43F9A1.C4D491A3@metaslash.com> Guido van Rossum wrote: > > > A better fix, IMO, is to recognize that the `test' package has become > > a full fledged standard lib package (a Good Thing, IMO), heed our own > > admonitions not to do relative imports, and change the various places > > in the test suite that "import test_support" (or equiv) to "import > > test.test_support" (or equiv). > > Good idea. Shouldn't this also be done for from XXX import YYY? 
grep test_support `find Lib -name '*.py'` | \
    egrep -v '(from test |test\.test_support)' | grep import

Neal From guido@python.org Sun Jul 28 16:17:17 2002 From: guido@python.org (Guido van Rossum) Date: Sun, 28 Jul 2002 11:17:17 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/email/test test_email_codecs.py,1.1,1.2 In-Reply-To: Your message of "Sun, 28 Jul 2002 10:03:13 EDT." <3D43F9A1.C4D491A3@metaslash.com> References: <15676.16356.112688.518256@anthem.wooz.org> <200207272151.g6RLpoi26443@pcp02138704pcs.reston01.va.comcast.net> <3D43F9A1.C4D491A3@metaslash.com> Message-ID: <200207281517.g6SFHHS16631@pcp02138704pcs.reston01.va.comcast.net> [Barry] > > > A better fix, IMO, is to recognize that the `test' package has become > > > a full fledged standard lib package (a Good Thing, IMO), heed our own > > > admonitions not to do relative imports, and change the various places > > > in the test suite that "import test_support" (or equiv) to "import > > > test.test_support" (or equiv). [Guido] > > Good idea. [Neal]

> Shouldn't this also be done for from XXX import YYY?
>
> grep test_support `find Lib -name '*.py'` | \
>     egrep -v '(from test |test\.test_support)' | grep import

Good catch! Looks like Barry hardly scratched the surface of this. I *thought* his checkin which claimed to fix this throughout Lib/test was a tad small. :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Jul 28 16:23:41 2002 From: guido@python.org (Guido van Rossum) Date: Sun, 28 Jul 2002 11:23:41 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/email/test test_email_codecs.py,1.1,1.2 In-Reply-To: Your message of "Sun, 28 Jul 2002 11:17:17 EDT."
<200207281517.g6SFHHS16631@pcp02138704pcs.reston01.va.comcast.net> References: <15676.16356.112688.518256@anthem.wooz.org> <200207272151.g6RLpoi26443@pcp02138704pcs.reston01.va.comcast.net> <3D43F9A1.C4D491A3@metaslash.com> <200207281517.g6SFHHS16631@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200207281523.g6SFNfm16682@pcp02138704pcs.reston01.va.comcast.net> > [Neal] > > Shouldn't this also be done for from XXX import YYY? > > > > grep test_support `find Lib -name '*.py'` | \ > > egrep -v '(from test |test\.test_support)' | grep import [me] > Good catch! Looks like Barry hardly scratched the surface of this. > I *thought* his checkin which claimed to fix this throughout Lib/test > was a tad small. :-( Neal, Barry: on second thought, DON'T FIX THIS YET! I'd like to have a discussion with Barry about the motivation for this. I need to at least understand why Barry thinks he needs this, and reconcile this with my earlier position that relative imports were compulsory in this case. --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@pythonware.com Sun Jul 28 16:49:27 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sun, 28 Jul 2002 17:49:27 +0200 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects stringobject.c,2.171,2.172 References: Message-ID: <010901c2364e$5ce6be10$ced241d5@hagrid>

> SF patch #577031, remove PyArg_Parse() since it's deprecated
>
> ! v = PyNumber_Float(v);
> ! if (!v)
>       return -1;
>
> v = PyNumber_Int(v);
> ! if (!v)
>       return -1;

umm. doesn't PyNumber_Float and PyNumber_Int convert its argument to a float/integer, if it's not already the right type? in earlier versions of Python, "%g" % "1.0" raised a TypeError. does it still do that with this patch in place?
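The long-standing behaviour Fredrik is describing is easy to demonstrate in an unpatched Python (this shows the expected behaviour, not what the patched CVS build did):

```python
# '%g' with a string operand should raise TypeError rather than be
# silently converted through PyNumber_Float:
try:
    formatted = "%g" % "1.0"
except TypeError:
    outcome = "TypeError"
else:
    outcome = formatted
```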
From neal@metaslash.com Sun Jul 28 17:13:12 2002 From: neal@metaslash.com (Neal Norwitz) Date: Sun, 28 Jul 2002 12:13:12 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects stringobject.c,2.171,2.172 References: <010901c2364e$5ce6be10$ced241d5@hagrid> Message-ID: <3D441818.83FD38F7@metaslash.com> Fredrik Lundh wrote: > > > SF patch #577031, remove PyArg_Parse() since it's deprecated > > > ! v = PyNumber_Float(v); > > ! if (!v) > > return -1; > > > v = PyNumber_Int(v); > > ! if (!v) > > return -1; > > umm. > > doesn't PyNumber_Float and PyNumber_Int convert its argument to > a float/integer, if it's not already the right type? Yes. > in earlier versions of Python, "%g" % "1.0" raised a TypeError. does > it still do that with this patch in place? No. :-( That wasn't an intentional change. The intent was to convert an int/long to a double in the case of '%g' et al and from a double to an int in the case of '%d'. What is the best way to fix this? If I call PyNumber_Check() before this code, the behaviour is the same as before. Neal From neal@metaslash.com Sun Jul 28 17:29:33 2002 From: neal@metaslash.com (Neal Norwitz) Date: Sun, 28 Jul 2002 12:29:33 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects stringobject.c,2.171,2.172 References: <010901c2364e$5ce6be10$ced241d5@hagrid> <3D441818.83FD38F7@metaslash.com> Message-ID: <3D441BED.43049678@metaslash.com> Neal Norwitz wrote: > > Fredrik Lundh wrote: > > > > > SF patch #577031, remove PyArg_Parse() since it's deprecated > > > > > ! v = PyNumber_Float(v); > > > ! if (!v) > > > return -1; > > > > > v = PyNumber_Int(v); > > > ! if (!v) > > > return -1; > > > > umm. > > > > doesn't PyNumber_Float and PyNumber_Int convert its argument to > > a float/integer, if it's not already the right type? > > Yes. > > > in earlier versions of Python, "%g" % "1.0" raised a TypeError. does > > it still do that with this patch in place? > > No. :-( That wasn't an intentional change. 
The intent was > to convert an int/long to a double in the case of '%g' et al and > from a double to an int in the case of '%d'. > > What is the best way to fix this? To answer my own question, it appears that I should use PyFloat_AsDouble() and PyInt_AsLong() and check for an error. I don't know why I didn't do this before. This restores the original behaviour. I'll check this in later. Let me know if I screwed up again. I'll also update the tests to check for the exception. Neal From guido@python.org Sun Jul 28 17:37:39 2002 From: guido@python.org (Guido van Rossum) Date: Sun, 28 Jul 2002 12:37:39 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects stringobject.c,2.171,2.172 In-Reply-To: Your message of "Sun, 28 Jul 2002 12:13:12 EDT." <3D441818.83FD38F7@metaslash.com> References: <010901c2364e$5ce6be10$ced241d5@hagrid> <3D441818.83FD38F7@metaslash.com> Message-ID: <200207281637.g6SGbd816840@pcp02138704pcs.reston01.va.comcast.net> > Fredrik Lundh wrote: > > > > > SF patch #577031, remove PyArg_Parse() since it's deprecated > > > > > ! v = PyNumber_Float(v); > > > ! if (!v) > > > return -1; > > > > > v = PyNumber_Int(v); > > > ! if (!v) > > > return -1; > > > > umm. > > > > doesn't PyNumber_Float and PyNumber_Int convert its argument to > > a float/integer, if it's not already the right type? > > Yes. > > > in earlier versions of Python, "%g" % "1.0" raised a TypeError. does > > it still do that with this patch in place? > > No. :-( That wasn't an intentional change. The intent was > to convert an int/long to a double in the case of '%g' et al and > from a double to an int in the case of '%d'. > > What is the best way to fix this? If I call PyNumber_Check() > before this code, the behaviour is the same as before. Revert the change. I don't believe PyNumber_Check() is the right thing to use here at all. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Jul 28 17:38:43 2002 From: guido@python.org (Guido van Rossum) Date: Sun, 28 Jul 2002 12:38:43 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects stringobject.c,2.171,2.172 In-Reply-To: Your message of "Sun, 28 Jul 2002 12:29:33 EDT." <3D441BED.43049678@metaslash.com> References: <010901c2364e$5ce6be10$ced241d5@hagrid> <3D441818.83FD38F7@metaslash.com> <3D441BED.43049678@metaslash.com> Message-ID: <200207281638.g6SGch016860@pcp02138704pcs.reston01.va.comcast.net> > To answer my own question, it appears that I should use > PyFloat_AsDouble() and PyInt_AsLong() and check for an error. > I don't know why I didn't do this before. This restores the > original behaviour. Good! > I'll check this in later. Let me know if I screwed up again. > > I'll also update the tests to check for the exception. Great! --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Sun Jul 28 18:21:11 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 28 Jul 2002 13:21:11 -0400 Subject: [Python-Dev] RE: companies data for sorting comparisons In-Reply-To: <008b01c23611$0bc24460$ced241d5@hagrid> Message-ID: [Tim] >> A larger lesson: even if Python gets a stable sort and >> advertises stability (we don't have to guarantee it even if >> it's there) [/F] > if we guarantee it, all python implementors must provide one. Or a middle ground, akin to CPython's semi-reluctant guarantees of refcount semantics for "timely" finalization. A great many CPython users appear quite happy to rely on this despite that the language doesn't guarantee it. > how hard is it to implement a reasonably good stable sort from > scratch? A straightforward mergesort using a temp vector of size N is dead easy, and reasonably good (O(N log N) worst case). 
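The "dead easy" version might look like this — an illustrative sketch, not CPython's adaptive msort:

```python
def stable_msort(a):
    # Plain top-down mergesort: O(N log N) worst case, stable, using
    # O(N) temporary space for the merge.
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = stable_msort(a[:mid])
    right = stable_msort(a[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # '<=' (not '<') is what makes the merge stable: on a tie,
        # the element from the left (earlier) half is emitted first.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```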
There aren't any other major N log N sorts that are naturally stable, nor even any I know of (and I know of a lot ) that can be made stable without materializing list indices (or a moral equivalent). Insertion sort is naturally stable, but is O(N**2) expected case, so is DOA. > I can think of lots of really stupid ways to do it on top of existing > sort code, which might be a reason to provide two different sort > methods: sort (fast) and stablesort (guaranteed, but maybe not > as fast as sort). in CPython, both names can map to timsort. I don't want to see two sort methods on the list object, for reasons explained before. You've always been able to *get* a stable sort in Python via materializing the list indices in a 2-tuple, as in Alex's "stable sort" DSU recipe: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52234 People overly concerned about portability can stick to that. > (shouldn't you be writing a paper on this, btw? I don't think there's anything truly new here, although the combination of gimmicks may be unique. timsort.txt is close enough to a paper anyway, but better in that it only tells you useful things; the McIlroy paper covers all the rest . > or start a sort blog ;-) That must be some sort of web thing, hence beyond my limited abilities. From tim.one@comcast.net Sun Jul 28 18:52:33 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 28 Jul 2002 13:52:33 -0400 Subject: [Python-Dev] RE: companies data for sorting comparisons In-Reply-To: Message-ID: Turns out there was one comparison per merge step I wasn't extracting maximum value from. Changing the code to suck all I can out of it doesn't make a measurable difference on sortperf results, except for a tiny improvement on ~sort on my box, but makes a difference on the Exchange case of Kevin's data. 
Here using values = [(x.get(fieldname), i, x) for i, x in enumerate(data)] as the list to sort, and times are again in milliseconds:

Sorting on field 'Address' -- 6589 records
    via  sort: 41.24 41.39 41.41 41.42 86.71
    via msort: 42.90 43.01 43.07 43.15 43.75
Sorting on field 'Company' -- 6635 records
    via  sort: 40.24 40.34 40.42 40.43 42.58
    via msort: 30.42 30.45 30.58 30.66 30.66
Sorting on field 'Exchange' -- 6579 records
    via  sort: 59.64 59.70 59.71 59.72 59.81
    via msort: 27.06 27.11 27.19 27.29 27.54
Sorting on field 'NumberOfEmployees' -- 6531 records
    via  sort: 47.61 47.65 47.73 47.75 47.76
    via msort: 48.55 48.57 48.61 48.73 48.92
Sorting on field 'Phone' -- 6589 records
    via  sort: 48.00 48.03 48.32 48.32 48.39
    via msort: 49.60 49.64 49.68 49.79 49.85
Sorting on field 'Profile' -- 6635 records
    via  sort: 58.63 58.70 58.80 58.85 58.92
    via msort:  8.47  8.48  8.51  8.59  8.68
Sorting on field 'Symbol' -- 6635 records
    via  sort: 39.93 40.13 40.16 40.28 41.37
    via msort:  6.20  6.23  6.23  6.43  6.98
Sorting on field 'Web' -- 6632 records
    via  sort: 46.75 46.77 46.86 46.87 47.05
    via msort: 36.44 36.66 36.69 36.69 36.96

'Profile' is slower than the rest for samplesort because the strings it's comparing are Yahoo URLs with a long common prefix -- the compares just take longer in that case. I'm not sure why 'Exchange' takes so long for samplesort (it's a case with lots of duplicate primary keys, but the distribution is highly skewed, not uniform as in ~sort). In all cases now, msort is a major-to-killer win, or a small (but real) loss. I'll upload a new patch and new timsort.txt next. Then I'm taking a week off! No, I wish it were for fun .
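The `(x.get(fieldname), i, x)` tuples used for these timings are the decorate-sort-undecorate pattern from the recipe Tim points to; a generic sketch (the `stable_sort` helper name is mine):

```python
def stable_sort(records, key):
    """Decorate-sort-undecorate: the original index in the middle of
    each tuple breaks ties between equal keys, so the result is stable
    even if the underlying sort is not -- and the records themselves
    are never compared."""
    decorated = [(key(rec), i, rec) for i, rec in enumerate(records)]
    decorated.sort()
    return [rec for _, _, rec in decorated]
```

Sorting on a field then becomes `stable_sort(data, key=lambda x: x.get(fieldname))`, matching the list comprehension above.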
From Jack.Jansen@oratrix.com Sun Jul 28 22:03:30 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Sun, 28 Jul 2002 23:03:30 +0200 Subject: [Python-Dev] python.org/switch/ In-Reply-To: <20020726223911.T70962-100000@onion.valueclick.com> Message-ID: <782E9B13-A26D-11D6-83B1-003065517236@oratrix.com> On zaterdag, juli 27, 2002, at 07:40 , Ask Bjoern Hansen wrote: > > As presented on the Perl Lightning talks here at OSCON: Switch > movies. > > You guys will dig Nathan's (nat.mov and nat.mpg). > > http://www.perl.org/tpc/2002/movies/switch/ They're all pretty good, but I think I liked David best, he actually seemed to mean what he said:-) -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From greg@cosc.canterbury.ac.nz Mon Jul 29 00:43:05 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 29 Jul 2002 11:43:05 +1200 (NZST) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <082b01c234b0$c33564e0$e000a8c0@thomasnotebook> Message-ID: <200207282343.g6SNh55G016683@kuku.cosc.canterbury.ac.nz> Thomas Heller : > This PEP proposes an extension to the buffer interface called the > 'safe buffer interface'. I don't understand the need for this. The C-level buffer interface is already safe as long as you use it properly -- which means using it to fetch the pointer each time it's needed. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From barry@zope.com Mon Jul 29 00:51:38 2002 From: barry@zope.com (Barry A. 
Warsaw) Date: Sun, 28 Jul 2002 19:51:38 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/email/test test_email_codecs.py,1.1,1.2 References: <15676.16356.112688.518256@anthem.wooz.org> <15676.24360.88972.449273@anthem.wooz.org> <200207272156.g6RLuU826463@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15684.33674.169550.228083@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: >> It's a bit uglier than that because since Lib/test gets >> magically added to sys.path during regrtest by virtue of >> running "python Lib/test/regrtest.py". GvR> Perhaps regrtest.py can specifically remove its own directory GvR> from sys.path? (Please don't just remove sys.path[0] or ''; GvR> look in sys.argv[0] and deduce from there.) Good idea: -------------------- snip snip -------------------- mydir = os.path.dirname(sys.argv[0]) sys.path.remove(mydir) -------------------- snip snip -------------------- I also followed up to Guido privately, re: the motivation for this change. Also, Neal's right, I missed some of the relative imports of test_support and I'm ready to commit those fixes once Guido gives the go ahead. -Barry From xscottg@yahoo.com Mon Jul 29 00:57:12 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Sun, 28 Jul 2002 16:57:12 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <200207282343.g6SNh55G016683@kuku.cosc.canterbury.ac.nz> Message-ID: <20020728235712.41025.qmail@web40112.mail.yahoo.com> --- Greg Ewing wrote: > Thomas Heller : > > > This PEP proposes an extension to the buffer interface called the > > 'safe buffer interface'. > > I don't understand the need for this. The C-level buffer > interface is already safe as long as you use it properly -- > which means using it to fetch the pointer each time it's > needed. > This is not my PEP, but let me defend it anyway. 
The need for this derives from wanting to do more than one thing at a time in Python (multiple processors with multiple threads, asynchronous I/O, DMA transfers, ???). One thread grabs the pointer from the "safe buffer interface" and then releases the GIL while it works on that pointer. Now another thread is free to acquire the GIL and run concurrently with the first. (The asynchronous I/O case applies even on single processor machines...) I believe you were the one to explain to me why an extension can't release the GIL while it works with the PyBufferProcs acquired pointer. This PEP tries to allow the extension to do just that. __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From zhaoqiang@neusoft.com Mon Jul 29 01:15:41 2002 From: zhaoqiang@neusoft.com (zhaoq) Date: Mon, 29 Jul 2002 08:15:41 +0800 Subject: [Python-Dev] Please remove me from the mailing list References: <000a01c23453$fc4b04e0$3745fea9@ibm1499> Message-ID: <010701c23695$633acc60$4a01010a@xpprofessional> Please remove me from the mailing list zhaoqiang@neusoft.com thanks ----- Original Message ----- From: Rick Farrer To: Python-Dev@python.org Sent: Friday, July 26, 2002 11:24 AM Subject: [Python-Dev] Please remove me from the mailing list Please remove me from the mailing list. rf@avisionone.com Thanks, Rick
--Boundary_(ID_yqfil81HVg0jUZY0v5uUNg)-- From xscottg@yahoo.com Mon Jul 29 01:29:57 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Sun, 28 Jul 2002 17:29:57 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <003701c23636$b25b0a80$3da48490@neil> Message-ID: <20020729002957.74716.qmail@web40101.mail.yahoo.com> --- Neil Hodgson wrote: > > Would this mean that the explicit locking completely defines the > validity of the address or is the address valid until the 'view' buffer > object is garbage collected? I would like the gapped_buffer to be put > back into gapped mode as soon as possible and depending on the lifetime > of a view buffer object is not that robust in the face of alternate > Python implementations that use non-reference-counted GC implementations > (Jython / Python .Net). > If you're worried about exactly when the object is released, you could add a specific release() method to your object indicating that you don't intend to use it anymore. My point was that, with Thomas Heller's safe buffer protocol (or my bytes object), you would have a pointer that could be manipulated independently of the GIL, but that putting locking semantics into your gapped_buffer is something you could add on top without complicating the core. In other words, his PEP (or mine) allows you to do something you couldn't necessarily do previously, and it doesn't sound like there is anything you want to do that you won't be able to. > > By locking, I want to change state on the buffer from having a gap and > allowing resizes to having a static size and address which will remain > valid until an unlock. The lock and unlock are not treating the buffer as > a mutex (I'd call the operations 'acquire' and 'release' then) although > mutexes may be needed for safety in the lock and unlock implementations. 
> It is likely that the lock and unlock would be counted (it can be locked
> twice and then won't be expandable until it is unlocked twice) and that
> exceptions would be thrown for length changing operations while locked.

You could easily implement a counting (recursive) mutex as described above, and it might be the case that throwing an exception on the length-changing operations keeps deadlock from occurring. I'm still a bit confused though. When thread A locks (acquires) the buffer, and thread B tries to do a resize and it generates an exception, what is thread B supposed to do next? I assume that the resize was due to something like the user typing somewhere in the buffer. From a user interface point of view, you can't just ignore their request to insert text. Would you just try the same operation again after catching the exception? How long would you wait?

> If you think my particular use is out of the scope of what you are
> trying to achieve then that is fine.

It is definitely up to Thomas Heller to decide what he wants his scope to be, and I don't want to step on his toes at all. Especially since the reason for his PEP getting written is that I didn't want to add this stuff to mine. :-) I'm just trying to point out two things: 1) With his PEP, there is a way to get the behavior you desire without adding the complexity to the core of Python. And with recursive/counting mutexes, the behavior you want is getting more complicated. The "safe buffer protocol" is likely to cater to a wide class of users. I could be wrong, but the "lockable gapped buffer protocol" probably appeals to a much smaller set. 2) Any time you go from one lock (mutex, GIL, semaphore) to multiple locks, you can introduce deadlock states. Without understanding your design fully, your use case sounds to me like it either has the potential for deadlock, or the potential for polling.
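The counted-lock behaviour being debated can indeed live entirely outside the core; a hypothetical sketch (the `LockableBuffer` class and its lock/unlock/resize API are invented for illustration), in which resizing a locked buffer raises instead of blocking, so no deadlock is possible but the caller is left with Scott's retry question:

```python
import threading

class LockableBuffer:
    """Sketch of a counted lock for a resizable buffer: lock() twice
    means unlock() twice before a resize is allowed again."""
    def __init__(self, size):
        self._data = bytearray(size)
        self._mutex = threading.Lock()   # protects the lock count
        self._lock_count = 0

    def lock(self):
        with self._mutex:
            self._lock_count += 1        # pointer must stay valid now

    def unlock(self):
        with self._mutex:
            if self._lock_count == 0:
                raise RuntimeError("unbalanced unlock")
            self._lock_count -= 1

    def resize(self, new_size):
        with self._mutex:
            if self._lock_count:
                # Raise rather than block: no deadlock, but the caller
                # must decide whether (and when) to retry.
                raise RuntimeError("buffer is locked; cannot resize")
            data = bytearray(new_size)
            keep = min(len(self._data), new_size)
            data[:keep] = self._data[:keep]
            self._data = data            # old pointer now dead
```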
There are ways to avoid this of course, but then everyone has to follow a more complicated set of rules (for instance build a hierarchy describing the order of locks to acquire). Since Thomas's PEP doesn't introduce any new types of locks, it sidesteps these problems. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From Rick Farrer" This is a multi-part message in MIME format. ------=_NextPart_000_0009_01C2366E.D543FBA0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable For the last time. Please remove me from your mailing = list!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ------=_NextPart_000_0009_01C2366E.D543FBA0 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable----- Original Message -----
------=_NextPart_000_0009_01C2366E.D543FBA0-- From greg@cosc.canterbury.ac.nz Mon Jul 29 03:13:23 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 29 Jul 2002 14:13:23 +1200 (NZST) Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects stringobject.c,2.171,2.172 In-Reply-To: <3D441818.83FD38F7@metaslash.com> Message-ID: <200207290213.g6T2DN2U017001@kuku.cosc.canterbury.ac.nz> Neal Norwitz : > The intent was to convert an int/long to a double in the case of > '%g' et al and from a double to an int in the case of '%d'. Are you sure the latter part of that is a good idea? As a general principle, I don't think float->int conversions should be done automatically. What is the Python philosophy on that? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From neal@metaslash.com Mon Jul 29 03:31:39 2002 From: neal@metaslash.com (Neal Norwitz) Date: Sun, 28 Jul 2002 22:31:39 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects stringobject.c,2.171,2.172 References: <200207290213.g6T2DN2U017001@kuku.cosc.canterbury.ac.nz> Message-ID: <3D44A90B.421E97DA@metaslash.com> Greg Ewing wrote: > > Neal Norwitz : > > > The intent was to convert an int/long to a double in the case of > > '%g' et al and from a double to an int in the case of '%d'. > > Are you sure the latter part of that is a good idea? As a general > principle, I don't think float->int conversions should be done > automatically. What is the Python philosophy on that? 
This is consistent with versions back to 1.5.2:

Python 1.5.2 (#1, Jul 5 2001, 03:02:19) [GCC 2.96 20000731 (Red Hat Linux 7.1 2 on linux-i386
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> '%d' % 1.8
'1'

Neal

From guido@python.org Mon Jul 29 03:40:35 2002 From: guido@python.org (Guido van Rossum) Date: Sun, 28 Jul 2002 22:40:35 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects stringobject.c,2.171,2.172 In-Reply-To: Your message of "Mon, 29 Jul 2002 14:13:23 +1200." <200207290213.g6T2DN2U017001@kuku.cosc.canterbury.ac.nz> References: <200207290213.g6T2DN2U017001@kuku.cosc.canterbury.ac.nz> Message-ID: <200207290240.g6T2eZH25272@pcp02138704pcs.reston01.va.comcast.net>

> > The intent was to convert an int/long to a double in the case of
> > '%g' et al and from a double to an int in the case of '%d'.
>
> Are you sure the latter part of that is a good idea? As a general
> principle, I don't think float->int conversions should be done
> automatically. What is the Python philosophy on that?

I fully agree, but unfortunately, in a dark past, I was given a patch that did many good things, but as a side effect, made the PyArg_Parse* family silently truncate floats to ints. Two examples:

>>> "%d" % 3.14
'3'
>>> a = []
>>> a.insert(0.9, 42)
>>> a
[42]
>>>

I find the second example more aggravating than the first. This touches upon a recent discussion, where one of the suggestions was to use __index__ rather than __int__ in this case. I think that's not the right solution; perhaps instead, floats and float-like types should support __truncate__ and __round__ to convert them to ints in certain ways. (Of course then we can argue about whether to round to even, and what to do if the float is so large that its smallest unit of precision is larger than one.)
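For contrast, today's Python makes these conversions explicit (this postdates the thread; a Python 3 illustration of the behaviours being debated, not the 2.2 semantics above):

```python
import math

# Explicit spellings of float -> int, each with different semantics:
assert int(3.9) == 3 and int(-3.9) == -3    # truncate toward zero
assert math.floor(-3.1) == -4               # round toward minus infinity
assert round(2.5) == 2 and round(3.5) == 4  # round half to even

# %d still truncates a float argument; a zero-precision float
# format is the rounding "trick" Tim alludes to below:
assert '%d' % 3.99 == '3'
assert '%.0f' % 3.99 == '4'

# The implicit index truncation Guido finds aggravating became an error:
try:
    [].insert(0.9, 42)
except TypeError:
    pass  # floats are no longer silently truncated to indexes
```

In effect the later resolution went Aahz's way for indexes (via `__index__`) while format codes kept their looser philosophy.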
--Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Mon Jul 29 03:47:28 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 29 Jul 2002 14:47:28 +1200 (NZST) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <20020728235712.41025.qmail@web40112.mail.yahoo.com> Message-ID: <200207290247.g6T2lSHV017233@kuku.cosc.canterbury.ac.nz> Scott Gilbert : > The need for this derives from wanting to do more than one thing at a time > in Python (multiple processors with multiple threas, asynchronous I/O, DMA > transers, ???). In any situation like that, you should be using some form of locking on the object concerned. The Python buffer interface is not the right place to deal with these issues. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Mon Jul 29 03:55:45 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 28 Jul 2002 22:55:45 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects stringobject.c,2.171,2.172 In-Reply-To: <200207290213.g6T2DN2U017001@kuku.cosc.canterbury.ac.nz> Message-ID: [Neal Norwitz] > The intent was to convert an int/long to a double in the case of > '%g' et al and from a double to an int in the case of '%d'. [Greg Ewing] > Are you sure the latter part of that is a good idea? As a general > principle, I don't think float->int conversions should be done > automatically. What is the Python philosophy on that? The philosophy for format codes is looser than elsewhere, else, e.g., "%s" % object would raise TypeError whenever object was a number or list, etc. I've often used %d with floats when I want them rounded to int and don't want to bother remembering how to trick a float format into suppressing the decimal point. 
Unfortunately, that's not quite what %d does (it truncates). Whatever, %s is like invoking str(), %r like invoking repr(), %d like invoking long(), and %g/e/f like invoking float() (although these are variants of long() and float() that refuse string arguments -- that's the exception that makes the rule easy to remember ). From skip@pobox.com Mon Jul 29 04:07:06 2002 From: skip@pobox.com (Skip Montanaro) Date: Sun, 28 Jul 2002 22:07:06 -0500 Subject: [Python-Dev] Remove from mailing list In-Reply-To: <000c01c23698$bf2e32c0$3745fea9@ibm1499> References: <000c01c23698$bf2e32c0$3745fea9@ibm1499> Message-ID: <15684.45402.132334.108285@localhost.localdomain> Rick> For the last time. Please remove me from your mailing list! Try sending a note to python-dev-admin@python.org. Better yet, try using the interface Mailman provides for you: http://mail.python.org/mailman/listinfo/python-dev -- Skip Montanaro skip@pobox.com consulting: http://manatee.mojam.com/~skip/resume.html From aahz@pythoncraft.com Mon Jul 29 04:17:24 2002 From: aahz@pythoncraft.com (Aahz) Date: Sun, 28 Jul 2002 23:17:24 -0400 Subject: [Python-Dev] Floats as indexes In-Reply-To: <200207290240.g6T2eZH25272@pcp02138704pcs.reston01.va.comcast.net> References: <200207290213.g6T2DN2U017001@kuku.cosc.canterbury.ac.nz> <200207290240.g6T2eZH25272@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020729031724.GA20797@panix.com> On Sun, Jul 28, 2002, Guido van Rossum wrote: > > >>> "%d" % 3.14 > '3' > >>> a = [] > >>> a.insert(0.9, 42) > >>> a > [42] > >>> > > I find the second example more aggravating than the first. This > touches upon a recent discussion, where one of the suggestions was > to use __index__ rather than __int__ in this case. I think that's > not the right solution; perhaps instead, floats and float-like types > should support __truncate__ and __round__ to convert them to ints in > certain ways. 
(Of course then we can argue about whether to round to > even, and what to do if the float is so large that its smallest unit > of precision is larger than one.) Blech. I believe that floats and similar objects should never be implicitly converted to indexes. There are too many ways for silent errors to get propagated. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From xscottg@yahoo.com Mon Jul 29 04:23:03 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Sun, 28 Jul 2002 20:23:03 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <200207290247.g6T2lSHV017233@kuku.cosc.canterbury.ac.nz> Message-ID: <20020729032303.28931.qmail@web40108.mail.yahoo.com> --- Greg Ewing wrote: > > > The need for this derives from wanting to do more than one thing at a > > time in Python (multiple processors with multiple threas, asynchronous > > I/O, DMA transers, ???). > > In any situation like that, you should be using some form > of locking on the object concerned. The Python buffer > interface is not the right place to deal with these > issues. > I humbly disagree with you, and I like his proposal. His PEP is simple and the locking business could lead to a mess if everyone involved is not very careful. However, I'll let him champion his PEP. I've got my own stuff to worry about, and this is part of why I didn't want to add new protocol to the PEP I've been working on. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From martin@v.loewis.de Mon Jul 29 07:39:48 2002 From: martin@v.loewis.de (Martin v. 
Loewis) Date: 29 Jul 2002 08:39:48 +0200 Subject: [Python-Dev] Please remove me from the mailing list In-Reply-To: <010701c23695$633acc60$4a01010a@xpprofessional> References: <000a01c23453$fc4b04e0$3745fea9@ibm1499> <010701c23695$633acc60$4a01010a@xpprofessional> Message-ID: zhaoq writes: > Please remove me from the mailing list You have subscribed yourself by deliberate action, so you need to actively unsubscribe yourself as well. What mailing list are you talking about, anyway? Regards, Martin From ville.vainio@swisslog.com Mon Jul 29 09:10:58 2002 From: ville.vainio@swisslog.com (Ville Vainio) Date: Mon, 29 Jul 2002 11:10:58 +0300 Subject: [Python-Dev] Re: Multiline string constants, include in the standard library? References: <20020725194802.22949.82629.Mailman@mail.python.org> <3D40F62D.7000106@swisslog.com> Message-ID: <3D44F892.6090401@swisslog.com> François Pinard wrote:

>>> def stripIndent( s ):
>>> ...     indent = len(s) - len(s.lstrip())
>>> ...     sLines = s.split('\n')
>>> ...     resultLines = [ line[indent:] for line in sLines ]
>>> ...     return ''.join( resultLines )

>> Something like this should really be available somewhere in the standard
>> library (string module [yeah, predeprecation, I know], string

> In fact, I like my doc-strings and other triple-quoted strings flushed left.
> So, I can see them in the code exactly as they will appear on the screen.

Enabling one to strip the indentation wouldn't hurt this practice of yours one bit (nobody forces you to use it). To my eyes left-flushing the blocks disrupts the natural "flow" of the code, and breaks the intuitive block structure of the program.

> If I used artificial margins in Python so my doc-strings appeared to be
> indented more than the surrounding, and wrote my code this way, it would
> appear artificially constricted on the left once printed. It's not worth.

Could you explain what you mean by artificially constricted?
Of course only the amount of space in the left margin would be removed; indentation would work exactly the same. Which one looks better:

++++++++++++++++++++++++
def usage():
    if 1:
        print """\
        You should have done this
        and that
        """.stripindent()
+++++++++++++++++++++++++
def usage():
    if 1:
        print """\
You should have done this
and that
"""
++++++++++++++++++++++++++

When you are scanning code, the non-stripindent version of the 3-quoted string jumps at your face as a "top-level" construct, even if it is only associated with the usage() function.

> My opinion is that it is nice this way. Don't touch the thing! :-)

Again, the change would not influence your code or practices one bit. -- Ville From mal@lemburg.com Mon Jul 29 10:02:43 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jul 2002 11:02:43 +0200 Subject: [Python-Dev] Re: Multiline string constants, include in the standard library? References: <20020725194802.22949.82629.Mailman@mail.python.org> <3D40F62D.7000106@swisslog.com> <3D44F892.6090401@swisslog.com> Message-ID: <3D4504B3.3030608@lemburg.com> Ville Vainio wrote:

> Which one looks better:
> ++++++++++++++++++++++++
> def usage():
>     if 1:
>         print """\
>         You should have done this
>         and that
>         """.stripindent()
> +++++++++++++++++++++++++
> def usage():
>     if 1:
>         print """\
> You should have done this
> and that
> """
> ++++++++++++++++++++++++++
>
> When you are scanning code, the non-stripindent version of the 3-quoted
> string jumps at your face as a "top-level" construct, even if it is only
> associated with the usage() function.

I think everybody has their own way of formatting multi-line strings and/or comments. There's no one-fits-all strategy. So instead of trying to find a compromise, why don't you write up a flexible helper function for the new textwrap module ?
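The textwrap helper MAL suggests is in fact the `dedent()` API that later shipped in the standard library; a sketch of the indented-usage() style using it (modern `print()` syntax, which postdates this thread):

```python
import textwrap

def usage():
    # The triple-quoted string keeps the code's visual block structure;
    # dedent() strips the largest common leading whitespace (here four
    # spaces) from every line before printing.
    message = """\
    You should have done this
    and that
    """
    print(textwrap.dedent(message))
```

Lines consisting only of whitespace are ignored when computing the common margin, so a blank line inside the message doesn't defeat the stripping.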
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From ville.vainio@swisslog.com Mon Jul 29 10:27:37 2002 From: ville.vainio@swisslog.com (Ville Vainio) Date: Mon, 29 Jul 2002 12:27:37 +0300 Subject: [Python-Dev] Re: Multiline string constants, include in the standard library? References: <20020725194802.22949.82629.Mailman@mail.python.org> <3D40F62D.7000106@swisslog.com> <3D44F892.6090401@swisslog.com> <3D4504B3.3030608@lemburg.com> Message-ID: <3D450A89.7050400@swisslog.com> M.-A. Lemburg wrote:

> I think everybody has their own way of formatting multi-line
> strings and/or comments. There's no one-fits-all strategy.

Yep, but having a standard solution available for one very sensible strategy would be nice.

> So instead of trying to find a compromise, why don't you write up
> a flexible helper function for the new textwrap module ?

I don't think there is all that much implementation to do: inspect.getdoc() already has an implementation that seems to do the right thing; it's just that the stripping is embedded into the getdoc function, instead of having it available as a separate function. textwrap might be a good place to put it, considering that the string module is going away - even if no actual wrapping takes place.

--------------------------------------------------
def getdoc(object):
    """Get the documentation string for an object.

    All tabs are expanded to spaces.  To clean up docstrings that are
    indented to line up with blocks of code, any whitespace that can be
    uniformly removed from the second line onwards is removed."""
    try:
        doc = object.__doc__
    except AttributeError:
        return None
    if not isinstance(doc, (str, unicode)):
        return None
    try:
        lines = string.split(string.expandtabs(doc), '\n')
    except UnicodeError:
        return None
    else:
        margin = None
        for line in lines[1:]:
            content = len(string.lstrip(line))
            if not content:
                continue
            indent = len(line) - content
            if margin is None:
                margin = indent
            else:
                margin = min(margin, indent)
        if margin is not None:
            for i in range(1, len(lines)):
                lines[i] = lines[i][margin:]
        return string.join(lines, '\n')
------------------------------------------

From mal@lemburg.com Mon Jul 29 10:44:37 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 29 Jul 2002 11:44:37 +0200 Subject: [Python-Dev] Re: Multiline string constants, include in the standard library? References: <20020725194802.22949.82629.Mailman@mail.python.org> <3D40F62D.7000106@swisslog.com> <3D44F892.6090401@swisslog.com> <3D4504B3.3030608@lemburg.com> <3D450A89.7050400@swisslog.com> Message-ID: <3D450E85.5090806@lemburg.com> Ville Vainio wrote:

> M.-A. Lemburg wrote:
>
>> I think everybody has their own way of formatting multi-line
>> strings and/or comments. There's no one-fits-all strategy.
>
> Yep, but having a standard solution available for one very sensible
> strategy would be nice.
>
>> So instead of trying to find a compromise, why don't you write up
>> a flexible helper function for the new textwrap module ?
>
> I don't think there is all that much implementation to do:
> inspect.getdoc() already has an implementation that seems to do the
> right thing, it's just that the stripping is embedded into the getdoc
> function, instead of having it available as a separate function.
> textwrap might be a good place to put it, considering that the string
> module is going away - even if no actual wrapping takes place.
Oh, I think it is worthwhile applying some optional wrapping for overly long doc-strings as well. But there you go again: people simply don't match up when it comes to text formatting. It's all a matter of taste and style (e.g. in the US it is very common to indent the first line of a paragraph while in most of Europe is not). How about starting with a simple textwrap.dedent() API and then moving on towards the full monty textwrap.reformat() API with tons of options ?! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From ville.vainio@swisslog.com Mon Jul 29 11:22:54 2002 From: ville.vainio@swisslog.com (Ville Vainio) Date: Mon, 29 Jul 2002 13:22:54 +0300 Subject: [Python-Dev] Re: Multiline string constants, include in the standard library? References: <20020725194802.22949.82629.Mailman@mail.python.org> <3D40F62D.7000106@swisslog.com> <3D44F892.6090401@swisslog.com> <3D4504B3.3030608@lemburg.com> <3D450A89.7050400@swisslog.com> <3D450E85.5090806@lemburg.com> Message-ID: <3D45177E.6030503@swisslog.com> M.-A. Lemburg wrote: > How about starting with a simple textwrap.dedent() API and then > moving on towards the full monty textwrap.reformat() API with tons of > options ?! Fine with me - at least w/ dedent() everyone can agree with the right behaviour (except handling of the first line?), and it would be general enough to be useful for everybody (no need for options/customization) - hence the justification for a position in the std lib. I haven't had much use for intricate wrapping/reformatting yet, but I guess I will once it hits the std lib ;-). -- Ville From rwgk@yahoo.com Mon Jul 29 14:02:00 2002 From: rwgk@yahoo.com (Ralf W. 
Grosse-Kunstleve) Date: Mon, 29 Jul 2002 06:02:00 -0700 (PDT) Subject: [Python-Dev] pickling of large arrays Message-ID: <20020729130200.73932.qmail@web20201.mail.yahoo.com> We are using Boost.Python to expose reference-counted C++ container types (similar to std::vector<>) to Python. E.g.: from arraytbx import shared d = shared.double(1000000) # double array with a million elements c = shared.complex_double(100) # std::complex array # and many more types, incl. several custom C++ types We need a way to pickle these arrays. Since they can easily be converted to tuples we could just define functions like: def __getstate__(self): return tuple(self) However, since the arrays are potentially huge this could incur a large overhead (e.g. a tuple of a million Python float). Next idea: def __getstate__(self): return iter(self) Unfortunately (but not unexpectedly) pickle is telling me: 'can't pickle iterator objects' Attached is a short Python script (tested with 2.2.1) with a prototype implementation of a pickle helper ("piece_meal") for large arrays. piece_meal's __getstate__ converts a block of a given size to a Python list and returns a tuple with that list and a new piece_meal instance which knows how to generate the next chunk. I.e. piece_meal instances are created recursively until the input sequence is exhausted. The corresponding __setstate__ puts the pieces back together again (uncomment the print statement to see the pieces). I am wondering if a similar mechanism could be used to enable pickling of iterators, or maybe special "pickle_iterators", which would immediately enable pickling of our large arrays or any other object that can be iterated over (e.g. Numpy arrays which are currently pickled as potentially huge strings). Has this been discussed already? Are there better ideas? 
Ralf

import pickle

class piece_meal:
    block_size = 4
    def __init__(self, sequence, position):
        self.sequence = sequence
        self.position = position
    def __getstate__(self):
        next_position = self.position - piece_meal.block_size
        if (next_position <= 0):
            return (self.sequence[:self.position], 0)
        return (self.sequence[next_position:self.position],
                piece_meal(self.sequence, next_position))
    def __setstate__(self, state):
        #print "piece_meal:", state
        if (state[1] == 0):
            self.sequence = state[0]
        else:
            self.sequence = state[1].sequence + state[0]

class array:
    def __init__(self, n):
        self.elems = [i for i in xrange(n)]
    def __getstate__(self):
        return piece_meal(self.elems, len(self.elems))
    def __setstate__(self, state):
        self.elems = state.sequence

def exercise():
    for i in xrange(11):
        a = array(i)
        print a.elems
        s = pickle.dumps(a)
        b = pickle.loads(s)
        print b.elems

if (__name__ == "__main__"):
    exercise()

__________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From nhodgson@bigpond.net.au Mon Jul 29 14:52:40 2002 From: nhodgson@bigpond.net.au (Neil Hodgson) Date: Mon, 29 Jul 2002 23:52:40 +1000 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729002957.74716.qmail@web40101.mail.yahoo.com> Message-ID: <00c601c23707$35819a20$3da48490@neil> Scott Gilbert: > You could easily implement a counting (recursive) mutex as described > above, and it might be the case that throwing an exception on the length > changing operations keeps the deadlock from occurring. I'm still a bit > confused though. Not as confused as I am. I don't think deadlocks or threads are that relevant to me. The most likely situations in which I would use the buffer interface are to perform large I/O operations without copying, or to perform asynchronous I/O to load or save documents while continuing to run styling or linting tasks.
I think it's likely that the pieces of code accessing the buffer will not be real threads, but instead be cooperating contexts within a single-threaded UI framework, so using semaphores will not be possible. > 1) With his PEP, there is a way to get the behavior you desire without > adding the complexity to the core of Python. And with recursive/counting > mutexes, the behavior you want is getting more complicated. I don't want counting mutexes. I'm not defining behaviour that needs them. > The "safe > buffer protocol" is likely to cater to a wide class of users. I could be > wrong, but the "lockable gapped buffer protocol" probably appeals to a much > smaller set. It's not that a "lockable gapped buffer protocol" is needed. It is that the problem with the old buffer was that the lifetime of the pointer is not well defined. The proposal changes that by making the lifetime of the pointer the same as that of the underlying object. This restricts the set of objects that can be buffers to statically sized objects. I'd prefer that dynamically resizable objects be able to be buffers. > 2) Any time you go from one lock (mutex, GIL, semaphore) to multiple > locks, you can introduce deadlock states. My defined behaviour was "Upon receiving a lock call, it could collapse the gap and return a stable pointer to its contents and then revert to its normal behaviour on receiving an unlock". Where is a semaphore involved? Without a semaphore (or equivalent) there can be no deadlock. Neil From guido@python.org Mon Jul 29 15:19:00 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 10:19:00 -0400 Subject: [Python-Dev] Re: Multiline string constants, include in the standard library? In-Reply-To: Your message of "Mon, 29 Jul 2002 12:27:37 +0300."
<3D450A89.7050400@swisslog.com> References: <20020725194802.22949.82629.Mailman@mail.python.org> <3D40F62D.7000106@swisslog.com> <3D44F892.6090401@swisslog.com> <3D4504B3.3030608@lemburg.com> <3D450A89.7050400@swisslog.com> Message-ID: <200207291419.g6TEJ0m26497@pcp02138704pcs.reston01.va.comcast.net> > > I think everybody has their own way of formatting multi-line > > strings and/or comments. There's no one-fits-all strategy. > > Yep, but having a standard solution available to a one, very sensible > strategy would be nice. Can you move this discussion to c.l.py please? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas.heller@ion-tof.com Mon Jul 29 15:34:46 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 29 Jul 2002 16:34:46 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729002957.74716.qmail@web40101.mail.yahoo.com> <00c601c23707$35819a20$3da48490@neil> Message-ID: <06f301c2370d$16941060$e000a8c0@thomasnotebook> [Scott] > > The "safe > > buffer protocol" is likely to cater to a wide class of users. I could be > > wrong, but the "lockable gapped buffer protocol" probably appeals to a > much > > smaller set. > [Neil] > Its not that a "lockable gapped buffer protocol" is needed. It is that > the problem with the old buffer was that the lifetime of the pointer is not > well defined. The proposal changes that by making the lifetime of the > pointer be the same as the underlying object. That's exactly what *I* need, ... > This restricts the set of > objects that can be buffers to statically sized objects. I'd prefer that > dynamically resizable objects be able to be buffers. > ..., but I understand Neil's requirements. Can they be fulfilled by adding some kind of UnlockObject() call to the 'safe buffer interface', which should mean 'I won't use the pointer received by getsaferead/writebufferproc any more'? 
Thomas From guido@python.org Mon Jul 29 16:00:51 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 11:00:51 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Fri, 26 Jul 2002 16:28:50 +0200." <082b01c234b0$c33564e0$e000a8c0@thomasnotebook> References: <082b01c234b0$c33564e0$e000a8c0@thomasnotebook> Message-ID: <200207291500.g6TF0pM26852@pcp02138704pcs.reston01.va.comcast.net> Thomas, I like your PEP. Could you clean it up (changing 'large' into 'safe' etc.) and send it to Barry? Some comments: > Backward Compatibility > > There are no backward compatibility problems. That's a simplification of the truth -- you're adding two new fields to an existing struct. But the flag bit you add means that old and new versions of the struct can be distinguished. > It may be a good idea to expose the following convenience functions: > > int PyObject_AsSafeReadBuffer(PyObject *obj, > void **buffer, > size_t *buffer_len); > > int PyObject_AsSafeWriteBuffer(PyObject *obj, > void **buffer, > size_t *buffer_len); > > These functions return 0 on success, set buffer to the memory > location and buffer_len to the length of the memory block in > bytes. On failure, they return -1 and set an exception. Please make these a mandatory part of the proposal. Please also try to summarize the discussion so far here. My personal opinion: locking seems the wrong approach, given the danger of deadlock; Scintilla can use the existing buffer protocol, assuming its buffer doesn't move as long as you don't release the GIL and don't make calls into Scintilla. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Jul 29 16:09:22 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 11:09:22 -0400 Subject: [Python-Dev] pickling of large arrays In-Reply-To: Your message of "Mon, 29 Jul 2002 06:02:00 PDT."
<20020729130200.73932.qmail@web20201.mail.yahoo.com> References: <20020729130200.73932.qmail@web20201.mail.yahoo.com> Message-ID: <200207291509.g6TF9MX26908@pcp02138704pcs.reston01.va.comcast.net> > We are using Boost.Python to expose reference-counted C++ container > types (similar to std::vector<>) to Python. E.g.: > > from arraytbx import shared > d = shared.double(1000000) # double array with a million elements > c = shared.complex_double(100) # std::complex array > # and many more types, incl. several custom C++ types > > We need a way to pickle these arrays. Since they can easily be > converted to tuples we could just define functions like: > > def __getstate__(self): > return tuple(self) > > However, since the arrays are potentially huge this could incur > a large overhead (e.g. a tuple of a million Python float). > Next idea: > > def __getstate__(self): > return iter(self) > > Unfortunately (but not unexpectedly) pickle is telling me: > 'can't pickle iterator objects' > > Attached is a short Python script (tested with 2.2.1) with a prototype > implementation of a pickle helper ("piece_meal") for large arrays. That's a neat trick, unfortunately it only helps when the pickle is being written directly to disk; when it is returned as a string, you still get the entire array in memory. > piece_meal's __getstate__ converts a block of a given size to a Python > list and returns a tuple with that list and a new piece_meal instance > which knows how to generate the next chunk. I.e. piece_meal instances > are created recursively until the input sequence is exhausted. The > corresponding __setstate__ puts the pieces back together again > (uncomment the print statement to see the pieces). > > I am wondering if a similar mechanism could be used to enable pickling > of iterators, or maybe special "pickle_iterators", which would > immediately enable pickling of our large arrays or any other object > that can be iterated over (e.g. 
Numpy arrays which are currently > pickled as potentially huge strings). Has this been discussed already? I think pickling iterators is the wrong idea. An iterator doesn't represent data, it represents a single pass over data. Iterators may represent infinite series. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Mon Jul 29 16:51:25 2002 From: aahz@pythoncraft.com (Aahz) Date: Mon, 29 Jul 2002 11:51:25 -0400 Subject: [Python-Dev] pickling of large arrays In-Reply-To: <20020729130200.73932.qmail@web20201.mail.yahoo.com> References: <20020729130200.73932.qmail@web20201.mail.yahoo.com> Message-ID: <20020729155125.GA5765@panix.com> On Mon, Jul 29, 2002, Ralf W. Grosse-Kunstleve wrote: > > We need a way to pickle these arrays. See PEP 296 and read the back discussion on python-dev in the archives. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From guido@python.org Mon Jul 29 17:05:49 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 12:05:49 -0400 Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: Your message of "Fri, 26 Jul 2002 19:26:38 PDT." <20020727022638.86727.qmail@web40101.mail.yahoo.com> References: <20020727022638.86727.qmail@web40101.mail.yahoo.com> Message-ID: <200207291605.g6TG5o428945@pcp02138704pcs.reston01.va.comcast.net> > Even if I'm wrong about the need for this, at the very least, the > additional functionality can be added later. I really just want to push > through a simple, usable, bytes object for the time being. We can easily > add, we can't easily take away. Hi Scott, I've followed this discussion and it looks like the PEP is ready for another round of refinements based upon the discussion (e.g. to use size_t). Do you have time to do that? And then the next thing would be a prototype implementation. I like where this is going! 
--Guido van Rossum (home page: http://www.python.org/~guido/) From xscottg@yahoo.com Mon Jul 29 17:39:13 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Mon, 29 Jul 2002 09:39:13 -0700 (PDT) Subject: [Python-Dev] PEP 296 - The Buffer Problem In-Reply-To: <200207291605.g6TG5o428945@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020729163913.46117.qmail@web40102.mail.yahoo.com> --- Guido van Rossum wrote: > > I've followed this discussion and it looks like the PEP is ready for > another round of refinements based upon the discussion (e.g. to use > size_t). Do you have time to do that? > > And then the next thing would be a prototype implementation. > > I like where this is going! > Very cool. I'm glad to hear it. I'll integrate the new changes to the text tonight and post the next version to python-dev and comp.lang.python tomorrow. Implementation is in progress, but not far enough along that I can swag a done date yet. It shouldn't take too long, but like you indicated before, I may need some help on doing the pickling correctly. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From mcherm@destiny.com Mon Jul 29 17:42:08 2002 From: mcherm@destiny.com (Michael Chermside) Date: Mon, 29 Jul 2002 12:42:08 -0400 Subject: [Python-Dev] Re: PEP 295 - Interpretation of multiline string constants Message-ID: <3D457060.4060505@destiny.com> > So... What you (and others) think about just adding flag 'i' to string > constants (that will strip indentation etc.)? This doesn't affect > existing code, but it will be useful (at least for me ;-) Motivation > was posted here by Michael Chermside, but I don't like his solutions. Please understand that the motivation I posted was an attempt to describe YOUR possible motivation for desiring the change. I wouldn't like this feature, myself. 
I was just trying to point out that it could all be achieved with somewhere between 1 character and 5 lines worth of code. The solution to this (so-called) "problem" simply does not belong in the language itself, despite the fact that you don't like my solutions. However, if you have a particular reason why you don't like these solutions, send me an email (don't CC the list), and I'll see if I can come up with a different solution you DO like. -- Michael Chermside From xscottg@yahoo.com Mon Jul 29 17:45:32 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Mon, 29 Jul 2002 09:45:32 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <00c601c23707$35819a20$3da48490@neil> Message-ID: <20020729164532.48588.qmail@web40110.mail.yahoo.com> --- Neil Hodgson wrote: > Scott Gilbert: > > > You could easily implement a counting (recursive) mutex as > > described above, and it might be the case that throwing an exception > > on the length changing operations keeps the deadlock from occurring. > > I'm still a bit confused though. > > Not as confused as I am. I don't think deadlocks or threads are that > relevant to me. The most likely situations in which I would use the > buffer interface are to perform large I/O operations without copying or > when performing asynchronous I/O to load or save documents while > continuing to run styling or linting tasks. I think it's likely that the > pieces of code accessing the buffer will not be real threads, but instead > be cooperating contexts within a single-threaded UI framework so using > semaphores will not be possible. > What happens when you've locked the buffer and passed a pointer to the I/O system for an asynchronous operation, but before that operation has completed, your main program wants to resize the buffer due to a user-generated event?
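The conflict in that question can be sketched in Python. This is a hypothetical model of the lock-counting semantics being debated, not code from the pre-PEP; all the names (GappedBuffer, LockedBufferError, acquire/release) are made up for illustration:

```python
class LockedBufferError(Exception):
    """Raised when a resize would invalidate outstanding pointers."""

class GappedBuffer:
    # Toy model of a resizable (gapped) buffer that hands out locked views.
    def __init__(self, size):
        self._data = bytearray(size)
        self._locks = 0          # outstanding acquire() calls (the lock count)

    def acquire(self):
        # e.g. a pointer handed to the asynchronous I/O system
        self._locks += 1
        return self._data

    def release(self):
        self._locks -= 1

    def resize(self, new_size):
        # the user-generated event arriving while I/O is still in flight
        if self._locks > 0:
            raise LockedBufferError("resize attempted while locked")
        old = self._data
        self._data = bytearray(new_size)
        n = min(len(old), new_size)
        self._data[:n] = old[:n]

buf = GappedBuffer(16)
view = buf.acquire()             # async write begins
try:
    buf.resize(32)               # user event: must fail (or block)
    resized_while_locked = True
except LockedBufferError:
    resized_while_locked = False
buf.release()                    # I/O completes
buf.resize(32)                   # now the resize succeeds
```

Under these semantics a single-threaded UI framework that cannot block would have to retry the failed resize until the I/O completes, which is exactly where the polling concern comes from.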
I had written responses/questions to other parts of your message, but I found that I was just asking the same question above over and over, so I've chopped them out. If you can explain this to me, and there aren't any problems with deadlock or polling, then I'll quit interfering and let you and Thomas decide if you really think the locking semantics are useful to a wide enough audience that it should be included in the core. > > I don't want counting mutexes. I'm not defining behavior that needs > them. > You said you wanted the locks to keep a count, so that you could call acquire() multiple times and have the buffer not truly become unlocked until release() was called the same number of times. I'm willing to adopt any terminology you want for the purpose of this discussion. I think I understand the semantics of the counting operation, but I want to understand more what actually happens when the buffer is locked. __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From thomas.heller@ion-tof.com Mon Jul 29 17:52:17 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 29 Jul 2002 18:52:17 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <082b01c234b0$c33564e0$e000a8c0@thomasnotebook> <200207291500.g6TF0pM26852@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <091701c23720$4c3c3310$e000a8c0@thomasnotebook> From: "Guido van Rossum" > Thomas, > > I like your PEP. Could you clean it up (changing 'large' into 'safe' > etc.) and send it to Barry? Some comments: Great. I have changed it per your requests, and also included Greg's and Neil's points.
Thomas From xscottg@yahoo.com Mon Jul 29 17:54:19 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Mon, 29 Jul 2002 09:54:19 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <06f301c2370d$16941060$e000a8c0@thomasnotebook> Message-ID: <20020729165419.31643.qmail@web40111.mail.yahoo.com> --- Thomas Heller wrote: > > > This restricts the set of objects that can be buffers to statically > > sized objects. I'd prefer that dynamically resizable objects be able to > > be buffers. > > > > ..., but I understand Neil's requirements. > > Can they be fulfilled by adding some kind of UnlockObject() > call to the 'safe buffer interface', which should mean 'I won't > use the pointer received by getsaferead/writebufferproc any more'? > I assume this means any call to getsafereadpointer()/getsafewritepointer() will increment the lock count. So the UnlockObject() calls will be mandatory. Either that, or you'll have an explicit LockObject() call as well. What behavior should happen when a resize is attempted while the lock count is positive? __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From thomas.heller@ion-tof.com Mon Jul 29 18:03:30 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 29 Jul 2002 19:03:30 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729165419.31643.qmail@web40111.mail.yahoo.com> Message-ID: <093b01c23721$dd908680$e000a8c0@thomasnotebook> From: "Scott Gilbert" > > --- Thomas Heller wrote: > > > > > This restricts the set of objects that can be buffers to statically > > > sized objects. I'd prefer that dynamically resizable objects be able to > > > be buffers. > > > > > > > ..., but I understand Neil's requirements.
> > > > Can they be fulfilled by adding some kind of UnlockObject() > > call to the 'safe buffer interface', which should mean 'I won't > > use the pointer received by getsaferead/writebufferproc any more'? > > > > I assume this means any call to getsafereadpointer()/getsafewritepointer() > will increment the lock count. So the UnlockObject() calls will be > mandatory. Either that, or you'll have an explicit LockObject() call as > well. What behavior should happen when a resize is attempted while the > lock count is positive? This question is not difficult to answer;-) The resize should fail. That's the only possibility. Whether this can be handled robustly enough by the object is another question. Probably this is all too complicated to be solved by the safe buffer interface, and it should be left out? Thomas From guido@python.org Mon Jul 29 18:03:55 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 13:03:55 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Mon, 29 Jul 2002 09:54:19 PDT." <20020729165419.31643.qmail@web40111.mail.yahoo.com> References: <20020729165419.31643.qmail@web40111.mail.yahoo.com> Message-ID: <200207291703.g6TH3tk29997@pcp02138704pcs.reston01.va.comcast.net> > --- Thomas Heller wrote: > > > > > This restricts the set of objects that can be buffers to statically > > > sized objects. I'd prefer that dynamically resizable objects be able to > > > be buffers. > > > > > > > ..., but I understand Neil's requirements.
What behavior should happen when a resize is attempted while the > lock count is positive? I don't like where this is going. Let's not add locking to the buffer protocol. If an object's buffer isn't allocated for the object's life when the object is created, it should not support the "safe" version of the protocol (maybe a different name would be better), and users should not release the GIL while holding on to the pointer. (Exactly which other API calls are safe while using the pointer is not clear; probably nothing that could possibly invoke the Python interpreter recursively, since that might release the GIL. This would generally mean that calls to Py_DECREF() are unsafe while holding on to a buffer pointer!) --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas.heller@ion-tof.com Mon Jul 29 18:08:11 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 29 Jul 2002 19:08:11 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729165419.31643.qmail@web40111.mail.yahoo.com> <200207291703.g6TH3tk29997@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <095701c23722$84e06770$e000a8c0@thomasnotebook> From: "Guido van Rossum" > If an object's buffer isn't allocated for the object's life > when the object is created, it should not support the "safe" version > of the protocol (maybe a different name would be better), and users > should not release the GIL while holding on to the pointer. 'Persistent' buffer interface? Too long? Thomas From oren-py-d@hishome.net Mon Jul 29 18:08:24 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Mon, 29 Jul 2002 20:08:24 +0300 Subject: [Python-Dev] patch: try/finally in generators Message-ID: <20020729200824.A5391@hishome.net> http://www.python.org/sf/584626 This patch removes the limitation of not allowing yield in the try part of a try/finally.
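For illustration, this is the kind of generator the patch makes legal (yield inside the try part of a try/finally was a SyntaxError at the time). The snippet is mine, not taken from the patch, and uses the explicit generator close() call from later Python versions as a stand-in for the dealloc-time resume the patch describes:

```python
events = []

def gen():
    try:
        yield 1
        yield 2
    finally:
        # the patch guarantees this runs even when the generator is
        # thrown away while still suspended inside the try block
        events.append("cleanup")

g = gen()
first = next(g)   # suspends at 'yield 1'; first == 1
g.close()         # resume one last time so the finally clause fires,
                  # analogous to the dealloc behaviour in the patch
```

Any exception raised by the finally clause at that point would be reported and ignored, just like an exception in a __del__ finalizer.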
The dealloc function of a generator checks if the generator is still alive and resumes it one last time from the return instruction at the end of the code, causing any try/finally blocks to be triggered. Any exceptions raised are treated just like exceptions in a __del__ finalizer (printed and ignored). Oren From guido@python.org Mon Jul 29 18:10:44 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 13:10:44 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Mon, 29 Jul 2002 19:08:11 +0200." <095701c23722$84e06770$e000a8c0@thomasnotebook> References: <20020729165419.31643.qmail@web40111.mail.yahoo.com> <200207291703.g6TH3tk29997@pcp02138704pcs.reston01.va.comcast.net> <095701c23722$84e06770$e000a8c0@thomasnotebook> Message-ID: <200207291710.g6THAin30057@pcp02138704pcs.reston01.va.comcast.net> > > If an object's buffer isn't allocated for the object's life > > when the object is created, it should not support the "safe" version > > of the protocol (maybe a different name would be better), and users > > should not release the GIL while using on to the pointer. > > 'Persistent' buffer interface? Too long? No, persistent typically refers to things that survive longer than a process. Maybe 'static' buffer interface would work. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas.heller@ion-tof.com Mon Jul 29 18:14:51 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 29 Jul 2002 19:14:51 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729165419.31643.qmail@web40111.mail.yahoo.com> <200207291703.g6TH3tk29997@pcp02138704pcs.reston01.va.comcast.net> <095701c23722$84e06770$e000a8c0@thomasnotebook> <200207291710.g6THAin30057@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <098e01c23723$73867590$e000a8c0@thomasnotebook> > > > If an object's buffer isn't allocated for the object's life > > > when the object is created, it should not support the "safe" version > > > of the protocol (maybe a different name would be better), and users > > > should not release the GIL while using on to the pointer. > > > > 'Persistent' buffer interface? Too long? > > No, persistent typically refers to things that survive longer than a > process. Maybe 'static' buffer interface would work. > Ahem, right. Maybe Barry can change it before committing this? Thomas From guido@python.org Mon Jul 29 18:34:01 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 13:34:01 -0400 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: Your message of "Mon, 29 Jul 2002 20:08:24 +0300." <20020729200824.A5391@hishome.net> References: <20020729200824.A5391@hishome.net> Message-ID: <200207291734.g6THY1k30119@pcp02138704pcs.reston01.va.comcast.net> > http://www.python.org/sf/584626 > > This patch removes the limitation of not allowing yield in the try part > of a try/finally. The dealloc function of a generator checks if the > generator is still alive and resumes it one last time from the return > instruction at the end of the code, causing any try/finally blocks to be > triggered. Any exceptions raised are treated just like exceptions in a > __del__ finalizer (printed and ignored). I'm not sure I understand what it does. 
The return instruction at the end of the code, if I take this literally, isn't enclosed in any try/finally blocks. So how can this have the desired effect? Have you verified that Jython can implement these semantics too? Do you *really* need this? --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA@ActiveState.com Mon Jul 29 19:01:46 2002 From: DavidA@ActiveState.com (David Ascher) Date: Mon, 29 Jul 2002 11:01:46 -0700 Subject: [Python-Dev] python.org/switch/ References: <782E9B13-A26D-11D6-83B1-003065517236@oratrix.com> Message-ID: <3D45830A.6090207@ActiveState.com> Jack Jansen wrote: > They're all pretty good, but I think I liked David best, he actually > seemed to mean what he said:-) I_do_,_it's_why_you_haven't_seen_me_much_around_these_parts_recently... --david (those_who_saw_the_ad_may_understand_my_typing_oddities). From xscottg@yahoo.com Mon Jul 29 19:13:38 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Mon, 29 Jul 2002 11:13:38 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <098e01c23723$73867590$e000a8c0@thomasnotebook> Message-ID: <20020729181338.59568.qmail@web40107.mail.yahoo.com> --- Thomas Heller and Guido wrote: > > > > If an object's buffer isn't allocated for the object's life > > > > when the object is created, it should not support the "safe" > > > > version of the protocol (maybe a different name would be better), > > > > and users should not release the GIL while using on to the pointer. > > > > > > 'Persistent' buffer interface? Too long? > > > > No, persistent typically refers to things that survive longer than a > > process. Maybe 'static' buffer interface would work. > > I'll just chime in with the name "Fixed" Buffer Interface. They aren't really static either, and fixed applies in at least two senses. :-) __________________________________________________ Do You Yahoo!? Yahoo! 
Health - Feel better, live better http://health.yahoo.com From guido@python.org Mon Jul 29 19:24:41 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 14:24:41 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Mon, 29 Jul 2002 11:13:38 PDT." <20020729181338.59568.qmail@web40107.mail.yahoo.com> References: <20020729181338.59568.qmail@web40107.mail.yahoo.com> Message-ID: <200207291824.g6TIOfq30468@pcp02138704pcs.reston01.va.comcast.net> > > > > 'Persistent' buffer interface? Too long? > > > > > > No, persistent typically refers to things that survive longer than a > > > process. Maybe 'static' buffer interface would work. > > I'll just chime in with the name "Fixed" Buffer Interface. They aren't > really static either, and fixed applies in at least two senses. :-) Nice! --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas.heller@ion-tof.com Mon Jul 29 19:36:56 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Mon, 29 Jul 2002 20:36:56 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729181338.59568.qmail@web40107.mail.yahoo.com> Message-ID: <0a9e01c2372e$ea80fb60$e000a8c0@thomasnotebook> From: "Scott Gilbert" > --- Thomas Heller and Guido wrote: > > > > > > If an object's buffer isn't allocated for the object's life > > > > > when the object is created, it should not support the "safe" > > > > > version of the protocol (maybe a different name would be better), > > > > > and users should not release the GIL while using on to the pointer. > > > > > > > > 'Persistent' buffer interface? Too long? > > > > > > No, persistent typically refers to things that survive longer than a > > > process. Maybe 'static' buffer interface would work. > > > > > I'll just chime in with the name "Fixed" Buffer Interface. They aren't > really static either, and fixed applies in at least two senses. :-) > Yup. I'll change it. 
Thanks, Thomas

From barry@zope.com Mon Jul 29 19:38:06 2002 From: barry@zope.com (Barry A. Warsaw) Date: Mon, 29 Jul 2002 14:38:06 -0400 Subject: [Python-Dev] PEP 1, PEP Purpose and Guidelines Message-ID: <15685.35726.678832.241665@anthem.wooz.org>

It has been a while since I posted a copy of PEP 1 to the mailing lists and newsgroups. I've recently done some updating of a few sections, so in the interest of gaining wider community participation in the Python development process, I'm posting the latest revision of PEP 1 here. A version of the PEP is always available on-line at http://www.python.org/peps/pep-0001.html

Enjoy, -Barry

-------------------- snip snip --------------------

PEP: 1
Title: PEP Purpose and Guidelines
Version: $Revision: 1.36 $
Last-Modified: $Date: 2002/07/29 18:34:59 $
Author: Barry A. Warsaw, Jeremy Hylton
Status: Active
Type: Informational
Created: 13-Jun-2000
Post-History: 21-Mar-2001, 29-Jul-2002

What is a PEP?

    PEP stands for Python Enhancement Proposal. A PEP is a design document providing information to the Python community, or describing a new feature for Python. The PEP should provide a concise technical specification of the feature and a rationale for the feature.

    We intend PEPs to be the primary mechanisms for proposing new features, for collecting community input on an issue, and for documenting the design decisions that have gone into Python. The PEP author is responsible for building consensus within the community and documenting dissenting opinions.

    Because the PEPs are maintained as plain text files under CVS control, their revision history is the historical record of the feature proposal[1].

Kinds of PEPs

    There are two kinds of PEPs. A standards track PEP describes a new feature or implementation for Python. An informational PEP describes a Python design issue, or provides general guidelines or information to the Python community, but does not propose a new feature.
    Informational PEPs do not necessarily represent a Python community consensus or recommendation, so users and implementors are free to ignore informational PEPs or follow their advice.

PEP Work Flow

    The PEP editor, Barry Warsaw, assigns numbers for each PEP and changes its status.

    The PEP process begins with a new idea for Python. It is highly recommended that a single PEP contain a single key proposal or new idea. The more focussed the PEP, the more successful it tends to be. The PEP editor reserves the right to reject PEP proposals if they appear too unfocussed or too broad. If in doubt, split your PEP into several well-focussed ones.

    Each PEP must have a champion -- someone who writes the PEP using the style and format described below, shepherds the discussions in the appropriate forums, and attempts to build community consensus around the idea. The PEP champion (a.k.a. Author) should first attempt to ascertain whether the idea is PEP-able. Small enhancements or patches often don't need a PEP and can be injected into the Python development work flow with a patch submission to the SourceForge patch manager[2] or feature request tracker[3].

    The PEP champion then emails the PEP editor with a proposed title and a rough, but fleshed out, draft of the PEP. This draft must be written in PEP style as described below. If the PEP editor approves, he will assign the PEP a number, label it as standards track or informational, give it status 'draft', and create and check-in the initial draft of the PEP. The PEP editor will not unreasonably deny a PEP. Reasons for denying PEP status include duplication of effort, being technically unsound, not providing proper motivation or addressing backwards compatibility, or not in keeping with the Python philosophy. The BDFL (Benevolent Dictator for Life, Guido van Rossum) can be consulted during the approval phase, and is the final arbitrator of the draft's PEP-ability.
If a pre-PEP is rejected, the author may elect to take the pre-PEP to the comp.lang.python newsgroup (a.k.a. python-list@python.org mailing list) to help flesh it out, gain feedback and consensus from the community at large, and improve the PEP for re-submission. The author of the PEP is then responsible for posting the PEP to the community forums, and marshaling community support for it. As updates are necessary, the PEP author can check in new versions if they have CVS commit permissions, or can email new PEP versions to the PEP editor for committing. Standards track PEPs consist of two parts, a design document and a reference implementation. The PEP should be reviewed and accepted before a reference implementation is begun, unless a reference implementation will aid people in studying the PEP. Standards Track PEPs must include an implementation - in the form of code, patch, or URL to same - before they can be considered Final. PEP authors are responsible for collecting community feedback on a PEP before submitting it for review. A PEP that has not been discussed on python-list@python.org and/or python-dev@python.org will not be accepted. However, wherever possible, long open-ended discussions on public mailing lists should be avoided. Strategies to keep the discussions efficient include setting up a separate SIG mailing list for the topic, having the PEP author accept private comments in the early design phases, etc. PEP authors should use their discretion here. Once the authors have completed a PEP, they must inform the PEP editor that it is ready for review. PEPs are reviewed by the BDFL and his chosen consultants, who may accept or reject a PEP or send it back to the author(s) for revision. Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and accepted by the BDFL, the status will be changed to `Final.' A PEP can also be assigned status `Deferred.' 
The PEP author or editor can assign the PEP this status when no progress is being made on the PEP. Once a PEP is deferred, the PEP editor can re-assign it to draft status. A PEP can also be `Rejected'. Perhaps after all is said and done it was not a good idea. It is still important to have a record of this fact. PEPs can also be replaced by a different PEP, rendering the original obsolete. This is intended for Informational PEPs, where version 2 of an API can replace version 1. PEP work flow is as follows:

    Draft -> Accepted -> Final -> Replaced
      ^
      +----> Rejected
      v
    Deferred

Some informational PEPs may also have a status of `Active' if they are never meant to be completed. E.g. PEP 1.

What belongs in a successful PEP?

Each PEP should have the following parts:

1. Preamble -- RFC822 style headers containing meta-data about the PEP, including the PEP number, a short descriptive title (limited to a maximum of 44 characters), the names, and optionally the contact info for each author, etc.

2. Abstract -- a short (~200 word) description of the technical issue being addressed.

3. Copyright/public domain -- Each PEP must either be explicitly labelled as placed in the public domain (see this PEP as an example) or licensed under the Open Publication License[4].

4. Specification -- The technical specification should describe the syntax and semantics of any new language feature. The specification should be detailed enough to allow competing, interoperable implementations for any of the current Python platforms (CPython, JPython, Python .NET).

5. Motivation -- The motivation is critical for PEPs that want to change the Python language. It should clearly explain why the existing language specification is inadequate to address the problem that the PEP solves. PEP submissions without sufficient motivation may be rejected outright.

6. Rationale -- The rationale fleshes out the specification by describing what motivated the design and why particular design decisions were made. 
It should describe alternate designs that were considered and related work, e.g. how the feature is supported in other languages. The rationale should provide evidence of consensus within the community and discuss important objections or concerns raised during discussion.

7. Backwards Compatibility -- All PEPs that introduce backwards incompatibilities must include a section describing these incompatibilities and their severity. The PEP must explain how the author proposes to deal with these incompatibilities. PEP submissions without a sufficient backwards compatibility treatise may be rejected outright.

8. Reference Implementation -- The reference implementation must be completed before any PEP is given status 'Final,' but it need not be completed before the PEP is accepted. It is better to finish the specification and rationale first and reach consensus on it before writing code. The final implementation must include test code and documentation appropriate for either the Python language reference or the standard library reference.

PEP Template

PEPs are written in plain ASCII text, and should adhere to a rigid style. There is a Python script that parses this style and converts the plain text PEP to HTML for viewing on the web[5]. PEP 9 contains a boilerplate[7] template you can use to get started writing your PEP. Each PEP must begin with an RFC822 style header preamble. The headers must appear in the following order. Headers marked with `*' are optional and are described below. All other headers are required.

    PEP:
    Title:
    Version:
    Last-Modified:
    Author:
  * Discussions-To:
    Status:
    Type:
  * Requires:
    Created:
  * Python-Version:
    Post-History:
  * Replaces:
  * Replaced-By:

The Author: header lists the names and optionally, the email addresses of all the authors/owners of the PEP. The format of the author entry should be address@dom.ain (Random J. User) if the email address is included, and just Random J. User if the address is not given. 
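The RFC822-style preamble described above can be read with the standard library's email tools. A minimal sketch, shown in modern Python; the sample header values here are invented for illustration and are not a real PEP:

```python
# Sketch: parsing a PEP's RFC822-style header preamble with the stdlib
# email parser.  The preamble text below is made up for illustration.
from email.parser import HeaderParser

preamble = """\
PEP: 9999
Title: An Example PEP
Author: Random J. User
Status: Draft
Type: Informational
Created: 29-Jul-2002

"""

headers = HeaderParser().parsestr(preamble)
print(headers["PEP"])     # 9999
print(headers["Status"])  # Draft
```

Since the preamble is plain RFC822, continuation lines for multiple authors (as described above) are handled by the parser for free.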
If there are multiple authors, each should be on a separate line following RFC 822 continuation line conventions. Note that personal email addresses in PEPs will be obscured as a defense against spam harvesters. Standards track PEPs must have a Python-Version: header which indicates the version of Python that the feature will be released with. Informational PEPs do not need a Python-Version: header. While a PEP is in private discussions (usually during the initial Draft phase), a Discussions-To: header will indicate the mailing list or URL where the PEP is being discussed. No Discussions-To: header is necessary if the PEP is being discussed privately with the author, or on the python-list or python-dev email mailing lists. Note that email addresses in the Discussions-To: header will not be obscured. Created: records the date that the PEP was assigned a number, while Post-History: is used to record the dates of when new versions of the PEP are posted to python-list and/or python-dev. Both headers should be in dd-mmm-yyyy format, e.g. 14-Aug-2001. PEPs may have a Requires: header, indicating the PEP numbers that this PEP depends on. PEPs may also have a Replaced-By: header indicating that a PEP has been rendered obsolete by a later document; the value is the number of the PEP that replaces the current document. The newer PEP must have a Replaces: header containing the number of the PEP that it rendered obsolete. PEP Formatting Requirements PEP headings must begin in column zero and the initial letter of each word must be capitalized as in book titles. Acronyms should be in all capitals. The body of each section must be indented 4 spaces. Code samples inside body sections should be indented a further 4 spaces, and other indentation can be used as required to make the text readable. You must use two blank lines between the last line of a section's body and the next section heading. 
You must adhere to the Emacs convention of adding two spaces at the end of every sentence. You should fill your paragraphs to column 70, but under no circumstances should your lines extend past column 79. If your code samples spill over column 79, you should rewrite them. Tab characters must never appear in the document at all. A PEP should include the standard Emacs stanza included by example at the bottom of this PEP. A PEP must contain a Copyright section, and it is strongly recommended to put the PEP in the public domain. When referencing an external web page in the body of a PEP, you should include the title of the page in the text, with a footnote reference to the URL. Do not include the URL in the body text of the PEP. E.g. Refer to the Python Language web site [1] for more details. ... [1] http://www.python.org When referring to another PEP, include the PEP number in the body text, such as "PEP 1". The title may optionally appear. Add a footnote reference that includes the PEP's title and author. It may optionally include the explicit URL on a separate line, but only in the References section. Note that the pep2html.py script will calculate URLs automatically, e.g.: ... Refer to PEP 1 [7] for more information about PEP style ... References [7] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton http://www.python.org/peps/pep-0001.html If you decide to provide an explicit URL for a PEP, please use this as the URL template: http://www.python.org/peps/pep-xxxx.html PEP numbers in URLs must be padded with zeros from the left, so as to be exactly 4 characters wide, however PEP numbers in text are never padded. Reporting PEP Bugs, or Submitting PEP Updates How you report a bug, or submit a PEP update depends on several factors, such as the maturity of the PEP, the preferences of the PEP author, and the nature of your comments. For the early draft stages of the PEP, it's probably best to send your comments and changes directly to the PEP author. 
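The zero-padding rule for PEP URLs described above (four digits in the URL, unpadded in text) can be sketched in a couple of lines; the helper name here is mine, not part of any official tooling:

```python
# Sketch: deriving the canonical PEP URL, zero-padding the number to
# exactly four digits as the guidelines require (pep_url is an
# illustrative helper name, not an official function).
def pep_url(number):
    return "http://www.python.org/peps/pep-%04d.html" % number

print(pep_url(1))    # http://www.python.org/peps/pep-0001.html
print(pep_url(287))  # http://www.python.org/peps/pep-0287.html
```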
For more mature, or finished PEPs you may want to submit corrections to the SourceForge bug manager[6] or better yet, the SourceForge patch manager[2] so that your changes don't get lost. If the PEP author is a SF developer, assign the bug/patch to him, otherwise assign it to the PEP editor. When in doubt about where to send your changes, please check first with the PEP author and/or PEP editor. PEP authors who are also SF committers can update the PEPs themselves by using "cvs commit" to commit their changes. Remember to also push the formatted PEP text out to the web by doing the following:

    % python pep2html.py -i NUM

where NUM is the number of the PEP you want to push out. See

    % python pep2html.py --help

for details.

Transferring PEP Ownership

It occasionally becomes necessary to transfer ownership of PEPs to a new champion. In general, we'd like to retain the original author as a co-author of the transferred PEP, but that's really up to the original author. A good reason to transfer ownership is because the original author no longer has the time or interest in updating it or following through with the PEP process, or has fallen off the face of the 'net (i.e. is unreachable or not responding to email). A bad reason to transfer ownership is because you don't agree with the direction of the PEP. We try to build consensus around a PEP, but if that's not possible, you can always submit a competing PEP. If you are interested in assuming ownership of a PEP, send a message asking to take over, addressed to both the original author and the PEP editor. If the original author doesn't respond to email in a timely manner, the PEP editor will make a unilateral decision (it's not like such decisions can be reversed. :).

References and Footnotes

[1] This historical record is available by the normal CVS commands for retrieving older revisions. 
For those without direct access to the CVS tree, you can browse the current and past PEP revisions via the SourceForge web site at http://cvs.sourceforge.net/cgi-bin/cvsweb.cgi/python/nondist/peps/?cvsroot=python

[2] http://sourceforge.net/tracker/?group_id=5470&atid=305470

[3] http://sourceforge.net/tracker/?atid=355470&group_id=5470&func=browse

[4] http://www.opencontent.org/openpub/

[5] The script referred to here is pep2html.py, which lives in the same directory in the CVS tree as the PEPs themselves. Try "pep2html.py --help" for details. The URL for viewing PEPs on the web is http://www.python.org/peps/

[6] http://sourceforge.net/tracker/?group_id=5470&atid=305470

[7] PEP 9, Sample PEP Template http://www.python.org/peps/pep-0009.html

Copyright

This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:

From oren-py-d@hishome.net Mon Jul 29 20:09:44 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Mon, 29 Jul 2002 22:09:44 +0300 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: <200207291734.g6THY1k30119@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Mon, Jul 29, 2002 at 01:34:01PM -0400 References: <20020729200824.A5391@hishome.net> <200207291734.g6THY1k30119@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020729220944.A6113@hishome.net> On Mon, Jul 29, 2002 at 01:34:01PM -0400, Guido van Rossum wrote:

> > http://www.python.org/sf/584626
> >
> > This patch removes the limitation of not allowing yield in the try part
> > of a try/finally. The dealloc function of a generator checks if the
> > generator is still alive and resumes it one last time from the return
> > instruction at the end of the code, causing any try/finally blocks to be
> > triggered. Any exceptions raised are treated just like exceptions in a
> > __del__ finalizer (printed and ignored).
>
> I'm not sure I understand what it does. 
> The return instruction at the
> end of the code, if I take this literally, isn't enclosed in any
> try/finally blocks. So how can this have the desired effect?

They're on the block stack. The stack unwind does the rest.

> Have you verified that Jython can implement these semantics too?

I don't see why not. The trick of jumping to the end was just my way to avoid adding a flag or some magic value to signal to eval_frame that it needs to trigger the block stack unwind on ceval.c:2201. There must be many other ways to implement this.

> Do you *really* need this?

I'm a plumber. I make pipelines by chaining iterators and transformations. My favorite fittings are generator functions and closures so I rarely need to actually define a class. One of my generator functions needed to clean up some stuff so I naturally used a try/finally block. When the compiler complained I recalled that when I first read with excitement about generator functions there was a comment there about some arbitrary limitation of yield statements in try/finally blocks... Anyway, I ended up creating a temporary local object just so I could take advantage of its __del__ method for cleanup but I really didn't like it. After a quick look at ceval.c I realized that it would be easy to fix this by having the dealloc function simulate a return statement just after the yield that was never resumed. So I wrote a little patch to remove something that I consider a wart. Oren Teaser: coming soon on the dataflow library! transparent two-way interoperability between iterators and unix pipes! From guido@python.org Mon Jul 29 20:30:34 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 15:30:34 -0400 Subject: [Python-Dev] HAVE_CONFIG_H Message-ID: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> I see no references to HAVE_CONFIG_H in the source code (except one #undef in readline.c), yet we #define it on the command line. Is that still necessary? 
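The "plumbing" style Oren describes earlier in the thread -- pipelines built by chaining generator functions, with no classes involved -- can be sketched as follows (the stage names are invented for illustration, written in modern Python):

```python
# Sketch of pipeline-style generator chaining: each stage consumes one
# iterator and yields into the next.  Stage names are illustrative.
def grep(pattern, lines):
    # Pass through only the lines containing the pattern.
    for line in lines:
        if pattern in line:
            yield line

def strip(lines):
    # Remove leading/trailing whitespace from each line.
    for line in lines:
        yield line.strip()

source = ["  yield 1  \n", "pass\n", "  yield 2\n"]
result = list(strip(grep("yield", source)))
print(result)  # ['yield 1', 'yield 2']
```

Nothing runs until the final list() pulls on the pipeline, which is exactly why cleanup is tricky: an abandoned pipeline leaves its stages suspended mid-loop, the situation the try/finally patch under discussion is trying to address.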
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Jul 29 20:40:01 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 15:40:01 -0400 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: Your message of "Mon, 29 Jul 2002 22:09:44 +0300." <20020729220944.A6113@hishome.net> References: <20020729200824.A5391@hishome.net> <200207291734.g6THY1k30119@pcp02138704pcs.reston01.va.comcast.net> <20020729220944.A6113@hishome.net> Message-ID: <200207291940.g6TJe1005489@pcp02138704pcs.reston01.va.comcast.net> > > > http://www.python.org/sf/584626 > > > > > > This patch removes the limitation of not allowing yield in the > > > try part of a try/finally. The dealloc function of a generator > > > checks if the generator is still alive and resumes it one last > > > time from the return instruction at the end of the code, causing > > > any try/finally blocks to be triggered. Any exceptions raised > > > are treated just like exceptions in a __del__ finalizer (printed > > > and ignored). > > > > I'm not sure I understand what it does. The return instruction at > > the end of the code, if I take this literally, isn't enclosed in > > any try/finally blocks. So how can this have the desired effect? > > They're on the block stack. The stack unwind does the rest. OK. Your way to find the last return statement gives me the willies though. :-( > > Have you verified that Jython can implement these semantics too? > > I don't see why not. The trick of jumping to the end was just my way > to avoid adding a flag or some magic value to signal to eval_frame > that it needs to trigger the block stack unwind on ceval.c:2201. > There must be many other ways to implement this. Please go to the Jython developers and ask their opinion. Implementing yield in Java is a bit of a hack, and we've been careful to make it possible at all. I don't want to break it. 
Of course, since Jython has garbage collection, your finally clause may be executed later than you had expected it, or not at all! Are you sure you want this? I don't recall all the reasons why this restriction was added to the PEP, but I believe it wasn't just because we couldn't figure out how to implement it -- it also had to do with not being able to explain what exactly the semantics would be. > > Do you *really* need this? > > I'm a plumber. I make pipelines by chaining iterators and > transformations. My favorite fittings are generator functions and > closures so I rarely need to actually define a class. One of my > generator functions needed to clean up some stuff so I naturally > used a try/finally block. When the compiler complained I recalled > that when I first read with excitement about generator functions > there was a comment there about some arbitrary limitation of yield > statements in try/finally blocks... > > Anyway, I ended up creating a temporary local object just so I could > take advantage of its __del__ method for cleanup but I really didn't > like it. After a quick look at ceval.c I realized that it would be > easy to fix this by having the dealloc function simulate a return > statement just after the yield that was never resumed. So I wrote a > little patch to remove something that I consider a wart. There are a few other places that invoke Python code in a dealloc handler (__del__ invocations in classobject.c and typeobject.c). They do a more complicated dance with the reference count. Can you check that you are doing the right thing? I'd also like to get Neil Schemenauer's review of the code, since he knows best how generators work under the covers. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Mon Jul 29 20:59:06 2002 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Mon, 29 Jul 2002 21:59:06 +0200 Subject: [Python-Dev] HAVE_CONFIG_H References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D459E8A.1050602@lemburg.com> Guido van Rossum wrote: > I see no references to HAVE_CONFIG_H in the source code (except one > #undef in readline.c), yet we #define it on the command line. Is that > still necessary? What about these ? ./Mac/mwerks/old/mwerks_nsgusi_config.h: -- define HAVE_CONFIG_H ./Mac/mwerks/old/mwerks_tk_config.h: -- define HAVE_CONFIG_H ./Mac/mwerks/old/mwerks_shgusi_config.h: -- define HAVE_CONFIG_H ./Modules/expat/xmlparse.c: -- #ifdef HAVE_CONFIG_H ./Modules/expat/xmltok.c: -- #ifdef HAVE_CONFIG_H ./Modules/expat/xmlrole.c: -- #ifdef HAVE_CONFIG_H -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From guido@python.org Mon Jul 29 21:06:57 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 16:06:57 -0400 Subject: [Python-Dev] HAVE_CONFIG_H In-Reply-To: Your message of "Mon, 29 Jul 2002 21:59:06 +0200." <3D459E8A.1050602@lemburg.com> References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> <3D459E8A.1050602@lemburg.com> Message-ID: <200207292006.g6TK6wq06015@pcp02138704pcs.reston01.va.comcast.net> > > I see no references to HAVE_CONFIG_H in the source code (except one > > #undef in readline.c), yet we #define it on the command line. Is that > > still necessary? > > What about these ? > ./Mac/mwerks/old/mwerks_nsgusi_config.h: > -- define HAVE_CONFIG_H > ./Mac/mwerks/old/mwerks_tk_config.h: > -- define HAVE_CONFIG_H > ./Mac/mwerks/old/mwerks_shgusi_config.h: > -- define HAVE_CONFIG_H I don't have a directory Mac/mwerks/old/. Maybe you created this yourself? 
> ./Modules/expat/xmlparse.c:
> -- #ifdef HAVE_CONFIG_H
> ./Modules/expat/xmltok.c:
> -- #ifdef HAVE_CONFIG_H
> ./Modules/expat/xmlrole.c:
> -- #ifdef HAVE_CONFIG_H

We don't pass HAVE_CONFIG_H to extension modules, only to the core (stuff built directly by the Makefile, not by setup.py). That's a good thing too, because these include , not "pyconfig.h". --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Jul 29 21:09:05 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 16:09:05 -0400 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: Your message of "Mon, 29 Jul 2002 20:08:24 +0300." <20020729200824.A5391@hishome.net> References: <20020729200824.A5391@hishome.net> Message-ID: <200207292009.g6TK95i06131@pcp02138704pcs.reston01.va.comcast.net>

> http://www.python.org/sf/584626
>
> This patch removes the limitation of not allowing yield in the try part
> of a try/finally. The dealloc function of a generator checks if the
> generator is still alive and resumes it one last time from the return
> instruction at the end of the code, causing any try/finally blocks to be
> triggered. Any exceptions raised are treated just like exceptions in a
> __del__ finalizer (printed and ignored).

Try building Python in debug mode, and then run the test suite. 
I get a fatal error in test_generators (but not when that test is run in isolation): Fatal Python error: ../Python/ceval.c:2256 object at 0x40b05654 has negative ref count -1 --Guido van Rossum (home page: http://www.python.org/~guido/) From oren-py-d@hishome.net Mon Jul 29 21:14:26 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Mon, 29 Jul 2002 23:14:26 +0300 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: <200207291940.g6TJe1005489@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Mon, Jul 29, 2002 at 03:40:01PM -0400 References: <20020729200824.A5391@hishome.net> <200207291734.g6THY1k30119@pcp02138704pcs.reston01.va.comcast.net> <20020729220944.A6113@hishome.net> <200207291940.g6TJe1005489@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020729231426.A7209@hishome.net> On Mon, Jul 29, 2002 at 03:40:01PM -0400, Guido van Rossum wrote: > > > I'm not sure I understand what it does. The return instruction at > > > the end of the code, if I take this literally, isn't enclosed in > > > any try/finally blocks. So how can this have the desired effect? > > > > They're on the block stack. The stack unwind does the rest. > > OK. Your way to find the last return statement gives me the willies > though. :-( Yeah, I know. I'm not too proud of it but I was looking for instant gratification... > Of course, since Jython has garbage collection, your finally clause > may be executed later than you had expected it, or not at all! Are > you sure you want this? The same question applies to the __del__ method of any local variables inside the suspended generator. I tend to rely on the reference counting semantics of CPython in much of my code and I don't feel bad about it. > There are a few other places that invoke Python code in a dealloc > handler (__del__ invocations in classobject.c and typeobject.c). They > do a more complicated dance with the reference count. Can you check > that you are doing the right thing? 
The __del__ method gets a reference to the object so it needs to be revived. Generators are much simpler because the generator function does not have any reference to the generator object. Oren From nas@python.ca Mon Jul 29 21:25:15 2002 From: nas@python.ca (Neil Schemenauer) Date: Mon, 29 Jul 2002 13:25:15 -0700 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: <200207291940.g6TJe1005489@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Mon, Jul 29, 2002 at 03:40:01PM -0400 References: <20020729200824.A5391@hishome.net> <200207291734.g6THY1k30119@pcp02138704pcs.reston01.va.comcast.net> <20020729220944.A6113@hishome.net> <200207291940.g6TJe1005489@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020729132515.A31926@glacier.arctrix.com> Guido van Rossum wrote:

> I'd also like to get Neil Schemenauer's review of the code, since he
> knows best how generators work under the covers.

I'm pretty sure it can be made to work (at least for CPython). The proposed patch is not correct since it doesn't handle "finally" code that creates a new reference to the generator. Also, setting the instruction pointer to the return statement is really ugly, IMO. There could be valid code out there that does not end with LOAD_CONST+RETURN. Those are minor details though. We need to decide if we really want this. For example, what happens if 'yield' is inside the finally block? With the proposed patch:

>>> def f():
...     try:
...         assert 0
...     finally:
...         return 1
...
>>> f()
1
>>> def g():
...     try:
...         assert 0
...     finally:
...         yield 1
...
>>> list(g())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in g
AssertionError

Maybe some people would expect [1] in the second case. Neil From guido@python.org Mon Jul 29 21:21:07 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 16:21:07 -0400 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: Your message of "Mon, 29 Jul 2002 23:14:26 +0300." 
<20020729231426.A7209@hishome.net> References: <20020729200824.A5391@hishome.net> <200207291734.g6THY1k30119@pcp02138704pcs.reston01.va.comcast.net> <20020729220944.A6113@hishome.net> <200207291940.g6TJe1005489@pcp02138704pcs.reston01.va.comcast.net> <20020729231426.A7209@hishome.net> Message-ID: <200207292021.g6TKL7u06204@pcp02138704pcs.reston01.va.comcast.net> > Yeah, I know. I'm not too proud of it but I was looking for instant > gratification... The search for instant gratification probably ties a lot of the Python community together... > > Of course, since Jython has garbage collection, your finally clause > > may be executed later than you had expected it, or not at all! Are > > you sure you want this? > > The same question applies to the __del__ method of any local variables > inside the suspended generator. I tend to rely on the reference counting > semantics of CPython in much of my code and I don't feel bad about it. But __del__ is in essence asynchronous. On the other hand, try/finally is traditionally completely synchronous. Adding a case where a finally clause can execute asynchronously (or not at all, if there is a global ref or cyclical garbage keeping the generator alive) sounds like a breach of promise almost. > > There are a few other places that invoke Python code in a dealloc > > handler (__del__ invocations in classobject.c and typeobject.c). They > > do a more complicated dance with the reference count. Can you check > > that you are doing the right thing? > > The __del__ method gets a reference to the object so it needs to be > revived. Generators are much simpler because the generator function does > not have any reference to the generator object. But you still have to be careful with how you incref/decref -- see my fatal error report in debug mode. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Jul 29 21:30:36 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 16:30:36 -0400 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: Your message of "Mon, 29 Jul 2002 13:25:15 PDT." <20020729132515.A31926@glacier.arctrix.com> References: <20020729200824.A5391@hishome.net> <200207291734.g6THY1k30119@pcp02138704pcs.reston01.va.comcast.net> <20020729220944.A6113@hishome.net> <200207291940.g6TJe1005489@pcp02138704pcs.reston01.va.comcast.net> <20020729132515.A31926@glacier.arctrix.com> Message-ID: <200207292030.g6TKUaW06234@pcp02138704pcs.reston01.va.comcast.net>

> I'm pretty sure it can be made to work (at least for CPython). The
> proposed patch is not correct since it doesn't handle "finally" code
> that creates a new reference to the generator.

As Oren pointed out, how can you create a reference to the generator when its reference count was 0? There can't be a global referencing it, and (unlike __del__) you aren't getting a pointer to yourself.

> Also, setting the instruction pointer to the return statement is
> really ugly, IMO.

Agreed. ;-)

> There could be valid code out there that does not end with
> LOAD_CONST+RETURN.

The current code generator always generates that as the final instruction. But someone might add an optimizer that takes that out if it is provably unreachable...

> Those are minor details though. We need to decide if we really want
> this. For example, what happens if 'yield' is inside the finally block?
> With the proposed patch:
>
> >>> def f():
> ...     try:
> ...         assert 0
> ...     finally:
> ...         return 1
> ...
> >>> f()
> 1
> >>> def g():
> ...     try:
> ...         assert 0
> ...     finally:
> ...         yield 1
> ...
> >>> list(g())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "<stdin>", line 3, in g
> AssertionError
>
> Maybe some people would expect [1] in the second case. 
The latter is not new; that example has no yield in the try clause. If you'd used a for loop or next() calls, you'd have noticed the yield got executed normally, but following next() call raises AssertionError. But this example behaves strangely:

>>> def f():
...     try:
...         yield 1
...         assert 0
...     finally:
...         yield 2
...
>>> a = f()
>>> a.next()
1
>>> del a
>>>

What happens at the yield here?!?! If I put prints before and after it, the finally clause is entered, but not exited. Bizarre!!! --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Mon Jul 29 21:41:12 2002 From: nas@python.ca (Neil Schemenauer) Date: Mon, 29 Jul 2002 13:41:12 -0700 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: <20020729132515.A31926@glacier.arctrix.com>; from nas@python.ca on Mon, Jul 29, 2002 at 01:25:15PM -0700 References: <20020729200824.A5391@hishome.net> <200207291734.g6THY1k30119@pcp02138704pcs.reston01.va.comcast.net> <20020729220944.A6113@hishome.net> <200207291940.g6TJe1005489@pcp02138704pcs.reston01.va.comcast.net> <20020729132515.A31926@glacier.arctrix.com> Message-ID: <20020729134112.B31926@glacier.arctrix.com> I wrote:

> The proposed patch is not correct since it doesn't handle "finally"
> code that creates a new reference to the generator.

It looks like that's not actually a problem since you can't get a hold of a reference to the generator. However, here's another bit of nastiness:

$ cat > bad.py
import sys
import gc

def g():
    global gen
    self = gen
    try:
        yield 1
    finally:
        gen = self

gen = g()
gen.next()
del gen
gc.collect()
print gen
$ ./python bad.py
Segmentation fault (core dumped)

Basically, the GC has to be taught that generators can have finalizers and it may not be safe to collect them. If we allow try/finally in generators then they can cause uncollectible garbage. It's not a show stopper but something else to take into consideration. 
Neil From guido@python.org Mon Jul 29 21:38:12 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 16:38:12 -0400 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: Your message of "Mon, 29 Jul 2002 13:41:12 PDT." <20020729134112.B31926@glacier.arctrix.com> References: <20020729200824.A5391@hishome.net> <200207291734.g6THY1k30119@pcp02138704pcs.reston01.va.comcast.net> <20020729220944.A6113@hishome.net> <200207291940.g6TJe1005489@pcp02138704pcs.reston01.va.comcast.net> <20020729132515.A31926@glacier.arctrix.com> <20020729134112.B31926@glacier.arctrix.com> Message-ID: <200207292038.g6TKcC806273@pcp02138704pcs.reston01.va.comcast.net> > Basically, the GC has to be taught that generators can have finalizers > and it may not be safe to collect them. If we allow try/finally in > generators then they can cause uncollectible garbage. It's not a show > stopper but something else to take into consideration. I leave this in your capable hands. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Mon Jul 29 22:42:50 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 29 Jul 2002 17:42:50 -0400 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: <20020729134112.B31926@glacier.arctrix.com> Message-ID: [Neil Schemenauer] > ... > Basically, the GC has to be taught that generators can have finalizers > and it may not be safe to collect them. If we allow try/finally in > generators Note that we already allow try/finally in generators. The only prohibition is against having a yield stmt in the try clause of a try/finally construct (YINTCOATFC). > then they can cause uncollectible garbage. It's not a show > stopper but something else to take into consideration. I'm concerned about semantic clarity. A "finally" block is supposed to get executed upon leaving its associated "try" block. 
A yield stmt doesn't leave the try block in that sense, so there's no justification for executing the finally block unless the generator is resumed, and the try block is exited "for real" via some other means (a return, an exception, or falling off the end of the try block). We could have allowed YINTCOATFC under those rules with clarity, but it would have been a great surprise then that the finally clause may never get executed at all. Better to outlaw it than that (or, as the PEP says, that would be "too much a violation of finally's purpose to bear"). Making up new control flow out of thin air upon destructing a generator ("OK, let's pretend that the generator was actually resumed in that case, and also pretend that a return statement immediately followed the yield") is plainly a hack; and because it's still possible then that the finally clause may never get executed at all (because it's possible to create an uncollectible generator), it's too much a violation of finally's purpose to bear even so. When I've needed resource-cleanup in a generator, I've made the generator a method of a class, and put the resources in instance variables. Then they're easy to clean up at will (even via a __del__ method, if need be; but the uncertainty about when and whether __del__ methods get called is already well-known, and I don't want to extend that fuzziness to 'finally' clauses too -- we left those reliable against anything short of a system crash, and IMO it's important to keep them that bulletproof). From guido@python.org Mon Jul 29 23:01:03 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 29 Jul 2002 18:01:03 -0400 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: Your message of "Mon, 29 Jul 2002 17:42:50 EDT." References: Message-ID: <200207292201.g6TM14G06652@pcp02138704pcs.reston01.va.comcast.net> [Tim] > Note that we already allow try/finally in generators. 
The only > prohibition is against having a yield stmt in the try clause of a > try/finally construct (YINTCOATFC). > [Neil] > > then they can cause uncollectible garbage. It's not a show > > stopper but something else to take into consideration. > > I'm concerned about semantic clarity. A "finally" block is supposed > to get executed upon leaving its associated "try" block. A yield > stmt doesn't leave the try block in that sense, so there's no > justification for executing the finally block unless the generator > is resumed, and the try block is exited "for real" via some other > means (a return, an exception, or falling off the end of the try > block). We could have allowed YINTCOATFC under those rules with > clarity, but it would have been a great surprise then that the > finally clause may never get executed at all. Better to outlaw it > than that (or, as the PEP says, that would be "too much a violation > of finally's purpose to bear"). > > Making up new control flow out of thin air upon destructing a > generator ("OK, let's pretend that the generator was actually > resumed in that case, and also pretend that a return statement > immediately followed the yield") is plainly a hack; and because it's > still possible then that the finally clause may never get executed > at all (because it's possible to create an uncollectible generator), > it's too much a violation of finally's purpose to bear even so. > > When I've needed resource-cleanup in a generator, I've made the > generator a method of a class, and put the resources in instance > variables. Then they're easy to clean up at will (even via a > __del__ method, if need be; but the uncertainty about when and > whether __del__ methods get called is already well-known, and I > don't want to extend that fuzziness to 'finally' clauses too -- we > left those reliable against anything short of a system crash, and > IMO it's important to keep them that bulletproof). 
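Tim's resource-cleanup pattern can be sketched concretely (a hypothetical example in modern spelling, not code from the thread): the resources live in instance variables of a class, and the generator is just a method, so cleanup can happen explicitly without relying on a `finally` inside the generator body.

```python
# Hedged sketch of the class-based cleanup pattern described above.
# The class name and its fake "resource" are invented for illustration.
class Counter:
    def __init__(self):
        self.resource = ["open"]      # stands in for a file, socket, etc.

    def values(self):
        n = 0
        while self.resource:          # generator method; no try/finally needed
            yield n
            n += 1

    def close(self):                  # cleanup at will, at a time you choose
        self.resource = []

c = Counter()
it = c.values()
first = [next(it) for _ in range(3)]  # consume a few values
c.close()                             # release the "resource"...
rest = list(it)                       # ...and the generator simply stops
```

The point of the pattern is that cleanup is decoupled from generator finalization entirely, so it stays deterministic even if the generator itself is never collected.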
I hope that Oren will withdraw his patch based upon this explanation. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Mon Jul 29 23:46:03 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 30 Jul 2002 00:46:03 +0200 Subject: [Python-Dev] pickling of large arrays In-Reply-To: <20020729130200.73932.qmail@web20201.mail.yahoo.com> References: <20020729130200.73932.qmail@web20201.mail.yahoo.com> Message-ID: "Ralf W. Grosse-Kunstleve" writes: > We are using Boost.Python to expose reference-counted C++ container > types (similar to std::vector<>) to Python. E.g.: > > from arraytbx import shared > d = shared.double(1000000) # double array with a million elements > c = shared.complex_double(100) # std::complex array > # and many more types, incl. several custom C++ types I recommend to implement pickling differently, e.g. by returning a byte string with the underlying memory representation. If producing a duplicate is still not acceptable, I recommend to inherit from the Pickler class. Regards, Martin From tim.one@comcast.net Mon Jul 29 23:49:01 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 29 Jul 2002 18:49:01 -0400 Subject: [Python-Dev] test_imaplib failing elsewhere? Message-ID: On Windows: > python ../lib/test/test_imaplib.py incorrect result when converting (2033, 5, 18, 3, 33, 20, 2, 138, 0) incorrect result when converting '"18-May-2033 13:33:20 +1000"' > IOW, it tries two things, and fails on both. Beefing up its if t1 <> t2: print 'incorrect result when converting', `t` by adding print ' t1 was', `t1` print ' t2 was', `t2` yields incorrect result when converting (2033, 5, 18, 3, 33, 20, 2, 138, 0) t1 was '"18-May-2033 03:33:20 -0500"' t2 was '"18-May-2033 04:33:20 -0400"' incorrect result when converting '"18-May-2033 13:33:20 +1000"' t1 was '"18-May-2033 13:33:20 +1000"' t2 was '"17-May-2033 23:33:20 -0400"' I'm not sure when it started failing, but within the last week ... 
OK, rev 1.3 of test_imaplib.py worked here, and rev 1.4 broke it, checked in 2-3 days ago. From pinard@iro.umontreal.ca Tue Jul 30 00:05:56 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 29 Jul 2002 19:05:56 -0400 Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines In-Reply-To: <15685.35726.678832.241665@anthem.wooz.org> References: <15685.35726.678832.241665@anthem.wooz.org> Message-ID: [Barry A. Warsaw] > It has been a while since I posted a copy of PEP 1 to the mailing > lists and newsgroups. Thanks for giving me this opportunity. There is a tiny detail that bothers me: > The format of the author entry should be > address@dom.ain (Random J. User) > if the email address is included, and just > Random J. User > if the address is not given. This takes me some fifteen years back (I do not remember the exact time), to the great push for the Internet to prefer: Random J. User It is more reasonable to always give the real name, optionally followed by an email, than to consider that the real name is a mere comment for the email address. Oh, I know some hackers who praise themselves as login names or dream of having positronic brains :-), but most of us are humans before anything else! Could the PEP be reformulated, at least, to leave the choice open? -- François Pinard http://www.iro.umontreal.ca/~pinard From martin@v.loewis.de Mon Jul 29 23:52:57 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 30 Jul 2002 00:52:57 +0200 Subject: [Python-Dev] HAVE_CONFIG_H In-Reply-To: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > I see no references to HAVE_CONFIG_H in the source code (except one > #undef in readline.c), yet we #define it on the command line. Is that > still necessary?
It's autoconf tradition to use that; it would replace DEFS with either many -D options, or -DHAVE_CONFIG_H (if AC_CONFIG_HEADER appears). I don't think we need this, and it can safely be removed. Regards, Martin From martin@v.loewis.de Tue Jul 30 00:22:44 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 30 Jul 2002 01:22:44 +0200 Subject: [Python-Dev] test_imaplib failing elsewhere? In-Reply-To: References: Message-ID: Tim Peters writes: > On Windows: > > > python ../lib/test/test_imaplib.py > incorrect result when converting (2033, 5, 18, 3, 33, 20, 2, 138, 0) > incorrect result when converting '"18-May-2033 13:33:20 +1000"' > > > > IOW, it tries two things, and fails on both. It fails on Linux and Solaris as well. Regards, Martin From pinard@iro.umontreal.ca Tue Jul 30 00:30:30 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 29 Jul 2002 19:30:30 -0400 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Martin v. Loewis] > Guido van Rossum writes: > > I see no references to HAVE_CONFIG_H in the source code (except one > > #undef in readline.c), yet we #define it on the command line. Is that > > still necessary? > It's autoconf tradition to use that; it would replace DEFS with either > many -D options, or -DHAVE_CONFIG_H (if AC_CONFIG_HEADER appears). > I don't think we need this, and it can safely be removed. The many `-D' options which appear when `AC_CONFIG_HEADER' is not used are rather inelegant; they create a lot, really a lot, of clumsiness in `make' output. The idea, but you surely know it, was to regroup all auto-configured definitions into a single header file, and limit the `-D' to the sole `HAVE_CONFIG_H', or almost. The #if HAVE_CONFIG_H / # include <config.h> / #endif idiom, in some widely used sources, was there to cope with `AC_CONFIG_HEADER' being defined in some projects, and not in others.
There is no need to include `config.h', nor to create it, if all `#define's have been already done through a litany of `-D' options. -- François Pinard http://www.iro.umontreal.ca/~pinard From nhodgson@bigpond.net.au Tue Jul 30 00:37:18 2002 From: nhodgson@bigpond.net.au (Neil Hodgson) Date: Tue, 30 Jul 2002 09:37:18 +1000 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729002957.74716.qmail@web40101.mail.yahoo.com> <00c601c23707$35819a20$3da48490@neil> <06f301c2370d$16941060$e000a8c0@thomasnotebook> Message-ID: <029801c23758$e13594b0$3da48490@neil> Thomas Heller: > ..., but I understand Neil's requirements. > > Can they be fulfilled by adding some kind of UnlockObject() > call to the 'safe buffer interface', which should mean 'I won't > use the pointer received by getsaferead/writebufferproc any more'? Yes, that is exactly what I want. Neil From nhodgson@bigpond.net.au Tue Jul 30 00:50:43 2002 From: nhodgson@bigpond.net.au (Neil Hodgson) Date: Tue, 30 Jul 2002 09:50:43 +1000 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729164532.48588.qmail@web40110.mail.yahoo.com> Message-ID: <02a201c2375a$c1299f70$3da48490@neil> Scott Gilbert: > What happens when you've locked the buffer and passed a pointer to the I/O > system for an asynchronous operation, but before that operation has > completed, your main program wants to resize the buffer due to a user > generated event? That is up to the application or class designer. There are three reasonable responses I see: throw an exception, buffer the user event, or ignore the user event. The only thing guaranteed by providing the safe buffer interface is that the pointer will remain valid. > > I don't want counting mutexes. I'm not defining behavior that needs > > them. > > > > You said you wanted the locks to keep a count. So that you could call > acquire() multiple times and have the buffer not truly become unlocked > until release() was called the same amount of times. 
I'm willing to adopt any terminology you want for the purpose of this discussion. I think I understand the semantics of the counting operation, but I want to understand more what actually happens when the buffer is locked. When the buffer is locked, it returns a pointer and promises that the pointer will remain valid until the buffer is unlocked. The buffer interface could be defined either to allow multiple (counted) locks or to fail further lock attempts. Counted locks would be applicable in more circumstances but require more implementation. I would prefer counted but it is not that important as a counting layer can be implemented over a single lock interface if needed. Neil From nhodgson@bigpond.net.au Tue Jul 30 01:02:53 2002 From: nhodgson@bigpond.net.au (Neil Hodgson) Date: Tue, 30 Jul 2002 10:02:53 +1000 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729165419.31643.qmail@web40111.mail.yahoo.com> Message-ID: <02c001c2375c$74037de0$3da48490@neil> Scott Gilbert: > I assume this means any call to getsafereadpointer()/getsafewritepointer() > will increment the lock count. So the UnlockObject() calls will be > mandatory. The UnlockObject call will be needed if you do want to permit resizing (again). It will not be needed for statically sized objects, including all the types that are included in the PEP currently, or where you have an object that will no longer need to be resizable. For example: you construct a sound buffer, fill it with noise, then lock it so that a pointer to its data can be given to the asynch sound playing function. If you don't need to write to the sound buffer again, it doesn't need to be unlocked. > Either that, or you'll have an explicit LockObject() call as > well. What behavior should happen when a resize is attempted while the > lock count is positive? The most common response will be some form of failure, probably throwing an exception.
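The fail-with-an-exception response to a resize while the lock count is positive might be sketched like this in Python (all names here are invented for illustration; this is a sketch of the counted-lock idea, not any proposed API):

```python
# Hedged sketch of a counted buffer lock: acquiring bumps a count,
# releasing drops it, and a resize attempted while the count is
# positive fails immediately with an exception.
class LockedBuffer:
    def __init__(self, size):
        self._data = bytearray(size)
        self._locks = 0

    def acquire_pointer(self):        # stands in for getsafe*pointer()
        self._locks += 1
        return self._data             # stays valid until matching release

    def release_pointer(self):        # stands in for UnlockObject()
        assert self._locks > 0
        self._locks -= 1

    def resize(self, newsize):
        if self._locks:
            # reallocation would invalidate outstanding pointers
            raise ValueError("buffer is locked; cannot resize")
        new = bytearray(newsize)
        n = min(len(self._data), newsize)
        new[:n] = self._data[:n]
        self._data = new

buf = LockedBuffer(8)
p = buf.acquire_pointer()
try:
    buf.resize(16)                    # refused while a pointer is out
except ValueError:
    pass
buf.release_pointer()
buf.resize(16)                        # fine once the count is back to zero
```

A single-lock variant would simply refuse `acquire_pointer` when the count is already nonzero; as noted above, a counting layer can be built over that if needed.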
Other responses, such as buffering the resize, may be sensible in particular circumstances. Neil From greg@cosc.canterbury.ac.nz Tue Jul 30 01:21:42 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Jul 2002 12:21:42 +1200 (NZST) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <00c601c23707$35819a20$3da48490@neil> Message-ID: <200207300021.g6U0LgOG018189@kuku.cosc.canterbury.ac.nz> > This restricts the set of objects that can be buffers to statically > sized objects. I'd prefer that dynamically resizable objects be able > to be buffers. That's what bothers me about the proposal -- I suspect that this restriction will turn out to be too restrictive to make it useful. But maybe locking could be built into the safe-buffer protocol? Resizable objects wanting to support the safe buffer protocol would be required to maintain a lock count which is incremented on each getsafebufferptr call. There would also have to be a releasesafebufferptr call to decrement the lock count. As long as the lock count is nonzero, attempting to resize the object would raise an exception. That way, resizable objects could be used as asynchronous I/O buffers as long as you didn't try to resize them while actually doing I/O. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Tue Jul 30 02:12:19 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Jul 2002 13:12:19 +1200 (NZST) Subject: [Python-Dev] Generator cleanup idea (patch: try/finally in generators) In-Reply-To: Message-ID: <200207300112.g6U1CJoO018210@kuku.cosc.canterbury.ac.nz> > but it would have been a great surprise then that the finally clause > may never get executed at all. 
Better to outlaw it than that (or, as > the PEP says, that would be "too much a violation of finally's purpose > to bear"). I don't think you'd really be breaking any promises. After all, if someone wrote def asdf(): try: something_that_never_returns() finally: ... they wouldn't have much ground for complaint that the finally never got executed. The case we're talking about seems much the same situation. > When I've needed resource-cleanup in a generator, I've made the generator a > method of a class, and put the resources in instance variables. Then > they're easy to clean up at will (even via a __del__ method, if need > be; I take it you usually provide a method for explicit cleanup. How about giving generator-iterators one, then, called maybe close() or abort(). The effect would be to raise an appropriate exception at the point of the yield, triggering any except or finally blocks. This method could even be added to the general iterator protocol (implementing it would be optional). It would then provide a standard name for people to use for cleanup methods in their own iterator classes. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Tue Jul 30 02:25:44 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Jul 2002 13:25:44 +1200 (NZST) Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines In-Reply-To: Message-ID: <200207300125.g6U1PiGC018255@kuku.cosc.canterbury.ac.nz> pinard@iro.umontreal.ca: > It is more reasonable to always give the real name, optionally > followed by an email, that to consider that the real name is a mere > comment for the email address. Not necessarily -- it depends on your point of view. I've always thought of the "To:" line as an address, not a salutation. 
In other words, an instruction to the email system as to where to send the message, not the name of the recipient. Putting a person's name in there at all seems to me a sop to computer-illiterate wimps who go all wobbly at the knees when they see anything as esoteric-looking as an email address. :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Tue Jul 30 02:42:38 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Jul 2002 13:42:38 +1200 (NZST) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <200207291703.g6TH3tk29997@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200207300142.g6U1gcSZ018273@kuku.cosc.canterbury.ac.nz> Guido: > I don't like where this is going. Let's not add locking to the buffer > protocol. Do you still object to it even in the form I proposed in my last message? (I.e. no separate "lock" call, locking is implicit in the getxxxbuffer calls.) It does make the protocol slightly more complicated to use (must remember to make a release call when you're finished with the pointer) but it seems like a good tradeoff to me for the flexibility gained. Note that there can't be any problems with deadlock, since no blocking is involved. Maybe "locking" is even the wrong term -- it's more a form of reference counting. > probably nothing that could possibly invoke the Python interpreter > recursively, since that might release the GIL. This would generally > mean that calls to Py_DECREF() are unsafe while holding on to a buffer > pointer! That could be fixed by incrementing the Python refcount as long as a pointer is held. That could be done even without the rest of my locking proposal. 
Of course, if you do that you need a matching release call, so you might as well implement the locking while you're at it. Mind you, if a release call is necessary, whoever holds the pointer must also hold a reference to the Python object, so that they can make the release call. So incrementing the Python refcount might not be necessary after all! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From pinard@iro.umontreal.ca Tue Jul 30 02:46:34 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 29 Jul 2002 21:46:34 -0400 Subject: [Python-Dev] Re: Priority queue (binary heap) python code In-Reply-To: <20020721193057.A1891@arizona.localdomain> References: <20020624213318.A5740@arizona.localdomain> <200207200606.g6K66Um28510@pcp02138704pcs.reston01.va.comcast.net> <20020721193057.A1891@arizona.localdomain> Message-ID: [Guido van Rossum] > [...] I admire the compactness of his code. I believe that this would make > a good addition to the standard library, as a friend of the bisect module. > [...] The only change I would make would be to make heap[0] the lowest > value rather than the highest. I propose to call it heapq.py. [Kevin O'Connor] > Looks good to me. In case you are going forward with `heapq', and glancing through my notes, I see that "Courageous" implemented a priority queue algorithm as a C extension, and discussed it on python-list on 2000-05-29. I'm not really expecting that you aim at anything other than a pure Python version, and I'm not pushing nor pulling for it, as I do not have an opinion. In any case, I'll keep these messages a few more days: just ask, and I'll send you a copy of what I saved at the time. P.S.
- I'm quickly losing interest in these bits of C code meant for speed, as if I ever need C speed, the wonderful Pyrex tool (from Greg Ewing) gives it to me while allowing the algorithm to be expressed in a language close to Python. I even wonder if Pyrex could not be a proper avenue for the development of some parts of the Python distribution itself. -- François Pinard http://www.iro.umontreal.ca/~pinard From pinard@iro.umontreal.ca Tue Jul 30 03:33:14 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 29 Jul 2002 22:33:14 -0400 Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines In-Reply-To: <200207300125.g6U1PiGC018255@kuku.cosc.canterbury.ac.nz> References: <200207300125.g6U1PiGC018255@kuku.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > pinard@iro.umontreal.ca: > > It is more reasonable to always give the real name, optionally > > followed by an email, than to consider that the real name is a mere > > comment for the email address. > Not necessarily -- it depends on your point of view. An email address may change over time, but one's name does not change often. In a lifetime of maintenance, I saw the email addresses of a lot of correspondents fluctuate more or less over time. Only two or three persons asked me to correct their name after they got it legally modified. The contact point for a PEP is really a given human, whatever his/her email address may currently be. The modern Internet usage is to write the name first, and the email address after, between angle brackets. So, I'm suggesting that the PEP document the popular, modern usage. > I've always thought of the "To:" line as an address, not a salutation. It is dual. The human reads the civil name, the machine reads the email address. Many MUA's have limited space for the message summaries, and they favour the civil name over the email address in the listings.
-- François Pinard http://www.iro.umontreal.ca/~pinard From sholden@holdenweb.com Tue Jul 30 04:43:23 2002 From: sholden@holdenweb.com (Steve Holden) Date: Mon, 29 Jul 2002 23:43:23 -0400 Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines References: <15685.35726.678832.241665@anthem.wooz.org> Message-ID: <00eb01c2377b$41dd4340$6300000a@holdenweb.com> ----- Original Message ----- From: "François Pinard" To: "Barry A. Warsaw" Cc: ; Sent: Monday, July 29, 2002 7:05 PM Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines > [Barry A. Warsaw] > > > It has been a while since I posted a copy of PEP 1 to the mailing > > lists and newsgroups. > > Thanks for giving me this opportunity. There is a tiny detail that > bothers me: > > > The format of the author entry should be > > address@dom.ain (Random J. User) > > if the email address is included, and just > > Random J. User > > if the address is not given. > > This makes me jump fifteen years behind (or so, I do not remember times), > at the time of the great push so the Internet prefers: > > Random J. User > > It is more reasonable to always give the real name, optionally followed by > an email, that to consider that the real name is a mere comment for the > email address. Oh, I know some hackers who praise themselves as login > names or dream having positronic brains :-), but most of us are humans > before anything else! > > Could the PEP be reformulated, at least, for leaving the choice opened? > Should we instead say that any acceptable RFC822 address would be an acceptable alternative for a simple name? If so you'd get naiive mail users complaining that they couldn't reach "@python.org:sholden@holdenweb.com" (for example). 
I don't really see why the address format has to agree with any particular other format: if you're going to use it in a program then there's no reason why you shouldn't mangle it into whatever form you (or your possibly-crippled software) requires :-) The major benefit of the present situation is that it's well-defined. I don't feel additional alternatives would be helpful here, especially when the existing format is RFC822-compliant. though-i-admit-i'm-not-up-to-speed-on-rfc2822-ly y'rs - steve ----------------------------------------------------------------------- Steve Holden http://www.holdenweb.com/ Python Web Programming http://pydish.holdenweb.com/pwp/ ----------------------------------------------------------------------- From sholden@holdenweb.com Tue Jul 30 04:51:22 2002 From: sholden@holdenweb.com (Steve Holden) Date: Mon, 29 Jul 2002 23:51:22 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729165419.31643.qmail@web40111.mail.yahoo.com> <200207291703.g6TH3tk29997@pcp02138704pcs.reston01.va.comcast.net> <095701c23722$84e06770$e000a8c0@thomasnotebook> <200207291710.g6THAin30057@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <013e01c2377c$5f61c2f0$6300000a@holdenweb.com> ----- Original Message ----- From: "Guido van Rossum" To: "Thomas Heller" Cc: "Scott Gilbert" ; "Neil Hodgson" ; Sent: Monday, July 29, 2002 1:10 PM Subject: Re: [Python-Dev] pre-PEP: The Safe Buffer Interface > > > If an object's buffer isn't allocated for the object's life > > > when the object is created, it should not support the "safe" version > > > of the protocol (maybe a different name would be better), and users > > > should not release the GIL while holding on to the pointer. > > > > 'Persistent' buffer interface? Too long? > > No, persistent typically refers to things that survive longer than a > process. Maybe 'static' buffer interface would work. > "cautious"?
regards ----------------------------------------------------------------------- Steve Holden http://www.holdenweb.com/ Python Web Programming http://pydish.holdenweb.com/pwp/ ----------------------------------------------------------------------- From just@letterror.com Tue Jul 30 06:55:20 2002 From: just@letterror.com (Just van Rossum) Date: Tue, 30 Jul 2002 07:55:20 +0200 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Modules posixmodule.c,2.247,2.248 In-Reply-To: Message-ID: nnorwitz@users.sourceforge.net wrote: > Update of /cvsroot/python/python/dist/src/Modules > In directory usw-pr-cvs1:/tmp/cvs-serv31715/Modules > > Modified Files: > posixmodule.c > Log Message: > Use PyArg_ParseTuple() instead of PyArg_Parse() which is deprecated > > Index: posixmodule.c > =================================================================== [ ... ] > ! else if (!PyArg_Parse(arg, "(ll)", &atime, &mtime)) { [ ... ] > ! else if (!PyArg_ParseTuple(arg, "ll", &atime, &mtime)) { [ ... ] Probably no biggie here, but I'd like to point out that there is a significant difference between the two calls: the former will allow any sequence for 'arg', but the latter insists on a tuple. For that reason I always use PyArg_Parse() to parse coordinate pairs and the like: it greatly enhanced the usability in those cases. Examples of this usage can be found in the Mac subtree. Just From xscottg@yahoo.com Tue Jul 30 07:10:16 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Mon, 29 Jul 2002 23:10:16 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <02a201c2375a$c1299f70$3da48490@neil> Message-ID: <20020730061016.32588.qmail@web40103.mail.yahoo.com> --- Neil Hodgson wrote: > Scott Gilbert: > > > What happens when you've locked the buffer and passed a pointer to the > > I/O system for an asynchronous operation, but before that operation has > > completed, your main program wants to resize the buffer due to a user > > generated event? 
> > That is up to the application or class designer. There are three > reasonable responses I see: throw an exception, buffer the user event, or > ignore the user event. The only thing guaranteed by providing the safe > buffer interface is that the pointer will remain valid. > The guarantee about the pointer remaining valid while the acquire_count is positive is clear. I'm concerned about what the other thread (the one that wants to resize it) is going to do while the lock count is positive. You've listed three possibilities, but lets narrow it down to the strategy that you intend to use in Scintilla (a real use case). I believe all three strategies lead to something undesirable (be it polling, deadlock, a confused user, or ???), but I don't want to exhaustively scrutinize all possibilities until we come up with one good example that you intend to use (it would bore you to read them, and me to type them). So what exactly would you do in Scintilla? (Or pick another good use case if you prefer.) > > The buffer interface could be defined either to allow multiple > (counted) locks or to fail further lock attempts. Counted locks would be > applicable in more circumstances but require more implementation. I would > prefer counted but it is not that important as a counting layer can be > implemented over a single lock interface if needed. > A single lock interface can be implemented over an object without any locking. Have the lockable object return simple "fixed buffer objects" with a limited lifespan. __________________________________________________ Do You Yahoo!? Yahoo! 
Health - Feel better, live better http://health.yahoo.com From xscottg@yahoo.com Tue Jul 30 07:10:26 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Mon, 29 Jul 2002 23:10:26 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <200207300142.g6U1gcSZ018273@kuku.cosc.canterbury.ac.nz> Message-ID: <20020730061026.33569.qmail@web40106.mail.yahoo.com> --- Greg Ewing wrote: > Guido: > > > I don't like where this is going. Let's not add locking to the buffer > > protocol. > > Do you still object to it even in the form I proposed in > my last message? (I.e. no separate "lock" call, locking > is implicit in the getxxxbuffer calls.) > > It does make the protocol slightly more complicated to > use (must remember to make a release call when you're > finished with the pointer) but it seems like a good > tradeoff to me for the flexibility gained. > I realize this wasn't addressed to me, and that I said I would butt out when you were in favor of canning the proposal altogether, but I won't let that get in the way. :-) We haven't seen a semi-thorough use case where the locking behavior is beneficial yet. While I appreciate and agree with the intent of trying to get a more flexible object, I think there is at least one of several problems buried down a little further than you and Neil are looking. I'm concerned that this is very much like the segment count features of the current PyBufferProcs. It was apparently designed for more generality, and while no one uses it, everyone has to check that the segment count is one or raise an exception. If there is no realizable benefit to the acquire/release semantics of the new interface, then this is just extra burden too. Let's find a realizable benefit before we muck up Thomas's good simple proposal with this stuff.
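[Editorial note: Neil's remark above, that a counting layer can be implemented over a single-lock interface, can be sketched in Python. This is only an illustrative sketch; the `target` object and its one-shot `lock()`/`unlock()` methods are hypothetical stand-ins for whatever C-level primitive the real interface would use.]

```python
class OneShotLock:
    """Toy single-lock object: a second lock() while locked fails."""
    def __init__(self):
        self.locked = False

    def lock(self):
        if self.locked:
            raise RuntimeError("already locked")
        self.locked = True

    def unlock(self):
        self.locked = False


class CountingLock:
    """Counted acquire/release layered over a single-lock primitive.

    Only the first acquire and the last release touch the underlying
    one-shot lock; nested acquires just bump the count.
    """
    def __init__(self, target):
        self.target = target
        self.count = 0

    def acquire(self):
        if self.count == 0:
            self.target.lock()      # first acquirer takes the real lock
        self.count += 1

    def release(self):
        if self.count == 0:
            raise ValueError("release without matching acquire")
        self.count -= 1
        if self.count == 0:
            self.target.unlock()    # last releaser drops the real lock
```

Two nested acquires leave the underlying lock held until the second matching release, which is exactly the layering Neil describes.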
In the current Python core, I can think of the following objects that would need a retrofit to this new interface (there may be more):

    string
    unicode
    mmap
    array

The string, unicode, and mmap objects do not resize or reallocate by design. So for them the extra acquire/release requirements are a burden with no benefit. The array object does resize (via the extend method among others). So let's say that an array object gets passed to an extension that locks the buffer and grabs the pointer. The extension releases the GIL so that another thread can work on the array object. Another thread comes in and wants to do a resize (via the extend method). (We don't need to introduce threads for this since the asynchronous I/O case is just the same.) If extend() is called while thread 1 has the array locked, it can:

    A) raise an exception or return an error
    B) block until the lock count returns to zero
    C) ???

Case A is troublesome because depending on thread scheduling/disk performance, you will or won't get the exception. So you've got a weird race condition where an operation might have been valid if it had only executed a split second later, but due to misfortune it raised an exception. I think this non-determinism is ugly at the very least. However since it's recoverable, you could try again (polling), or ignore the request completely (odd behavior). I think this is what both you and Neil are proposing, and I don't see how this is terribly useful. While I don't think B is the strategy anyone is proposing, it means you have two blocking objects in effect (the GIL and whatever the array uses to implement blocking). If we're not extremely careful, we can get deadlock here. I'm still looking for any good examples that fall into cases C and beyond. Neil offered a third example that might fit. He says that he could buffer the user event that led to the resize operation. If that is his strategy, I'd like to see it explained further.
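[Editorial note: case A can be mimicked at the Python level. This is only an illustrative sketch; the LockedArray class and its method names are hypothetical, and the real check would of course live in the C implementation of extend().]

```python
class LockedArray:
    """Toy resizable buffer that refuses to resize while acquired (case A)."""
    def __init__(self, data):
        self.data = list(data)
        self.acquire_count = 0

    def acquire(self):
        # e.g. a pointer has been handed to an asynchronous I/O operation
        self.acquire_count += 1

    def release(self):
        self.acquire_count -= 1

    def extend(self, items):
        if self.acquire_count > 0:
            # the non-determinism Scott describes: whether this raises
            # depends on whether the I/O happens to have completed yet
            raise RuntimeError("cannot resize: buffer is locked")
        self.data.extend(items)


a = LockedArray([1, 2, 3])
a.acquire()
try:
    a.extend([4])           # fails while the lock is held
    resize_failed = False
except RuntimeError:
    resize_failed = True
a.release()
a.extend([4])               # the same call succeeds a split second later
```

The same extend() call fails or succeeds depending only on timing, which is the race condition described above.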
It sounds like taking the event and not processing it until the asynchronous I/O operation has completed. At which point I wonder what using asynchronous I/O achieved since the resize operation had to wait synchronously for the I/O to complete. This also sounds suspiciously like blocking the resize thread, but I won't argue that point. __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From Jack.Jansen@cwi.nl Tue Jul 30 10:07:56 2002 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Tue, 30 Jul 2002 11:07:56 +0200 Subject: [Python-Dev] HAVE_CONFIG_H In-Reply-To: <3D459E8A.1050602@lemburg.com> Message-ID: On Monday, July 29, 2002, at 09:59 , M.-A. Lemburg wrote: > Guido van Rossum wrote: >> I see no references to HAVE_CONFIG_H in the source code (except one >> #undef in readline.c), yet we #define it on the command line. Is that >> still necessary? > > What about these ? > > ./Mac/mwerks/old/mwerks_nsgusi_config.h: > -- define HAVE_CONFIG_H [...] They're turds, they can go. -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From mwh@python.net Tue Jul 30 10:33:31 2002 From: mwh@python.net (Michael Hudson) Date: 30 Jul 2002 10:33:31 +0100 Subject: [Python-Dev] patch: try/finally in generators In-Reply-To: Guido van Rossum's message of "Mon, 29 Jul 2002 16:30:36 -0400" References: <20020729200824.A5391@hishome.net> <200207291734.g6THY1k30119@pcp02138704pcs.reston01.va.comcast.net> <20020729220944.A6113@hishome.net> <200207291940.g6TJe1005489@pcp02138704pcs.reston01.va.comcast.net> <20020729132515.A31926@glacier.arctrix.com> <200207292030.g6TKUaW06234@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <2m1y9llfv8.fsf@starship.python.net> Guido van Rossum writes: > > There could be valid code out there that does not end with > > LOAD_CONST+RETURN. 
> > The current code generator always generates that as the final > instruction. But someone might add an optimizer that takes that out > if it is provably unreachable... The bytecodehacks has one of them :) It would probably scream and run away if presented with a generator, but that's just a matter of bitrot. Cheers, M. -- All obscurity will buy you is time enough to contract venereal diseases. -- Tim Peters, python-dev From nhodgson@bigpond.net.au Tue Jul 30 10:48:44 2002 From: nhodgson@bigpond.net.au (Neil Hodgson) Date: Tue, 30 Jul 2002 19:48:44 +1000 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020730061016.32588.qmail@web40103.mail.yahoo.com> Message-ID: <005d01c237ae$4b2f6670$3da48490@neil> Scott Gilbert: > You've listed three possibilities, but lets narrow it down to the strategy > that you intend to use in Scintilla (a real use case). I believe all three > strategies lead to something undesirable (be it polling, deadlock, a > confused user, or ???), but I don't want to exhaustively scrutinize all > possibilities until we come up with one good example that you intend to use > (it would bore you to read them, and me to type them). > > So what exactly would you do in Scintilla? (Or pick another good use case > if you prefer.) I'd prefer to ignore the input. Unfortunately users prefer a higher degree of friendliness :-( Since Scintilla is a component within a user interface, it shares this responsibility with the container application with the application being the main determinant. If I was writing a Windows-specific application that used Scintilla, and I wanted to use Asynchronous I/O then my preferred technique would be to change the message processing loop to leave the UI input messages in the queue until the I/O had completed. Once the I/O had completed then the message loop would change back to processing all messages which would allow the banked up input to come through. 
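[Editorial note: the message-loop strategy Neil describes, leaving UI input in the queue until the I/O completes and then letting the banked-up input come through, can be sketched platform-neutrally in Python. All names below are hypothetical; a real Windows implementation would filter messages in the PeekMessage loop instead.]

```python
from collections import deque

def run_loop(messages):
    """Process messages in order, deferring UI input while I/O is pending.

    Messages are plain strings here; anything starting with "input:" is
    treated as user input, and "io-complete" marks the end of the
    asynchronous I/O operation.
    """
    queue = deque(messages)
    deferred = deque()      # banked-up input messages
    handled = []
    io_pending = True
    while queue:
        msg = queue.popleft()
        if msg.startswith("input:") and io_pending:
            deferred.append(msg)            # leave input for later
            continue
        handled.append(msg)
        if msg == "io-complete":
            io_pending = False
            # let the banked-up input come through, in original order
            queue.extendleft(reversed(deferred))
            deferred.clear()
    return handled
```

Paint and other non-input messages are still processed while the I/O is outstanding; the deferred input is replayed, in order, as soon as "io-complete" arrives.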
If I was feeling ambitious I might try to process some UI messages, possibly detecting a press of Escape to abort a file load if it turned out the read was taking too long. > A single lock interface can be implemented over an object without any > locking. Have the lockable object return simple "fixed buffer objects" > with a limited lifespan. This returns to the possibility of indeterminate lifespan as mentioned earlier in the thread. > At which point I wonder what using asynchronous I/O achieved since the > resize operation had to wait synchronously for the I/O to complete. This > also sounds suspiciously like blocking the resize thread, but I won't argue > that point. There may be other tasks that the application can perform while waiting for the I/O to complete, such as displaying, styling or line-wrapping whatever text has already arrived (assuming that there are some facilities for discovering this) or performing similar tasks for other windows. Neil From smurf@noris.de Tue Jul 30 11:24:05 2002 From: smurf@noris.de (Matthias Urlichs) Date: Tue, 30 Jul 2002 12:24:05 +0200 Subject: [Python-Dev] Generator cleanup idea (patch: try/finally in generators) Message-ID: Greg: > I take it you usually provide a method for explicit cleanup. > How about giving generator-iterators one, then, called > maybe close() or abort(). The effect would be to raise > an appropriate exception at the point of the yield, > triggering any except or finally blocks. Objects already have a perfectly valid cleanup method -- "__del__". If your code is so complicated that it needs a try/yield/finally, it would make much more sense to convert the thing to an iterator object. It probably would make the code a whole lot more understandable, too. (It did happen with mine.) Stated another way: functions which yield stuff are special. If that specialness gets buried in nested try/except/finally/whatever constructs, things tend to get messy.
Better make that messiness explicit by packaging the code in an object with well-defined methods. This is actually easy to do because of the existence of iterators, because this code

    def some_iter(foo):
        prepare(foo)
        try:
            for i in foo:
                yield something(i)
        finally:
            cleanup(foo)

painlessly transmutes to this:

    class some_iter(object):
        def __init__(self, foo):
            prepare(foo)
            self.foo = foo
            self.it = foo.__iter__()

        def next(self):
            i = self.it.next()
            return something(i)

        def __del__(self):
            cleanup(self.foo)

Personally I think the latter version is more readable because the important thing, i.e. how the next element is obtained, is clearly separated from the rest of the code (and one level dedented, compared to the first version). -- Matthias Urlichs From mwh@python.net Tue Jul 30 11:27:11 2002 From: mwh@python.net (Michael Hudson) Date: 30 Jul 2002 11:27:11 +0100 Subject: [Python-Dev] seeing off SET_LINENO Message-ID: <2mvg6xjytc.fsf@starship.python.net> I've submitted a(nother) patch to sf that removes SET_LINENO: http://www.python.org/sf/587993 It supports tracing by digging around in the c_lnotab[*] to see when execution moves onto a different line. I think it's more or less sound but any changes to the interpreter main loop are going to be subtle, so I have a few points to raise here. In no particular order:

1) this is a change I'd like to see anyway: the use of f->f_lasti in the main loop is confusing. let's just set it at the start of opcode dispatch and leave it the hell alone. there's actually what is probably a very old bug in the implementation of SET_LINENO. It does more or less this:

    f->f_lasti = INSTR_OFFSET();
    /* call the trace function */

It should do this:

    f->f_lasti = INSTR_OFFSET() - 3;
    /* call the trace function */

The field is called f_LASTi, after all...

2) As I say in the patch, I will buy anyone a beer who can explain (without using LLTRACE or reading a lot of dis.py output) why we don't call the trace function on POP_TOP opcodes.
3) The patch changes behaviour -- for the better! You're now rather less likely to get the trace function called several times per line.

4) The patch installs a descriptor for f_lineno so that there is no incompatibility for Python code. The question is what to do with the f_lineno field in the C struct? Remove it? That would (probably) mean bumping PY_API_VERSION. Leave it in? Then its contents would usually be meaningless (keeping it up to date would rather defeat the point of this patch).

5) We've already bumped the MAGIC for 2.3a0, so we probably don't need to do that again.

6) Someone should teach dis.py how to find line breaks from the c_lnotab. I can do this, but not right now....

7) The changes tickle what may be a very old bug in freeze: http://www.python.org/sf/588452

8) I haven't measured the performance impact of the changes to code that is tracing or code that isn't. There's a possible optimization mentioned in the patch for traced code. For not traced code it MAY be worthwhile putting the tracing support code in a static function somewhere so there's less code to jump over in the main loop (for i-caches and such).

9) This patch stops LLTRACE telling you when execution moves onto a different line. This could be restored, but a) I expect I'm the only person to have used LLTRACE recently (debugging this patch). b) This will cause obfuscation, so I'd prefer to do it last.

Comments welcome! Cheers, M. [*] I've cheated with my sigmonster: -- 34. The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From barry@python.org Tue Jul 30 13:14:05 2002 From: barry@python.org (Barry A.
Warsaw) Date: Tue, 30 Jul 2002 08:14:05 -0400 Subject: [Python-Dev] seeing off SET_LINENO References: <2mvg6xjytc.fsf@starship.python.net> Message-ID: <15686.33549.262832.740505@anthem.wooz.org> >>>>> "MH" == Michael Hudson writes: MH> 3) The patch changes behaviour -- for the better! You're now MH> rather less likely to get the trace function called several MH> times per line. Does this change affect debugging? Have you tested how this change might interact with e.g. hotshot? -Barry From neal@metaslash.com Tue Jul 30 13:19:20 2002 From: neal@metaslash.com (Neal Norwitz) Date: Tue, 30 Jul 2002 08:19:20 -0400 Subject: [Python-Dev] PyArg_ParseTuple vs. PyArg_Parse References: Message-ID: <3D468448.45C22891@metaslash.com> Just van Rossum wrote: > > nnorwitz@users.sourceforge.net wrote: > > > Use PyArg_ParseTuple() instead of PyArg_Parse() which is deprecated > > > > Index: posixmodule.c > > =================================================================== > [ ... ] > > ! else if (!PyArg_Parse(arg, "(ll)", &atime, &mtime)) { > [ ... ] > > ! else if (!PyArg_ParseTuple(arg, "ll", &atime, &mtime)) { > [ ... ] > > Probably no biggie here, but I'd like to point out that there is a significant > difference between the two calls: the former will allow any sequence for 'arg', > but the latter insists on a tuple. For that reason I always use PyArg_Parse() to > parse coordinate pairs and the like: it greatly enhanced the usability in those > cases. Examples of this usage can be found in the Mac subtree. I'll back out this change. But this raises the question should PyArg_Parse() be deprecated or should just METH_OLDARGS be deprecated? 
Neal From mwh@python.net Tue Jul 30 13:31:53 2002 From: mwh@python.net (Michael Hudson) Date: 30 Jul 2002 13:31:53 +0100 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: barry@python.org's message of "Tue, 30 Jul 2002 08:14:05 -0400" References: <2mvg6xjytc.fsf@starship.python.net> <15686.33549.262832.740505@anthem.wooz.org> Message-ID: <2md6t5ieh2.fsf@starship.python.net> barry@python.org (Barry A. Warsaw) writes: > >>>>> "MH" == Michael Hudson writes: > > MH> 3) The patch changes behaviour -- for the better! You're now > MH> rather less likely to get the trace function called several > MH> times per line. > > Does this change affect debugging? Hmm, I hadn't actually dared to run pdb with my patch... have now, and it seems OK. There is a difference: The bytecode for, say,

    def f():
        print 1

begins with two SET_LINENO's. One is for the line containing "def f():", one is for "print 1". My patch means the debugger doesn't stop on the "def f():" line -- unsurprisingly, given that no execution ever takes place on that line. It would be possible to force a call to the trace function on entry to the function. In fact, there's a commented out block for this in my patch. Another approach would presumably be for pdb to stop on 'call' trace events as well as 'line' ones. I don't really understand, or use all that often, pdb. Also, you currently stop twice on the first line of a for loop, but only once with my patch. There are probably other situations of excessive SET_LINENO emission. I know Skip (think it was him) killed a couple last week. Bug compatibility is possible here too, but I don't see the advantage. > Have you tested how this change might interact with e.g. hotshot? test_hotshot was very important to me as evidence I was making progress! It currently fails due to the not-calling-trace-on-def-line issue, but as I said, I think this is a *good* thing... Cheers, M. -- The ability to quote is a serviceable substitute for wit. -- W.
Somerset Maugham From mal@lemburg.com Tue Jul 30 13:42:19 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 30 Jul 2002 14:42:19 +0200 Subject: [Python-Dev] seeing off SET_LINENO References: <2mvg6xjytc.fsf@starship.python.net> <15686.33549.262832.740505@anthem.wooz.org> <2md6t5ieh2.fsf@starship.python.net> Message-ID: <3D4689AB.2020107@lemburg.com> Michael Hudson wrote: > barry@python.org (Barry A. Warsaw) writes: > > >>>>>>>"MH" == Michael Hudson writes: >>>>>> >> MH> 3) The patch changes behaviour -- for the better! You're now >> MH> rather less likely to get the trace function called several >> MH> times per line. >> >>Does this change affect debugging? > > > Hmm, I hadn't actually dared to run pdb with my patch... have now, and > it seems OK. > > There is a difference: > > The bytecode for, say, > > def f(): > print 1 > > begins with two SET_LINENO's. One is for the line containing "def > f():", one is for "print 1". My patch means the debugger doesn't stop > on the "def f():" line -- unsurprisingly, given that no execution ever > takes place on that line. This might be used in a debugging application to set up some environment *before* diving into the function itself. Note that many C debuggers stop at the declare line of a function as well (because they execute stack setup code), so a sudden change in this would probably confuse users of today's Python IDEs. > It would be possible to force a call to the trace function on entry to > the function. In fact, there's a commented out block for this in my > patch. Another approach would presumably be for pdb to stop on 'call' > trace events as well as 'line' ones. I don't really understand, or > use all that often, pdb. > > Also, you currently stop twice on the first line of a for loop, but > only once with my patch. There are probably other situations of > excessive SET_LINENO emission. I know Skip (think it was him) killed > a couple last week.
Bug compatibility is possible here too, but I > don't see the advantage. > > >>Have you tested how this change might interact with e.g. hotshot? > > > test_hotshot was very important to me as evidence I was making > progress! > > It currently fails due to the not-calling-trace-on-def-line issue, but > as I said, I think this is a *good* thing... Have you also tested this with the commonly used Python IDEs out there? E.g. IDLE, IDLE-fork, PythonWorks, WingIDE, Emacs, BlackAdder, BOA Constructor, etc. etc. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From mwh@python.net Tue Jul 30 13:58:10 2002 From: mwh@python.net (Michael Hudson) Date: 30 Jul 2002 13:58:10 +0100 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: "M.-A. Lemburg"'s message of "Tue, 30 Jul 2002 14:42:19 +0200" References: <2mvg6xjytc.fsf@starship.python.net> <15686.33549.262832.740505@anthem.wooz.org> <2md6t5ieh2.fsf@starship.python.net> <3D4689AB.2020107@lemburg.com> Message-ID: <2mado9id99.fsf@starship.python.net> "M.-A. Lemburg" writes: > > begins with two SET_LINENO's. One is for the line containing "def > > f():", one is for "print 1". My patch means the debugger doesn't stop > > on the "def f():" line -- unsurprisingly, given that no execution ever > > takes place on that line. > > This might be used in a debugging application to set up some > environment *before* diving into the function itself. So do that when you get the 'call' trace function call! That's what it's there for. > Note that many C debuggers stop at the declare line of > a function as well (because they execute stack setup code), > so a sudden change in this would probably confuse users of > today's Python IDEs. However, sudden changes here are *very* likely to confuse, I agree.
Perhaps bug-compatibility is something to aim for. [...] > >>Have you tested how this change might interact with e.g. hotshot? > > > > > > test_hotshot was very important to me as evidence I was making > > progress! > > > > It currently fails due to the not-calling-trace-on-def-line issue, but > > as I said, I think this is a *good* thing... > > Have you also tested this with the commonly used Python IDEs > out there ? E.g. IDLE, IDLE-fork, PythonWorks, WingIDE, Emacs, > BlackAdder, BOA Constructor, etc. etc. No. Don't think it's relevant to IDLE (at least, I can't see any calls to settrace in there that aren't commented out). Python-mode's pdbtrack should just carry on working. Don't have easy access to the others. I'd be amazed if other IDEs were severely adversely affected. Anyway, isn't this what alphas are for? I have no problem emailing a relevant person for each of the above IDEs and pointing out that this change may affect them. Cheers, M. -- If a train station is a place where a train stops, what's a workstation? -- unknown (to me, at least) From barry@python.org Tue Jul 30 16:16:05 2002 From: barry@python.org (Barry A. Warsaw) Date: Tue, 30 Jul 2002 11:16:05 -0400 Subject: [Python-Dev] seeing off SET_LINENO References: <2mvg6xjytc.fsf@starship.python.net> <15686.33549.262832.740505@anthem.wooz.org> <2md6t5ieh2.fsf@starship.python.net> Message-ID: <15686.44469.22988.913649@anthem.wooz.org> >>>>> "MH" == Michael Hudson writes: MH> Hmm, I hadn't actually dared to run pdb with my patch... have MH> now, and it seems OK. Cool. MH> There is a difference: MH> The bytecode for, say, | def f(): | print 1 MH> begins with two SET_LINENO's. One is for the line containing MH> "def f():", one is for "print 1". My patch means the debugger MH> doesn't stop on the "def f():" line -- unsurprisingly, given MH> that no execution ever takes place on that line. MH> It would be possible to force a call to the trace function on MH> entry to the function.
In fact, there's a commented out block MH> for this in my patch. Another approach would presumably be MH> for pdb to stop on 'call' trace events as well as 'line' ones. MH> I don't really understand, or use all that often, pdb. I can't decide whether it would be good to stop on the def or not. Not doing so makes pdb act more like gdb, which also only stops on the first executable line, so maybe that's a good thing. MH> Also, you currently stop twice on the first line of a for MH> loop, but only once with my patch. That /is/ a good thing! >> Have you tested how this change might interact with >> e.g. hotshot? MH> test_hotshot was very important to me as evidence I was making MH> progress! :) MH> It currently fails due to the not-calling-trace-on-def-line MH> issue, but as I said, I think this is a *good* thing... So maybe we need two different behaviors depending on whether we're debugging or profiling. That might get a bit kludgy if we're using the same trace mechanism for both, but I'm sure it's tractable. -Barry From guido@python.org Tue Jul 30 16:26:23 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 11:26:23 -0400 Subject: [Python-Dev] PyArg_ParseTuple vs. PyArg_Parse In-Reply-To: Your message of "Tue, 30 Jul 2002 08:19:20 EDT." <3D468448.45C22891@metaslash.com> References: <3D468448.45C22891@metaslash.com> Message-ID: <200207301526.g6UFQNZ09835@odiug.zope.com> > I'll back out this change. But this raises the question should > PyArg_Parse() be deprecated or should just METH_OLDARGS be deprecated? Only METH_OLDARGS. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Tue Jul 30 16:27:46 2002 From: barry@python.org (Barry A.
Warsaw) Date: Tue, 30 Jul 2002 11:27:46 -0400 Subject: [Python-Dev] seeing off SET_LINENO References: <2mvg6xjytc.fsf@starship.python.net> <15686.33549.262832.740505@anthem.wooz.org> <2md6t5ieh2.fsf@starship.python.net> <3D4689AB.2020107@lemburg.com> <2mado9id99.fsf@starship.python.net> Message-ID: <15686.45170.12110.403625@anthem.wooz.org> >>>>> "MH" == Michael Hudson writes: MH> Python-mode's pdbtrack should just carry on working. Yup, because it is basically just looking for the pdb prompt, so it shouldn't care. -Barry From guido@python.org Tue Jul 30 16:32:24 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 11:32:24 -0400 Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines In-Reply-To: Your message of "Mon, 29 Jul 2002 23:43:23 EDT." <00eb01c2377b$41dd4340$6300000a@holdenweb.com> References: <15685.35726.678832.241665@anthem.wooz.org> <00eb01c2377b$41dd4340$6300000a@holdenweb.com> Message-ID: <200207301532.g6UFWOt09871@odiug.zope.com> > > This makes me jump fifteen years behind (or so, I do not remember times), > > at the time of the great push so the Internet prefers: > > > > Random J. User <address@dom.ain> > > > > It is more reasonable to always give the real name, optionally followed by > > an email, than to consider that the real name is a mere comment for the > > email address. Oh, I know some hackers who praise themselves as login names or dream having positronic brains :-), but most of us are humans before anything else! > > Could the PEP be reformulated, at least, for leaving the choice opened? Yes. The rule will be Name first, Email second. We won't convert all 200 existing PEPs to that format yet, but if someone with commit privileges wants to volunteer, be our guest. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@zope.com Tue Jul 30 16:36:13 2002 From: barry@zope.com (Barry A.
Warsaw) Date: Tue, 30 Jul 2002 11:36:13 -0400 Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines References: <15685.35726.678832.241665@anthem.wooz.org> Message-ID: <15686.45677.421287.717866@anthem.wooz.org> >>>>> "FP" == François Pinard writes: >> It has been a while since I posted a copy of PEP 1 to the >> mailing lists and newsgroups. FP> Thanks for giving me this opportunity. There is a tiny detail FP> that bothers me: >> The format of the author entry should be address@dom.ain >> (Random J. User) if the email address is included, and just >> Random J. User if the address is not given. FP> This makes me jump fifteen years behind (or so, I do not FP> remember times), at the time of the great push so the Internet FP> prefers: FP> Random J. User <address@dom.ain> FP> It is more reasonable to always give the real name, optionally FP> followed by an email, than to consider that the real name is a FP> mere comment for the email address. This is a good point. Originally we thought it was more important to be able to contact the author, but there are quite a few reasons to revise this intention. As pointed out, email addresses change. Also, experience has shown that most of the discussions about PEPs are conducted on the public forums (mailing lists / newsgroups), so that's a fine way to contact the people working on the PEP. And of course, we allow the PEP authors to obfuscate or omit their email addresses altogether. FP> Could the PEP be reformulated, at least, for leaving the FP> choice opened? I'd rather have one preferred way of writing the header, so I'm going to change PEP 1 to mandate "Random J. User <address@dom.ain>" with the email address optional. However, I'm going to let the old style remain for historical purposes since I don't think it's worth changing the existing PEPs.
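[Editorial note: both author-entry styles under discussion can be parsed with the standard library. A small sketch, using today's email.utils; in 2002 the equivalent functions lived in the rfc822 module:]

```python
from email.utils import parseaddr, formataddr

# Old PEP 1 style: email first, real name as an RFC 822 comment.
old = parseaddr("jdoe@example.com (Random J. User)")

# New mandated style: name first, email in angle brackets, optional.
new = parseaddr("Random J. User <jdoe@example.com>")

# Both parse to the same (realname, address) pair, so normalizing the
# existing PEP headers to the new style is mechanical:
normalized = formataddr(new)
```

Since parseaddr treats the parenthesized comment as the display name, a converter for the 200 existing PEPs would only need parseaddr followed by formataddr.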
Thanks, -Barry From guido@python.org Tue Jul 30 16:37:36 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 11:37:36 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Tue, 30 Jul 2002 13:42:38 +1200." <200207300142.g6U1gcSZ018273@kuku.cosc.canterbury.ac.nz> References: <200207300142.g6U1gcSZ018273@kuku.cosc.canterbury.ac.nz> Message-ID: <200207301537.g6UFbad09910@odiug.zope.com> > > I don't like where this is going. Let's not add locking to the buffer > > protocol. > > Do you still object to it even in the form I proposed in > my last message? (I.e. no separate "lock" call, locking > is implicit in the getxxxbuffer calls.) Yes, I still object. Having to release a resource with a function call is extremely error-prone, as we've seen with reference counting. There are too many cases where some early exit from a piece of code doesn't make the release call. > It does make the protocol slightly more complicated to > use (must remember to make a release call when you're > finished with the pointer) but it seems like a good > tradeoff to me for the flexibility gained. I'm not sure I see the use case. The main data types for which I expect this will be used would be strings and the new 'bytes' type, and both have fixed buffers that never move. > > probably nothing that could possibly invoke the Python interpreter > > recursively, since that might release the GIL. This would generally > > mean that calls to Py_DECREF() are unsafe while holding on to a buffer > > pointer! > > That could be fixed by incrementing the Python refcount as > long as a pointer is held. That could be done even without > the rest of my locking proposal. Of course, if you do that you > need a matching release call, so you might as well implement > the locking while you're at it. I think you misunderstand what I wrote.
A Py_DECREF() for an *unrelated* object can invoke Python code (if it ends up deleting a class instance with a __del__ method). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jul 30 16:39:30 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 11:39:30 -0400 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: Your message of "Mon, 29 Jul 2002 19:30:30 EDT." References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200207301539.g6UFdUS09930@odiug.zope.com> > > > I see no references to HAVE_CONFIG_H in the source code (except one > > > #undef in readline.c), yet we #define it on the command line. Is that > > > still necessary? > > > It's autoconf tradition to use that; it would replace DEFS to either > > many -D options, or -DHAVE_CONFIG_H (if AC_CONFIG_HEADER appears). > > > I don't think we need this, and it can safely be removed. > > The many `-D' options which appear when `AC_CONFIG_HEADER' is not used > are rather inelegant, they create a lot, really a lot of clumsiness in > `make' output. The idea, but you surely know it, was to regroup all > auto-configured definitions into a single header file, and limit the `-D' > to the sole `HAVE_CONFIG_H', or almost. While the: > > #if HAVE_CONFIG_H > # include <config.h> > #endif > > idiom, for some widely used sources, was to cope with `AC_CONFIG_HEADER' > being defined in some projects, and not in others. There is no need to > include `config.h', nor to create it, if all `#define's have been already > done through a litany of `-D' options. Since we don't use this idiom, we can safely remove the -DHAVE_CONFIG_H (if we can find where it is set).
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas.heller@ion-tof.com Tue Jul 30 17:09:40 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 30 Jul 2002 18:09:40 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020730061016.32588.qmail@web40103.mail.yahoo.com> <005d01c237ae$4b2f6670$3da48490@neil> Message-ID: <025a01c237e3$82eb7c90$e000a8c0@thomasnotebook> [Scott] > > A single lock interface can be implemented over an object without any > > locking. Have the lockable object return simple "fixed buffer objects" > > with a limited lifespan. > [Neil] > This returns to the possibility of indeterminate lifespan as mentioned > earlier in the thread. > Can't you do something like this (maybe this is what Scott has in mind):

    static void
    _unlock(void *ptr, MyObject *self)
    {
        /* do whatever is needed to unlock the object */
        self->lock--;
        Py_DECREF(self);
    }

    static PyObject *
    MyObject_GetBuffer(MyObject *self)
    {
        /* do whatever is needed to lock the object */
        self->lock++;
        Py_INCREF(self);
        return PyCObject_FromVoidPtrAndDesc(self->ptr, self, _unlock);
    }

In plain text: Provide a method which returns a 'view' into your object's buffer after locking the object. The view holds a reference to the object; the object is unlocked and decref'd when the view is destroyed. In practice something better than a PyCObject will be used, and this one can even implement the 'fixed buffer' interface. Thomas From guido@python.org Tue Jul 30 17:22:11 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 12:22:11 -0400 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: Your message of "Tue, 30 Jul 2002 11:39:30 EDT."
<200207301539.g6UFdUS09930@odiug.zope.com> References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> <200207301539.g6UFdUS09930@odiug.zope.com> Message-ID: <200207301622.g6UGMBl17143@odiug.zope.com> > Since we don't use this idiom, we can safely remove the > -DHAVE_CONFIG_H (if we can find where it is set). I looked. It's generated by AC_OUTPUT. I don't think I can get rid of it. So never mind. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jul 30 17:39:00 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 12:39:00 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Tue, 30 Jul 2002 09:37:18 +1000." <029801c23758$e13594b0$3da48490@neil> References: <20020729002957.74716.qmail@web40101.mail.yahoo.com> <00c601c23707$35819a20$3da48490@neil> <06f301c2370d$16941060$e000a8c0@thomasnotebook> <029801c23758$e13594b0$3da48490@neil> Message-ID: <200207301639.g6UGd1S17363@odiug.zope.com> > > ..., but I understand Neil's requirements. > > > > Can they be fulfilled by adding some kind of UnlockObject() > > call to the 'safe buffer interface', which should mean 'I won't > > use the pointer received by getsaferead/writebufferproc any more'? > > Yes, that is exactly what I want. I guess I still don't understand Neil's requirements. What can't be done with the existing buffer interface (which requires you to hold the GIL while using the pointer)? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From oren-py-d@hishome.net Tue Jul 30 17:39:27 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Tue, 30 Jul 2002 12:39:27 -0400 Subject: [Python-Dev] Generator cleanup idea (patch: try/finally in generators) In-Reply-To: References: Message-ID: <20020730163927.GA63620@hishome.net> On Tue, Jul 30, 2002 at 12:24:05PM +0200, Matthias Urlichs wrote: > def some_iter(foo): > prepare(foo) > > try: > for i in foo: > yield something(i) > finally: > cleanup(foo) > > painlessly transmutes to this: > > class some_iter(object): > def __init__(self, foo): > prepare(foo) > > self.foo = foo > self.it = foo.__iter__() > > def next(self): > i = self.it.next() > return something(i) > > def __del__(self): > cleanup(self.foo) Bad example. Generators are useful precisely because some types of code are quite painful to change to this form. Anyway, it appears that generators can create reference loops if someone was perverted enough to keep a reference to the generator inside the generator. It doesn't seem to be worth the effort of making generators into GC objects just for this. Oren From pinard@iro.umontreal.ca Tue Jul 30 17:44:06 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 30 Jul 2002 12:44:06 -0400 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: <200207301539.g6UFdUS09930@odiug.zope.com> References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> <200207301539.g6UFdUS09930@odiug.zope.com> Message-ID: [Guido van Rossum] > Since we don't use this idiom, we can safely remove the > -DHAVE_CONFIG_H (if we can find where it is set). I guess you will have to override some `m4' macro within `configure.in', or related machinery. If things did not change too much, this probably means diving into `acgeneral.m4', to find out how and where this is best done. 
-- François Pinard http://www.iro.umontreal.ca/~pinard From nas@python.ca Tue Jul 30 17:56:58 2002 From: nas@python.ca (Neil Schemenauer) Date: Tue, 30 Jul 2002 09:56:58 -0700 Subject: [Python-Dev] Generator cleanup idea (patch: try/finally in generators) In-Reply-To: <20020730163927.GA63620@hishome.net>; from oren-py-d@hishome.net on Tue, Jul 30, 2002 at 12:39:27PM -0400 References: <20020730163927.GA63620@hishome.net> Message-ID: <20020730095658.A3196@glacier.arctrix.com> Oren Tirosh wrote: > It doesn't seem to be worth the effort of making generators > into GC objects just for this. What do you mean. They are already GC objects. Neil From thomas.heller@ion-tof.com Tue Jul 30 17:51:41 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 30 Jul 2002 18:51:41 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729002957.74716.qmail@web40101.mail.yahoo.com> <00c601c23707$35819a20$3da48490@neil> <06f301c2370d$16941060$e000a8c0@thomasnotebook> <029801c23758$e13594b0$3da48490@neil> <200207301639.g6UGd1S17363@odiug.zope.com> Message-ID: <03a101c237e9$60fb3a20$e000a8c0@thomasnotebook> From: "Guido van Rossum" > > > ..., but I understand Neil's requirements. > > > > > > Can they be fulfilled by adding some kind of UnlockObject() > > > call to the 'safe buffer interface', which should mean 'I won't > > > use the pointer received by getsaferead/writebufferproc any more'? > > > > Yes, that is exactly what I want. > > I guess I still don't understand Neil's requirements. What can't be > done with the existing buffer interface (which requires you to hold > the GIL while using the pointer)? Processing in Python :-(. 
Thomas From pinard@iro.umontreal.ca Tue Jul 30 17:53:38 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 30 Jul 2002 12:53:38 -0400 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: <200207301622.g6UGMBl17143@odiug.zope.com> References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> <200207301539.g6UFdUS09930@odiug.zope.com> <200207301622.g6UGMBl17143@odiug.zope.com> Message-ID: [Guido van Rossum] > > Since we don't use this idiom, we can safely remove the > > -DHAVE_CONFIG_H (if we can find where it is set). > I looked. It's generated by AC_OUTPUT. I don't think I can get rid > of it. So never mind. :-) Maybe AC_OUTPUT, or macros called by AC_OUTPUT, can be overridden. If this is not easy to do, you might want to discuss the matter with Akim, Cc:ed. Maybe he could tear down AC_OUTPUT in parts so the overriding gets easier? I know my friend Akim as a good, helpful and nice fellow! Don't fear him! :-) -- François Pinard http://www.iro.umontreal.ca/~pinard From thomas.heller@ion-tof.com Tue Jul 30 18:37:19 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 30 Jul 2002 19:37:19 +0200 Subject: [Python-Dev] PEP 298 - the Fixed Buffer Interface Message-ID: <04da01c237ef$c103ac30$e000a8c0@thomasnotebook> Here is PEP 298 - the Fixed Buffer Interface, posted to get feedback from the Python community. Enjoy! Thomas PS: I'm going on a two-week vacation at the end of this week, so don't hold your breath for replies from me if you post after, let's say, Thursday. ----- PEP: 298 Title: The Fixed Buffer Interface Version: $Revision: 1.3 $ Last-Modified: $Date: 2002/07/30 16:52:53 $ Author: Thomas Heller Status: Draft Type: Standards Track Created: 26-Jul-2002 Python-Version: 2.3 Post-History: Abstract This PEP proposes an extension to the buffer interface called the 'fixed buffer interface'. 
The fixed buffer interface fixes the flaws of the 'old' buffer interface as defined in Python versions up to and including 2.2, see [1]: The lifetime of the retrieved pointer is clearly defined. The buffer size is returned as a 'size_t' data type, which allows access to large buffers on platforms where sizeof(int) != sizeof(void *). Specification The fixed buffer interface exposes new functions which return the size and the pointer to the internal memory block of any Python object which chooses to implement this interface. The size and pointer returned must be valid as long as the object is alive (has a positive reference count). So, only objects which never reallocate or resize the memory block are allowed to implement this interface. The fixed buffer interface omits the memory segment model which is present in the old buffer interface - only a single memory block can be exposed. Implementation Define a new flag in Include/object.h: /* PyBufferProcs contains bf_getfixedreadbuffer and bf_getfixedwritebuffer */ #define Py_TPFLAGS_HAVE_GETFIXEDBUFFER (1L<<15) This flag would be included in Py_TPFLAGS_DEFAULT: #define Py_TPFLAGS_DEFAULT ( \ .... Py_TPFLAGS_HAVE_GETFIXEDBUFFER | \ .... 0) Extend the PyBufferProcs structure by new fields in Include/object.h: typedef size_t (*getfixedreadbufferproc)(PyObject *, void **); typedef size_t (*getfixedwritebufferproc)(PyObject *, void **); typedef struct { getreadbufferproc bf_getreadbuffer; getwritebufferproc bf_getwritebuffer; getsegcountproc bf_getsegcount; getcharbufferproc bf_getcharbuffer; /* fixed buffer interface functions */ getfixedreadbufferproc bf_getfixedreadbufferproc; getfixedwritebufferproc bf_getfixedwritebufferproc; } PyBufferProcs; The new fields are present if the Py_TPFLAGS_HAVE_GETFIXEDBUFFER flag is set in the object's type. The Py_TPFLAGS_HAVE_GETFIXEDBUFFER flag implies the Py_TPFLAGS_HAVE_GETCHARBUFFER flag. 
The getfixedreadbufferproc and getfixedwritebufferproc functions return the size in bytes of the memory block on success, and fill in the passed void * pointer. If these functions fail - either because an error occurs or no memory block is exposed - they must set the void * pointer to NULL and raise an exception. The return value is undefined in these cases and should not be used. Usually the getfixedwritebufferproc and getfixedreadbufferproc functions aren't called directly; they are called through convenience functions declared in Include/abstract.h: int PyObject_AsFixedReadBuffer(PyObject *obj, void **buffer, size_t *buffer_len); int PyObject_AsFixedWriteBuffer(PyObject *obj, void **buffer, size_t *buffer_len); These functions return 0 on success, set buffer to the memory location and buffer_len to the length of the memory block in bytes. On failure, or if the fixed buffer interface is not implemented by obj, they return -1 and set an exception. Backward Compatibility The size of the PyBufferProcs structure changes if this proposal is implemented, but the type's tp_flags slot can be used to determine if the additional fields are present. Reference Implementation Will be uploaded to the SourceForge patch manager by the author. Additional Notes/Comments Python strings, Unicode strings, mmap objects, and maybe other types would expose the fixed buffer interface, but the array type would *not*, because its memory block may be reallocated during its lifetime. Community Feedback Greg Ewing doubts the fixed buffer interface is needed at all; he thinks the normal buffer interface could be used if the pointer is (re)fetched each time it's used. This seems to be dangerous, because even innocent looking calls to the Python API like Py_DECREF() may trigger execution of arbitrary Python code. 
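[The note above about the array type, and the danger Greg Ewing's alternative runs into, can be made concrete with present-day Python. The memoryview type and BufferError exception shown here postdate this thread by several years (PEP 3118); the snippet is purely illustrative of the "pinned buffer" idea the PEP is after. A bytearray, like the array-module arrays discussed here, owns a resizable memory block, and the interpreter refuses any operation that would move that block while a view is exported:]

```python
# While a memoryview is exported, the bytearray behaves like a "fixed"
# buffer: the resize that would reallocate its memory block is refused.
ba = bytearray(b"spam")
view = memoryview(ba)
try:
    ba.extend(b"eggs")          # would reallocate the block
    resized_while_exported = True
except BufferError:
    resized_while_exported = False
view.release()                  # drop the export ...
ba.extend(b"eggs")              # ... and resizing works again
```

[So instead of a pointer that silently dangles when the owner resizes, the resize itself fails while the pointer is out, which is the lifetime guarantee the fixed buffer interface wants to make statically.]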
Neil Hodgson wants to expose pointers to memory blocks with limited lifetime: do some kind of lock operation on the object, retrieve the pointer, use it, and unlock the object again. While the author sees the need for this, it cannot be addressed by this proposal. Being required to call a function after not using the pointer received by the getfixedbufferprocs any more seems too error prone. Credits Scott Gilbert came up with the name 'fixed buffer interface'. References [1] The buffer interface http://mail.python.org/pipermail/python-dev/2000-October/009974.html [2] The Buffer Problem http://www.python.org/peps/pep-0296.html Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: From martin@v.loewis.de Tue Jul 30 18:55:59 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 30 Jul 2002 19:55:59 +0200 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: <200207301622.g6UGMBl17143@odiug.zope.com> References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> <200207301539.g6UFdUS09930@odiug.zope.com> <200207301622.g6UGMBl17143@odiug.zope.com> Message-ID: Guido van Rossum writes: > I looked. It's generated by AC_OUTPUT. I don't think I can get rid > of it. So never mind. :-) Just remove the @DEFS@ from Makefile.pre.in. 
Regards, Martin From oren-py-d@hishome.net Tue Jul 30 19:13:08 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Tue, 30 Jul 2002 21:13:08 +0300 Subject: [Python-Dev] Generator cleanup idea (patch: try/finally in generators) In-Reply-To: <20020730095658.A3196@glacier.arctrix.com>; from nas-dated-1028480220.f9673d@python.ca on Tue, Jul 30, 2002 at 09:56:58AM -0700 References: <20020730163927.GA63620@hishome.net> <20020730095658.A3196@glacier.arctrix.com> Message-ID: <20020730211308.A27690@hishome.net> On Tue, Jul 30, 2002 at 09:56:58AM -0700, Neil Schemenauer wrote: > Oren Tirosh wrote: > > It doesn't seem to be worth the effort of making generators > > into GC objects just for this. > > What do you mean. They are already GC objects. Ooops. Oren From guido@python.org Tue Jul 30 19:57:00 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 14:57:00 -0400 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: Your message of "Tue, 30 Jul 2002 12:44:06 EDT." References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> <200207301539.g6UFdUS09930@odiug.zope.com> Message-ID: <200207301857.g6UIv0G17893@odiug.zope.com> > > Since we don't use this idiom, we can safely remove the > > -DHAVE_CONFIG_H (if we can find where it is set). > > I guess you will have to override some `m4' macro within `configure.in', or > related machinery. If things did not change too much, this probably means > diving into `acgeneral.m4', to find out how and where this is best done. I haven't the guts. Would you mind sending a patch? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jul 30 19:59:06 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 14:59:06 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Tue, 30 Jul 2002 18:51:41 +0200." 
<03a101c237e9$60fb3a20$e000a8c0@thomasnotebook> References: <20020729002957.74716.qmail@web40101.mail.yahoo.com> <00c601c23707$35819a20$3da48490@neil> <06f301c2370d$16941060$e000a8c0@thomasnotebook> <029801c23758$e13594b0$3da48490@neil> <200207301639.g6UGd1S17363@odiug.zope.com> <03a101c237e9$60fb3a20$e000a8c0@thomasnotebook> Message-ID: <200207301859.g6UIx6117906@odiug.zope.com> > From: "Guido van Rossum" > > > > ..., but I understand Neil's requirements. > > > > > > > > Can they be fulfilled by adding some kind of UnlockObject() > > > > call to the 'safe buffer interface', which should mean 'I won't > > > > use the pointer received by getsaferead/writebufferproc any more'? > > > > > > Yes, that is exactly what I want. > > > > I guess I still don't understand Neil's requirements. What can't be > > done with the existing buffer interface (which requires you to hold > > the GIL while using the pointer)? > > Processing in Python :-(. Can you work out an example? I don't understand what you can do in Python, apart from passing it to something else that takes the buffer API or converting the data to a string or a bytes buffer. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jul 30 20:06:47 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 15:06:47 -0400 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: Your message of "Tue, 30 Jul 2002 14:57:00 EDT." <200207301857.g6UIv0G17893@odiug.zope.com> References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> <200207301539.g6UFdUS09930@odiug.zope.com> <200207301857.g6UIv0G17893@odiug.zope.com> Message-ID: <200207301906.g6UJ6l619069@odiug.zope.com> > > > Since we don't use this idiom, we can safely remove the > > > -DHAVE_CONFIG_H (if we can find where it is set). > > > > I guess you will have to override some `m4' macro within `configure.in', or > > related machinery. 
If things did not change too much, this probably means > > diving into `acgeneral.m4', to find out how and where this is best done. > > I haven't the guts. Would you mind sending a patch? Never mind. Getting rid of DEFS from Makefile.pre.in did the trick. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas.heller@ion-tof.com Tue Jul 30 20:22:53 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Tue, 30 Jul 2002 21:22:53 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729002957.74716.qmail@web40101.mail.yahoo.com> <00c601c23707$35819a20$3da48490@neil> <06f301c2370d$16941060$e000a8c0@thomasnotebook> <029801c23758$e13594b0$3da48490@neil> <200207301639.g6UGd1S17363@odiug.zope.com> <03a101c237e9$60fb3a20$e000a8c0@thomasnotebook> <200207301859.g6UIx6117906@odiug.zope.com> Message-ID: <063301c237fe$80506b10$e000a8c0@thomasnotebook> [Guido] > > > I guess I still don't understand Neil's requirements. What can't be > > > done with the existing buffer interface (which requires you to hold > > > the GIL while using the pointer)? > > > > Processing in Python :-(. > > Can you work out an example? Not sure, maybe Neil could do it better. However, you yourself pointed out to Greg that it may be unsafe to even call Py_DECREF() on an unrelated object. > I don't understand what you can do in > Python, apart from passing it to something else that takes the buffer > API or converting the data to a string or a bytes buffer. Or pack it into a buffer *object* and hand it to arbitrary Python code. That's what we have now. What does 'hold the GIL' mean in this context? No other thread can execute: we have complete control over what we do. But what are we *allowed* to do? Thomas From guido@python.org Tue Jul 30 20:37:37 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 15:37:37 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Tue, 30 Jul 2002 21:22:53 +0200." 
<063301c237fe$80506b10$e000a8c0@thomasnotebook> References: <20020729002957.74716.qmail@web40101.mail.yahoo.com> <00c601c23707$35819a20$3da48490@neil> <06f301c2370d$16941060$e000a8c0@thomasnotebook> <029801c23758$e13594b0$3da48490@neil> <200207301639.g6UGd1S17363@odiug.zope.com> <03a101c237e9$60fb3a20$e000a8c0@thomasnotebook> <200207301859.g6UIx6117906@odiug.zope.com> <063301c237fe$80506b10$e000a8c0@thomasnotebook> Message-ID: <200207301937.g6UJbb220763@odiug.zope.com> > > > > I guess I still don't understand Neil's requirements. What can't be > > > > done with the existing buffer interface (which requires you to hold > > > > the GIL while using the pointer)? > > > > > > Processing in Python :-(. > > > > Can you work out an example? > Not sure, maybe Neil could do it better. > > However, you yourself pointed out to Greg that it may be unsafe > to even call Py_DECREF() on an unrelated object. The safe rule is that you should grab the pointer and then do some I/O on it and nothing else. > > I don't understand what you can do in > > Python, apart from passing it to something else that takes the buffer > > API or converting the data to a string or a bytes buffer. > > Or pack it into a buffer *object* and hand it to arbitrary > Python code. That's what we have now. Since the object you're packing already supports the buffer API, I don't see the point of packing it in a buffer object. > What does 'hold the GIL' mean in this context? > No other thread can execute: we have complete control > over what we do. But what are we *allowed* to do? When accessing a movable buffer, the safest rule is no Python API calls. There's a less restrictive safe rule, but it's messy because the end goal is "don't do anything that could conceivably end up in the Python interpreter main loop (ceval.c)" and there's no easy rule for that -- anything that uses Py_DECREF can end up doing that. 
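[The lock-and-view scheme Thomas sketched in C earlier in the thread - acquire a lock when a view of the buffer is handed out, release it when the view is destroyed - can be sketched at the Python level. Owner and View are invented names for illustration only; this is not any API proposed in the thread, just the pattern, with the lock count standing in for self->lock in Thomas's C:]

```python
class Owner:
    """Hypothetical bufferable object: it refuses to reallocate its
    memory block while any view holds it locked."""
    def __init__(self, size):
        self.data = bytearray(size)
        self.lock = 0                  # count of outstanding views

    def resize(self, size):
        if self.lock:
            raise RuntimeError("buffer locked by an outstanding view")
        self.data = bytearray(size)


class View:
    """Locks the owner on creation, unlocks it on release."""
    def __init__(self, owner):
        self.owner = owner
        owner.lock += 1

    def release(self):
        if self.owner is not None:
            self.owner.lock -= 1
            self.owner = None


o = Owner(16)
v = View(o)        # o.lock == 1: any o.resize() now fails
v.release()        # o.lock == 0: o may reallocate again
o.resize(32)
```

[The "too error prone" objection in PEP 298 is visible here: everything is safe only as long as every consumer remembers to call release().]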
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Jul 30 20:46:41 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 15:46:41 -0400 Subject: [Python-Dev] PEP 298 - the Fixed Buffer Interface In-Reply-To: Your message of "Tue, 30 Jul 2002 19:37:19 +0200." <04da01c237ef$c103ac30$e000a8c0@thomasnotebook> References: <04da01c237ef$c103ac30$e000a8c0@thomasnotebook> Message-ID: <200207301946.g6UJkf520799@odiug.zope.com> > Here is PEP 298 - the Fixed Buffer Interface, posted to > get feedback from the Python community. > Enjoy! +1 from me (but you already knew that). > Thomas > > PS: I'll going to a 2 weeks vacation at the end of this week, > so don't hold your breath on replies from me if you post > after, let's say, thursday. > > ----- > PEP: 298 > Title: The Fixed Buffer Interface > Version: $Revision: 1.3 $ > Last-Modified: $Date: 2002/07/30 16:52:53 $ > Author: Thomas Heller > Status: Draft > Type: Standards Track > Created: 26-Jul-2002 > Python-Version: 2.3 > Post-History: > > > Abstract > > This PEP proposes an extension to the buffer interface called the > 'fixed buffer interface'. > > The fixed buffer interface fixes the flaws of the 'old' buffer > interface as defined in Python versions up to and including 2.2, > see [1]: (I keep reading this backwards, thinking that the following two items list the flaws in [1]. :-) > The lifetime of the retrieved pointer is clearly defined. > > The buffer size is returned as a 'size_t' data type, which > allows access to large buffers on platforms where sizeof(int) > != sizeof(void *). This second sounds like a change we could also make to the "old" buffer interface, if we introduce another flag bit that's *not* part of the default flags. > Specification > > The fixed buffer interface exposes new functions which return the > size and the pointer to the internal memory block of any python > object which chooses to implement this interface. 
> > The size and pointer returned must be valid as long as the object > is alive (has a positive reference count). So, only objects which > never reallocate or resize the memory block are allowed to > implement this interface. > > The fixed buffer interface omits the memory segment model which is > present in the old buffer interface - only a single memory block > can be exposed. > > > Implementation > > Define a new flag in Include/object.h: > > /* PyBufferProcs contains bf_getfixedreadbuffer > and bf_getfixedwritebuffer */ > #define Py_TPFLAGS_HAVE_GETFIXEDBUFFER (1L<<15) > > > This flag would be included in Py_TPFLAGS_DEFAULT: > > #define Py_TPFLAGS_DEFAULT ( \ > .... > Py_TPFLAGS_HAVE_GETFIXEDBUFFER | \ > .... > 0) > > > Extend the PyBufferProcs structure by new fields in > Include/object.h: > > typedef size_t (*getfixedreadbufferproc)(PyObject *, void **); > typedef size_t (*getfixedwritebufferproc)(PyObject *, void **); > > typedef struct { > getreadbufferproc bf_getreadbuffer; > getwritebufferproc bf_getwritebuffer; > getsegcountproc bf_getsegcount; > getcharbufferproc bf_getcharbuffer; > /* fixed buffer interface functions */ > getfixedreadbufferproc bf_getfixedreadbufferproc; > getfixedwritebufferproc bf_getfixedwritebufferproc; > } PyBufferProcs; > > > The new fields are present if the Py_TPFLAGS_HAVE_GETFIXEDBUFFER > flag is set in the object's type. > > The Py_TPFLAGS_HAVE_GETFIXEDBUFFER flag implies the > Py_TPFLAGS_HAVE_GETCHARBUFFER flag. > > The getfixedreadbufferproc and getfixedwritebufferproc functions > return the size in bytes of the memory block on success, and fill > in the passed void * pointer on success. If these functions fail > - either because an error occurs or no memory block is exposed - > they must set the void * pointer to NULL and raise an exception. > The return value is undefined in these cases and should not be > used. 
> > Usually the getfixedwritebufferproc and getfixedreadbufferproc > functions aren't called directly, they are called through > convenience functions declared in Include/abstract.h: > > int PyObject_AsFixedReadBuffer(PyObject *obj, > void **buffer, > size_t *buffer_len); > > int PyObject_AsFixedWriteBuffer(PyObject *obj, > void **buffer, > size_t *buffer_len); > > These functions return 0 on success, set buffer to the memory > location and buffer_len to the length of the memory block in > bytes. On failure, or if the fixed buffer interface is not > implemented by obj, they return -1 and set an exception. > > > Backward Compatibility > > The size of the PyBufferProcs structure changes if this proposal > is implemented, but the type's tp_flags slot can be used to > determine if the additional fields are present. > > > Reference Implementation > > Will be uploaded to the SourceForge patch manager by the author. I'm holding my breath now... > > Additional Notes/Comments > > Python strings, Unicode strings, mmap objects, and maybe other > types would expose the fixed buffer interface, but the array type > would *not*, because its memory block may be reallocated during > its lifetime. > > > Community Feedback > > Greg Ewing doubts the fixed buffer interface is needed at all, he > thinks the normal buffer interface could be used if the pointer is > (re)fetched each time it's used. This seems to be dangerous, > because even innocent looking calls to the Python API like > Py_DECREF() may trigger execution of arbitrary Python code. > > Neil Hodgson wants to expose pointers to memory blocks with > limited lifetime: do some kind of lock operation on the object, > retrieve the pointer, use it, and unlock the object again. While > the author sees the need for this, it cannot be addressed by this > proposal. Being required to call a function after not using the > pointer received by the getfixedbufferprocs any more seems too > error prone. 
> > > Credits > > Scott Gilbert came up with the name 'fixed buffer interface'. > > > References > > [1] The buffer interface > http://mail.python.org/pipermail/python-dev/2000-October/009974.html > > [2] The Buffer Problem > http://www.python.org/peps/pep-0296.html > > > Copyright > > This document has been placed in the public domain. > > > > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > End: --Guido van Rossum (home page: http://www.python.org/~guido/) From oren-py-d@hishome.net Tue Jul 30 21:15:11 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Tue, 30 Jul 2002 23:15:11 +0300 Subject: [Python-Dev] Valgrinding Python Message-ID: <20020730231511.A28762@hishome.net> I ran some tests with Julian Seward's amazing Valgrind memory debugger. Python is remarkably clean. Much cleaner than any other program of non-trivial size that I tested. Objects/obmalloc.c: The ADDRESS_IN_RANGE macro makes references to uninitialized memory. This produced tons of warnings so I ran the rest of the tests without pymalloc. The following tests produced invalid accesses inside the external library: test_anydbm.py test_bsddb.py test_dbm.py test_gdbm.py test_curses.py test_pwd.py test_socket_ssl.py I also got some invalid accesses in Modules/arraymodule.c:array_ass_subscr while running test_array and in Objects/Listobject.c:list_ass_subscript running test_types. For some reason I couldn't reproduce them later. Oren From jacobs@penguin.theopalgroup.com Tue Jul 30 21:21:36 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Tue, 30 Jul 2002 16:21:36 -0400 (EDT) Subject: [Python-Dev] Valgrinding Python In-Reply-To: <20020730231511.A28762@hishome.net> Message-ID: On Tue, 30 Jul 2002, Oren Tirosh wrote: > I ran some tests with Julian Seward's amazing Valgrind memory debugger. > Python is remarkably clean. Much cleaner than any other program of > non-trivial size that I tested. 
I've been using Python with valgrind too, and with great success. I've caught several non-trivial problems in some of our extension modules, though only a few very picky things in the Python core. Valgrind has options to attach gdb to running processes when problems occur. Combine this with a gdb patched to produce mixed C/Python tracebacks and you get an awesome memory debugger. -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From nhodgson@bigpond.net.au Tue Jul 30 21:55:39 2002 From: nhodgson@bigpond.net.au (Neil Hodgson) Date: Wed, 31 Jul 2002 06:55:39 +1000 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729002957.74716.qmail@web40101.mail.yahoo.com> <00c601c23707$35819a20$3da48490@neil> <06f301c2370d$16941060$e000a8c0@thomasnotebook> <029801c23758$e13594b0$3da48490@neil> <200207301639.g6UGd1S17363@odiug.zope.com> <03a101c237e9$60fb3a20$e000a8c0@thomasnotebook> <200207301859.g6UIx6117906@odiug.zope.com> <063301c237fe$80506b10$e000a8c0@thomasnotebook> Message-ID: <004e01c2380b$762ef5e0$3da48490@neil> Thomas Heller (Guido, Thomas, Guido): > [Guido] > > > > I guess I still don't understand Neil's requirements. What can't be > > > > done with the existing buffer interface (which requires you to hold > > > > the GIL while using the pointer)? > > > > > > Processing in Python :-(. > > > > Can you work out an example? > Not sure, maybe Neil could do it better. I see this interface as a bridge between objects offering generic buffer oriented facilities (asynch or low level I/O for example) and objects that want to make it possible to use these facilities on their data (text buffers, multimedia buffers, numeric arrays) by yielding a pointer to their otherwise internal data. The bridging code between the two objects is unrestricted Python code that may cause memory to be moved around. 
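[Neil's bridge scenario - partitioning one large buffer into pseudo-independent, writable pieces that unrestricted Python code can pass around - is what memoryview eventually delivered, years after this thread. The snippet below is an anachronistic illustration of that goal, not code anyone in the thread could have run in 2002; note in particular that the region-to-region copy in the last line goes straight between the two windows, with no intermediate string temporary:]

```python
backing = bytearray(100)            # one large buffer
left = memoryview(backing)[:50]     # two writable windows onto it
right = memoryview(backing)[50:]

right[0:4] = b"data"                # write through one window ...
assert backing[50:54] == b"data"    # ... lands in the shared buffer

left[0:4] = right[0:4]              # region-to-region copy, no temporary
assert backing[0:4] == b"data"
```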
Neil From guido@python.org Tue Jul 30 22:13:00 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 30 Jul 2002 17:13:00 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Wed, 31 Jul 2002 06:55:39 +1000." <004e01c2380b$762ef5e0$3da48490@neil> References: <20020729002957.74716.qmail@web40101.mail.yahoo.com> <00c601c23707$35819a20$3da48490@neil> <06f301c2370d$16941060$e000a8c0@thomasnotebook> <029801c23758$e13594b0$3da48490@neil> <200207301639.g6UGd1S17363@odiug.zope.com> <03a101c237e9$60fb3a20$e000a8c0@thomasnotebook> <200207301859.g6UIx6117906@odiug.zope.com> <063301c237fe$80506b10$e000a8c0@thomasnotebook> <004e01c2380b$762ef5e0$3da48490@neil> Message-ID: <200207302113.g6ULD0N21213@odiug.zope.com> > I see this interface as a bridge between objects offering generic buffer > oriented facilities (asynch or low level I/O for example) and objects that > want to make it possible to use these facilities on their data (text > buffers, multimedia buffers, numeric arrays) by yielding a pointer to their > otherwise internal data. > > The bridging code between the two objects is unrestricted Python code > that may cause memory to be moved around. If the buffer is relatively small, copying the data an extra time shouldn't be a problem, and you can use the old API. If the buffer is huge, you probably shouldn't want to move the buffer around in memory anyway, so I don't think your case for needing a lockable interface is very strong. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Tue Jul 30 22:56:35 2002 From: tim.one@comcast.net (Tim Peters) Date: Tue, 30 Jul 2002 17:56:35 -0400 Subject: [Python-Dev] Generator cleanup idea (patch: try/finally in generators) In-Reply-To: <200207300112.g6U1CJoO018210@kuku.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > I don't think you'd really be breaking any promises. 
> After all, if someone wrote > > def asdf(): > try: > something_that_never_returns() > finally: > ... > > they wouldn't have much ground for complaint that the > finally never got executed. The case we're talking about > seems much the same situation. Not to me -- you can't write something_that_never_returns() in Python unless the program runs forever, you crash the system, you get the thread stuck in deadlock or permanent starvation, or you're anti-social by calling os._exit() (sys.exit() is fine: it raises SystemExit, and pending finally blocks get run then). All of those are highly exceptional use cases; everyone else is guaranteed their finally block will eventually run. > I take it you usually provide a method for explicit cleanup. Yup. > How about giving generator-iterators one, then, called > maybe close() or abort(). The effect would be to raise > an appropriate exception at the point of the yield, > triggering any except or finally blocks. As before, I'm already happy; sharing state via instance variables is all "the solution" I've felt a need for. If consensus is that something needs to be done here anyway, I'd rather think of generators more as threads of control than as lumps of data with attributes. From that view, I think it would be easier to make a coherent case that generators should support a termination protocol involving raising SystemExit. But then that should apply to all thread-like objects too, and there's no way now for one thread to raise SystemExit in another (but it's arguable that there should be). > This method could even be added to the general iterator > protocol (implementing it would be optional). It would > then provide a standard name for people to use for > cleanup methods in their own iterator classes. Generalizing from zero examples ? 
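[The close() method Greg floats here, and Tim is skeptical of, is essentially what Python later adopted: generators gained try/finally support (PEP 342), plus a close() method that raises GeneratorExit at the paused yield so pending finally blocks run. A sketch against a modern interpreter, for readers following the thread today:]

```python
cleaned_up = []

def reader(lines):
    try:
        for line in lines:
            yield line
    finally:
        cleaned_up.append("closed")   # runs on exhaustion *or* close()

g = reader(["a", "b", "c"])
assert next(g) == "a"
g.close()                             # GeneratorExit raised at the yield
assert cleaned_up == ["closed"]       # ... so the finally block has run
```

[Exhausting a generator normally triggers the same finally block, so list(reader(["x"])) would append a second "closed".]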
From tim.one@comcast.net Tue Jul 30 23:53:04 2002 From: tim.one@comcast.net (Tim Peters) Date: Tue, 30 Jul 2002 18:53:04 -0400 Subject: [Python-Dev] Valgrinding Python In-Reply-To: <20020730231511.A28762@hishome.net> Message-ID: [Oren Tirosh] > I ran some tests with Julian Seward's amazing Valgrind memory debugger. > Python is remarkably clean. Much cleaner than any other program of > non-trivial size that I tested. It's been thru Purify and Insure++, off and on, several times, and we enjoyed many wasted hours squashing spurious complaints from those. > Objects/obmalloc.c: > > The ADDRESS_IN_RANGE macro makes references to uninitialized memory. > > This produced tons of warnings so I ran the rest of the tests without > pymalloc. Ouch. That's not going to change, so it may be worth learning how to write a Valgrind suppression file. ADDRESS_IN_RANGE determines whether an address was passed out by pymalloc. It does this by (a) reading an index from an address computed *from* the claimant address; then (b) using that to index into its own data structures, which record the range of addresses pymalloc controls; then (c) comparing the claimant address to that range. Part #a can easily end up reading uninitialized memory, but pymalloc doesn't care (a junk value found there can't fool it). This is needed to determine whether to hand off an address to the platform free() or realloc(), and in such cases part #a may well read up any kind of trash. > The following tests produced invalid accesses inside the external > library: > > test_anydbm.py > test_bsddb.py > test_dbm.py > test_gdbm.py > test_curses.py > test_pwd.py > test_socket_ssl.py Figures. > I also got some invalid accesses in > Modules/arraymodule.c:array_ass_subscr > while running test_array and in Objects/Listobject.c:list_ass_subscript > running test_types. For some reason I couldn't reproduce them later. Another memory-debugging tool, another chance to debug a memory-debugging tool. 
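[Editor's note: Tim's three steps can be made concrete with a small illustrative model. This is not pymalloc's actual code — the names, sizes, and table layout here are simplified stand-ins — but it shows why a junk value read in step (a) can't fool the check: a garbage index either falls outside the allocator's own arena table or names an arena that doesn't contain the address.

```python
ARENA_SIZE = 256 * 1024   # hypothetical arena size for this model

def address_in_range(addr, pool_index_word, arenas):
    """Model of the ADDRESS_IN_RANGE idea: is `addr` one of ours?

    `pool_index_word` stands for the value read from memory near the
    pool base in step (a) -- it may be uninitialized junk when the
    block did not actually come from this allocator.
    """
    if not 0 <= pool_index_word < len(arenas):   # junk index: reject
        return False
    base = arenas[pool_index_word]               # step (b): own table
    return base <= addr < base + ARENA_SIZE      # step (c): range check

arenas = [0x400000]                              # one arena we control
assert address_in_range(0x410000, 0, arenas)         # genuinely ours
assert not address_in_range(0x410000, 7777, arenas)  # junk index rejected
assert not address_in_range(0x900000, 0, arenas)     # foreign pointer
```

A foreign pointer can only pass the final range check if it actually lies inside an arena the allocator controls — in which case it *is* the allocator's, so the uninitialized read is harmless by construction.]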
From neal@metaslash.com Wed Jul 31 00:15:34 2002 From: neal@metaslash.com (Neal Norwitz) Date: Tue, 30 Jul 2002 19:15:34 -0400 Subject: [Python-Dev] Valgrinding Python References: Message-ID: <3D471E16.A01B14D8@metaslash.com> Tim Peters wrote: > > [Oren Tirosh] > > > I also got some invalid accesses in > > Modules/arraymodule.c:array_ass_subscr > > while running test_array and in Objects/Listobject.c:list_ass_subscript > > running test_types. For some reason I couldn't reproduce them later. > > Another memory-debugging tool, another chance to debug a memory-debugging > tool. Naw, cvs update can explain this one. :-) Michael Hudson fixed this (extended slice problem) based on a bug report I submitted. I ran valgrind on RedHat 7.2. I also had problems w/pymalloc originally so I disabled it. I may try again. There's something I found very interesting, though. I run purify on a sparc w/gcc 2.95.3 (maybe 3.0.x too, I can't remember). The problems with pymalloc and some of the dbm problems were also reported by purify. I've reviewed the code and can't find any problems. But different tools on different architectures with somewhat different compilers report similar errors. Neal From greg@cosc.canterbury.ac.nz Wed Jul 31 00:34:29 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Jul 2002 11:34:29 +1200 (NZST) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <20020730061026.33569.qmail@web40106.mail.yahoo.com> Message-ID: <200207302334.g6UNYTZ7018964@kuku.cosc.canterbury.ac.nz> Scott Gilbert : > We haven't seen a semi-thorough use case where the locking behavior is > beneficial yet. ... If there is no realizable benefit to the > acquire/release semantics of the new interface, then this is just extra > burden too. The proposer of the original safe-buffer interface claimed to have a use case where the existing buffer interface is not safe enough, involving asynchronous I/O. 
I've been basing my comments on the assumption that he does actually have a need for it. The original proposal was restricted to non-resizable objects. I suggested a small extension which would remove this restriction, at what seems to me quite a small cost. It may turn out that the restriction is easily lived with. On the other hand, we might decide later that it's a nuisance. What worries me is if we design a restricted safe-buffer interface now, and start using it, and later decide that we want an unrestricted safe-buffer interface, we'll then have two different safe-buffer interfaces around, with lots of code that will only accept non-resizable objects for no reason other than that it's using the old interface. So I think it's worth putting in some thought and getting it as right as we can from the beginning. > I'm concerned that this is very much like the segment count features > of the current PyBufferProcs. It was apparently designed for more > generality, and while no one uses it, everyone has to check that the > segment count is one or raise an exception. It's not as bad as that! My version of the proposal would impose *no* burden on implementations that did not require locking, for the following reasons: 1) Locking is an optional task performed by the getxxxbuffer routines. Objects which do not require locking just don't do it. 2) For objects not requiring locking, the releasebuffer operation is a no-op. Such an object can simply not implement this routine, and the type machinery can fill it in with a stub. It does place one extra burden on users of the interface, namely calling the release routine. But I believe that this could even be beneficial, in a way. The user is going to have to think about the lifetime of the pointer, and be sure to keep a reference to the underlying Python object as long as the pointer is needed. Having to keep it around so that you can call the release routine on it would help to bring this into sharp focus. 
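[Editor's note: Greg's two points can be modelled at the Python level. A hypothetical sketch (the class and method names here are invented for illustration, not part of any proposal's spelling) of an object that counts outstanding exports and turns a resize-while-locked into an exception instead of a crash:

```python
class LockableBuffer:
    """Toy model of the lock-on-export / release protocol discussed above."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._exports = 0            # outstanding get/release pairs

    def get_fixed_buffer(self):
        # the "getxxxbuffer" step: hand out the data and take a lock
        self._exports += 1
        return self._data

    def release_buffer(self):
        # trivially cheap for objects that never resize
        self._exports -= 1

    def resize(self, new_size):
        if self._exports:
            raise ValueError("cannot resize: buffer is locked")
        # safe: nobody holds a pointer into the old storage
        self._data = bytearray(new_size)

buf = LockableBuffer(16)
view = buf.get_fixed_buffer()
try:
    buf.resize(32)                   # exception, not a crash
except ValueError:
    pass
buf.release_buffer()
buf.resize(32)                       # fine once released
assert len(buf._data) == 32
```

Modern CPython ended up with essentially this behaviour: resizing a bytearray while a memoryview is exported from it raises BufferError rather than invalidating the exported pointer.]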
> The extension releases the GIL so that another > thread can work on the array object. Hey, whoa right there! If you have two threads accessing this array object simultaneously, you should be using a mutex or semaphore or something to coordinate them. As I pointed out before, thread synchronisation is outside the scope of my proposal. The only purpose of the locking, in my proposal, is to ensure that an exception occurs instead of a crash if the programmer screws up and tries to resize an object whose internals are being messed with. It's up to the programmer to do whatever is necessary to ensure that he doesn't do that. > If extend() is called while thread 1 has the array locked, it can: > > A) raise an exception or return an error Yes. (Raise an exception.) > Case A is troublesome because depending on thread scheduling/disk > performance, you will or won't get the exception. As I said before, you should be synchronising your threads somehow *before* they operate on the object! If you don't, you deserve whatever you get. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Wed Jul 31 01:03:55 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Jul 2002 12:03:55 +1200 (NZST) Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: <2md6t5ieh2.fsf@starship.python.net> Message-ID: <200207310003.g6V03tjm018993@kuku.cosc.canterbury.ac.nz> Michael Hudson : > My patch means the debugger doesn't stop > on the "def f():" line -- unsurprisingly, given that no execution ever > takes place on that line. If there is no code there, there shouldn't be any need to stop there, should there? 
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Wed Jul 31 01:12:55 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Jul 2002 12:12:55 +1200 (NZST) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <200207301537.g6UFbad09910@odiug.zope.com> Message-ID: <200207310012.g6V0Ctj5019001@kuku.cosc.canterbury.ac.nz> > I think you misunderstand what I wrote. A Py_DECREF() for an > *unrelated* object can invoke Python code (if it ends up deleting a > class instance with a __del__ method). I don't see why that's a problem. If the unrelated object's __del__ ends up messing with the object in question, that's an issue for the programmer to sort out. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Wed Jul 31 01:09:59 2002 From: tim.one@comcast.net (Tim Peters) Date: Tue, 30 Jul 2002 20:09:59 -0400 Subject: [Python-Dev] Valgrinding Python In-Reply-To: <3D471E16.A01B14D8@metaslash.com> Message-ID: [Neal Norwitz] > ... > I also had problems w/pymalloc originally so I disabled it. > I may try again. There's something I found very interesting, though. > > I run purify on a sparc w/gcc 2.95.3 (maybe 3.0.x too, > I can't remember). The problems with pymalloc and some of the dbm > problems were also reported by purify. I've reviewed the code > and can't find any problems. But different tools on different > architectures with somewhat different compilers report similar errors. 
pymalloc does read uninitialized memory, and routinely, as explained in the msg you're replying to. If that occurs outside code generated for the ADDRESS_IN_RANGE macro, though, it may be a real problem (inside code generated by that macro, reading uninitialized memory is-- curiously enough! --necessary for proper operation). From greg@cosc.canterbury.ac.nz Wed Jul 31 01:14:56 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Jul 2002 12:14:56 +1200 (NZST) Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines In-Reply-To: <15686.45677.421287.717866@anthem.wooz.org> Message-ID: <200207310014.g6V0EuUS019007@kuku.cosc.canterbury.ac.nz> > Originally we thought it was more important to > be able to contact the author, but there are quite a few reasons to > revise this intention. As pointed out, email addresses change. Also, > experience has shown that most of the discussions about PEPs are > conducted on the public forums (mailing lists / newsgroups), so that's > a fine way to contact the people working on the PEP. And of course, > we allow the PEP authors to obfuscate or omit their email addresses > altogether. Why not have *two* fields in the PEP, one for the real name, and the other for an email address? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From barry@python.org Wed Jul 31 01:49:06 2002 From: barry@python.org (Barry A. 
Warsaw) Date: Tue, 30 Jul 2002 20:49:06 -0400 Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines References: <15686.45677.421287.717866@anthem.wooz.org> <200207310014.g6V0EuUS019007@kuku.cosc.canterbury.ac.nz> Message-ID: <15687.13314.271722.779762@anthem.wooz.org> >>>>> "GE" == Greg Ewing writes: GE> Why not have *two* fields in the PEP, one for the real GE> name, and the other for an email address? I dunno, that seems like overkill. -Barry From greg@cosc.canterbury.ac.nz Wed Jul 31 02:44:23 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Jul 2002 13:44:23 +1200 (NZST) Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines In-Reply-To: <15687.13314.271722.779762@anthem.wooz.org> Message-ID: <200207310144.g6V1iNgZ019135@kuku.cosc.canterbury.ac.nz> Barry: > GE> Why not have *two* fields in the PEP, one for the real > GE> name, and the other for an email address? > > I dunno, that seems like overkill. It would certainly put an end to this argument, though! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From nhodgson@bigpond.net.au Wed Jul 31 03:15:28 2002 From: nhodgson@bigpond.net.au (Neil Hodgson) Date: Wed, 31 Jul 2002 12:15:28 +1000 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020729002957.74716.qmail@web40101.mail.yahoo.com> <00c601c23707$35819a20$3da48490@neil> <06f301c2370d$16941060$e000a8c0@thomasnotebook> <029801c23758$e13594b0$3da48490@neil> <200207301639.g6UGd1S17363@odiug.zope.com> <03a101c237e9$60fb3a20$e000a8c0@thomasnotebook> <200207301859.g6UIx6117906@odiug.zope.com> <063301c237fe$80506b10$e000a8c0@thomasnotebook> <004e01c2380b$762ef5e0$3da48490@neil> <200207302113.g6ULD0N21213@odiug.zope.com> Message-ID: <039701c23838$26dbfab0$3da48490@neil> Guido van Rossum: > If the buffer is relatively small, copying the data an extra time > shouldn't be a problem, and you can use the old API. > > If the buffer is huge, you probably shouldn't want to move the buffer > around in memory anyway, Even large (or huge) buffers may need extension (inserting text in Scintilla, adding a frame to a movie), leading to a reallocation and thus a move. Neil From nhodgson@bigpond.net.au Wed Jul 31 03:01:25 2002 From: nhodgson@bigpond.net.au (Neil Hodgson) Date: Wed, 31 Jul 2002 12:01:25 +1000 Subject: [Python-Dev] PEP 298 - the Fixed Buffer Interface References: <04da01c237ef$c103ac30$e000a8c0@thomasnotebook> Message-ID: <039301c23838$24a21040$3da48490@neil> Thomas Heller: > Abstract > > This PEP proposes an extension to the buffer interface called the > 'fixed buffer interface'. I'd like to see the purpose of the interface defined here rather than rely upon a reference to an email which talks about two buffer entities, the API and the object. 
Reading the email produces a purpose that could be used here: [the Buffer API is] intended to allow efficient binary I/O from and (in some cases) to large objects that have a relatively well-understood underlying memory representation Neil From nhodgson@bigpond.net.au Wed Jul 31 03:12:31 2002 From: nhodgson@bigpond.net.au (Neil Hodgson) Date: Wed, 31 Jul 2002 12:12:31 +1000 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020730061016.32588.qmail@web40103.mail.yahoo.com> <005d01c237ae$4b2f6670$3da48490@neil> <025a01c237e3$82eb7c90$e000a8c0@thomasnotebook> Message-ID: <039401c23838$25ad8cd0$3da48490@neil> Thomas Heller: > In plain text: > Provide a method which returns a 'view' into your object's > buffer after locking the object. The view holds a reference > to object, the objects is unlocked and decref'd when the > view is destroyed. Yes, this handles the situation. However I see some problems here: 1 Explicit resource release, such as closing files, is easier to understand and debug than implicit ref-count exhaustion. 2 On platforms such as .NET and the JVM, the view object will live for an indeterminate time, prohibiting resizes until the VM decides to garbage collect. While the JVM can not return pointers, and so may seem to not be a candidate for this interface, it can return array references. 3 More complex implementation requiring a secondary view object. Neil From neal@metaslash.com Wed Jul 31 03:19:08 2002 From: neal@metaslash.com (Neal Norwitz) Date: Tue, 30 Jul 2002 22:19:08 -0400 Subject: [Python-Dev] Valgrinding Python References: Message-ID: <3D47491C.B0E9E165@metaslash.com> Tim Peters wrote: > pymalloc does read uninitialized memory, and routinely, as explained in the > msg you're replying to. If that occurs outside code generated for the > ADDRESS_IN_RANGE macro, though, it may be a real problem (inside code > generated by that macro, reading uninitialized memory is-- curiously > enough! --necessary for proper operation). 
This is good news. I changed ADDRESS_IN_RANGE to a function, then suppressed it. There were no other uninitialized memory reads. Valgrind does report a bunch of problems with pthreads, but these are likely valgrind's fault. There are some complaints about memory leaks, but these seem to occur only when spawning/threading. The leaks are small and short-lived. Neal From barry@python.org Wed Jul 31 03:22:12 2002 From: barry@python.org (Barry A. Warsaw) Date: Tue, 30 Jul 2002 22:22:12 -0400 Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines References: <15687.13314.271722.779762@anthem.wooz.org> <200207310144.g6V1iNgZ019135@kuku.cosc.canterbury.ac.nz> Message-ID: <15687.18900.871205.963521@anthem.wooz.org> >>>>> "GE" == Greg Ewing writes: GE> Barry: >> GE> Why not have *two* fields in the PEP, one for the real GE> >> name, and the other for an email address? >> I dunno, that seems like overkill. GE> It would certainly put an end to this argument, though! What argument? :) -Barry From aahz@pythoncraft.com Wed Jul 31 04:36:39 2002 From: aahz@pythoncraft.com (Aahz) Date: Tue, 30 Jul 2002 23:36:39 -0400 Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines In-Reply-To: <15687.18900.871205.963521@anthem.wooz.org> References: <15687.13314.271722.779762@anthem.wooz.org> <200207310144.g6V1iNgZ019135@kuku.cosc.canterbury.ac.nz> <15687.18900.871205.963521@anthem.wooz.org> Message-ID: <20020731033639.GB14993@panix.com> On Tue, Jul 30, 2002, Barry A. Warsaw wrote: > > >>>>> "GE" == Greg Ewing writes: > > GE> Barry: > > >> GE> Why not have *two* fields in the PEP, one for the real GE> > >> name, and the other for an email address? > >> I dunno, that seems like overkill. > > GE> It would certainly put an end to this argument, though! > > What argument? :) You blithering idiot, you ought to be smacked with a fish. 
-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From greg@cosc.canterbury.ac.nz Wed Jul 31 05:18:35 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Jul 2002 16:18:35 +1200 (NZST) Subject: [Python-Dev] Re: PEP 1, PEP Purpose and Guidelines In-Reply-To: <20020731033639.GB14993@panix.com> Message-ID: <200207310418.g6V4IZVf019187@kuku.cosc.canterbury.ac.nz> Aahz : > > GE> It would certainly put an end to this argument, though! > > > > What argument? :) > > You blithering idiot, you ought to be smacked with a fish. No, that's abuse. Arguments are next door... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From mhammond@skippinet.com.au Wed Jul 31 06:28:44 2002 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 31 Jul 2002 15:28:44 +1000 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: <200207310003.g6V03tjm018993@kuku.cosc.canterbury.ac.nz> Message-ID: > Michael Hudson : > > > My patch means the debugger doesn't stop > > on the "def f():" line -- unsurprisingly, given that no execution ever > > takes place on that line. [Greg] > If there is no code there, there shouldn't be any > need to stop there, should there? [Barry in a different message] > I can't decide whether it would be good to stop on the def or not. > Not doing so makes pdb act more like gdb, which also only stops on the > first executable line, so maybe that's a good thing. IMO, the Python debugger "interface" should include function entry. The debugger UI (in this case pdb, but any other debugger) may choose not to break there, but the debugger itself may be able to implement some useful things by having the hook. Mark. 
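[Editor's note: the function-entry hook Mark asks for does exist in Python's trace interface: sys.settrace delivers a 'call' event when a frame is entered, before any line of the body executes. A minimal demonstration:

```python
import sys

events = []

def tracer(frame, event, arg):
    if event == "call":
        # fires on function entry, before the first body line runs
        events.append(frame.f_code.co_name)
    return None        # no per-line tracing needed for this demo

def f():
    return 42

sys.settrace(tracer)
f()
sys.settrace(None)

assert events == ["f"]
```

A debugger UI built on this (as pdb is, via bdb) can then decide for itself whether to present the entry event as a stop or ignore it.]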
From xscottg@yahoo.com Wed Jul 31 07:29:50 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Tue, 30 Jul 2002 23:29:50 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <005d01c237ae$4b2f6670$3da48490@neil> Message-ID: <20020731062950.59376.qmail@web40105.mail.yahoo.com> --- Neil Hodgson wrote: > > Since Scintilla is a component within a user interface, it shares this > responsibility with the container application with the application being > the main determinant. If I was writing a Windows-specific application that > used Scintilla, and I wanted to use Asynchronous I/O then my preferred > technique would be to change the message processing loop to leave the UI > input messages in the queue until the I/O had completed. > Once the I/O had completed then the message loop would change back to > processing all messages which would allow the banked up input to come > through. > Cool. This is what I was looking for. It's a tad complicated, but it makes a bit of sense. Is there anything in here that can't be done if you only had the simple (no locking) version of the fixed buffer interface? > > > A single lock interface can be implemented over an object without any > > locking. Have the lockable object return simple "fixed buffer objects" > > with a limited lifespan. > > This returns to the possibility of indeterminate lifespan as mentioned > earlier in the thread. > Not if you add an explicit release() method. Just like the file object has an explicit close() method. Your object with the locking smarts could just return "snapshot" views with an explicit release() method on them. > > > At which point I wonder what using asynchronous I/O achieved since the > > resize operation had to wait synchronously for the I/O to complete. > > This also sounds suspiciously like blocking the resize thread, but I > > won't argue that point. 
> > There may be other tasks that the application can perform while > waiting for the I/O to complete, such as displaying, styling or line-wrapping whatever text has already arrived (assuming that there are some > facilities for discovering this) or performing similar tasks for other > windows. > All good points. Thank you for indulging me. Sorry to be such a PITA. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From xscottg@yahoo.com Wed Jul 31 07:29:59 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Tue, 30 Jul 2002 23:29:59 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <025a01c237e3$82eb7c90$e000a8c0@thomasnotebook> Message-ID: <20020731062959.59382.qmail@web40105.mail.yahoo.com> --- Thomas Heller wrote: > > In plain text: > Provide a method which returns a 'view' into your object's > buffer after locking the object. The view holds a reference > to object, the objects is unlocked and decref'd when the > view is destroyed. > Exactly. This is just like putting an explicit close() on the file object. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From xscottg@yahoo.com Wed Jul 31 07:30:58 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Tue, 30 Jul 2002 23:30:58 -0700 (PDT) Subject: [Python-Dev] PEP 298 - the Fixed Buffer Interface In-Reply-To: <039301c23838$24a21040$3da48490@neil> Message-ID: <20020731063058.76595.qmail@web40103.mail.yahoo.com> --- Neil Hodgson wrote: > > I'd like to see the purpose of the interface defined here rather than > rely upon a reference to an email which talks about two buffer entities, > the API and the object. 
Reading the email produces a purpose that could > be used here: > > [the Buffer API is] intended to allow efficient > binary I/O from and (in some cases) to large objects that have a > relatively well-understood underlying memory representation > It's not just for I/O. In addition to I/O, I intend to use it for numerical calculations that can be run independently of the GIL. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From xscottg@yahoo.com Wed Jul 31 07:30:55 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Tue, 30 Jul 2002 23:30:55 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <039401c23838$25ad8cd0$3da48490@neil> Message-ID: <20020731063055.74410.qmail@web40110.mail.yahoo.com> --- Neil Hodgson wrote: > Thomas Heller: > > > In plain text: > > Provide a method which returns a 'view' into your object's > > buffer after locking the object. The view holds a reference > > to object, the objects is unlocked and decref'd when the > > view is destroyed. > > Yes, this handles the situation. However I see some problems here: > 1 Explicit resource release, such as closing files, is easier to > understand and debug than implicit ref-count exhaustion. > So add an explicit release() method to your object. Just because it supports the "Fixed Buffer API" doesn't mean you can't add other methods to it. > > 2 On platforms such as .NET and the JVM, the view object will live for an > indeterminate time, prohibiting resizes until the VM decides to garbage > collect. While the JVM can not return pointers, and so may seem to not be > a candidate for this interface, it can return array references. > This is solved with the explicit release() method above. Just like files solve this problem with an explicit close() method. > > 3 More complex implementation requiring a secondary view object. 
> It's also a more complex problem that you're trying to solve. Putting the complexity on the common, simple, cases may not be appropriate when the complex cases are few and far between. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From xscottg@yahoo.com Wed Jul 31 07:31:13 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Tue, 30 Jul 2002 23:31:13 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <200207302334.g6UNYTZ7018964@kuku.cosc.canterbury.ac.nz> Message-ID: <20020731063113.74481.qmail@web40110.mail.yahoo.com> --- Greg Ewing wrote: > Scott Gilbert : > > > We haven't seen a semi-thorough use case where the locking behavior is > > beneficial yet. ... If there is no realizable benefit to the > > acquire/release semantics of the new interface, then this is just extra > > burden too. > > The proposer of the original safe-buffer interface claimed to have a > use case where the existing buffer interface is not safe enough, > involving asynchronous I/O. I've been basing my comments on the > assumption that he does actually have a need for it. > I believe Thomas Heller's needs were met without making locking part of the interface, but that he was willing to bend to please you and Neil. His original proposal did not include any notion of locking. Nor does his current since Guido has taken a stand on this issue. > > So I think it's worth putting in some thought and getting it as > right as we can from the beginning. > Absolutely. I just wanted to make sure that there is at least one sensible use case before adding the complexity. Moreover, if the sensible use cases for locking are few and far between, then I'm still inclined to leave it out since you can add the locking semantics at a different level. It looks like Neil has sufficiently defined an example where it's useful. 
His use case is a bit complicated though, and I think he could get every bit of that functionality by putting the locking in a smarter object tailored for his application, and working with temporary "snapshot" objects with an explicit release() method. What if Neil decides he needs Reader/Writer locks? This is completely justifiable too, since multiple threads can read an object without interfering, but only one should be writing it. We shouldn't arbitrarily add complexity for the exceptional cases. > > > I'm concerned that this is very much like the segment count features > > of the current PyBufferProcs. It was apparently designed for more > > generality, and while no one uses it, everyone has to check that the > > segment count is one or raise an exception. > > It's not as bad as that! My version of the proposal would impose *no* > burden on implementations that did not require locking, for the > following reasons: > Your use of the word *no* is different than mine. :-) I could similarly claim that the segment count puts no burden on implementations that don't need it. > > 1) Locking is an optional task performed by the getxxxbuffer > routines. Objects which do not require locking just don't > do it. > > 2) For objects not requiring locking, the releasebuffer > operation is a no-op. Such an object can simply not > implement this routine, and the type machinery can fill > it in with a stub. > I believe it will be a no-op in enough places that extension writers will do it wrong without even knowing. > > > The extension releases the GIL so that another > > thread can work on the array object. > > Hey, whoa right there! If you have two threads accessing this array > object simulaneously, you should be using a mutex or semaphore or > something to coordinate them. As I pointed out before, thread > synchronisation is outside the scope of my proposal. > This is exactly Neil's use case. He's got two threads reading it simultaneously. 
One thread (not really a thread, but the asynchronous I/O operation) is writing to disk, and the other thread is keeping the user interface updated. There is no problem until the user tries to enter text (which forces a resize) before the asynchronous I/O is complete. Neil has a solution for this, but I think it's less than typical. > > The only purpose of the locking, in my proposal, is to ensure that an > exception occurs instead of a crash if the programmer screws up and > tries to resize an object whose internals are being messed with. It's > up to the programmer to do whatever is necessary to ensure that he > doesn't do that. > > > If extend() is called while thread 1 has the array locked, it can: > > > > A) raise an exception or return an error > > Yes. (Raise an exception.) > Which exception? Would you introduce a standard exception that should be raised when the user tries to do an operation that currently isn't allowed because the buffer is locked? Truthfully, now that Neil has given his explanation, I'm beginning to bend on this a bit. You're right in that it's not that much burden (however, it's more than *no* burden :-), and someone might find it useful. I still think it's going to be pretty uncommon, and I still believe the locking can be added on top of the simpler interface as needed. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From mhammond@skippinet.com.au Wed Jul 31 07:43:50 2002 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 31 Jul 2002 16:43:50 +1000 Subject: [Python-Dev] Get fame and fortune from mindless editing Message-ID: An offer too good to refuse ;) We recently deprecated the DL_EXPORT and DL_IMPORT macros, replacing them with purpose oriented macros. In an effort to cleanup the source, it would be good to remove all such macros from the Python source tree. I have already made a start on this, and only mindless editing remains. 
What needs to be done is: * Modules/*.c - all 'DL_EXPORT(void)' references (which are all module init functions) are to be replaced with 'PyMODINIT_FUNC' - note no parens, and no return type is specified. Eg, the following patch would be most suitable: Index: timemodule.c ... @@ -621,5 +621,5 @@ -DL_EXPORT(void) +PyMODINIT_FUNC inittime(void) { * Include/*.h - all public declarations need to be changed. All 'DL_IMPORT(type)' references, *including* any leading 'extern' declaration, should be changed to either PyAPI_FUNC (for functions) or PyAPI_DATA (for data) For example, the following 3 lines (from various .h files): extern DL_IMPORT(PyTypeObject) PyUnicode_Type; extern DL_IMPORT(PyObject*) PyUnicode_FromUnicode(...); DL_IMPORT(void) PySys_SetArgv(int, char **); would be changed to: PyAPI_DATA(PyTypeObject) PyUnicode_Type; PyAPI_FUNC(PyObject*) PyUnicode_FromUnicode(...); PyAPI_FUNC(void) PySys_SetArgv(int, char **); Note all 'extern' declarations were removed, and PyUnicode_Type is data (and declared as such) while the other 2 are functions. This is all mindless editing, suitable for a day when the brain doesn't quite seem to be firing! The fame comes from getting your name splashed all over the CVS logs. The fortune... well, not all valuable things can be measured in dollars. Thanks, Mark. From kalle@lysator.liu.se Wed Jul 31 09:19:57 2002 From: kalle@lysator.liu.se (Kalle Svensson) Date: Wed, 31 Jul 2002 10:19:57 +0200 Subject: [Python-Dev] Get fame and fortune from mindless editing In-Reply-To: References: Message-ID: <20020731081957.GB1161@i92.ryd.student.liu.se> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 [Mark Hammond] > An offer too good to refuse ;) Right. http://python.org/sf/588982 Since this is my first post here, I'll introduce myself. I'm a first year student in computer engineering at Linköping University, Sweden. I've been lurking here for a few months. My primary Python interest at the moment is the Snake Farm project. 
Otherwise, I like Unix, free software and all that usual stuff. Peace, Kalle - -- Kalle Svensson, http://www.juckapan.org/~kalle/ Student, root and saint in the Church of Emacs. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.7 (GNU/Linux) Comment: Processed by Mailcrypt 3.5.6 iD8DBQE9R52GdNeA1787sd0RAjVwAJ9/c4y8Tq0lqf6tUfgGeaD2DZIV3QCfQAvh tBwRn/mmh52sFncmo3shxhg= =6Z3v -----END PGP SIGNATURE----- From mwh@python.net Wed Jul 31 09:22:22 2002 From: mwh@python.net (Michael Hudson) Date: 31 Jul 2002 09:22:22 +0100 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: "Mark Hammond"'s message of "Wed, 31 Jul 2002 15:28:44 +1000" References: Message-ID: <2mu1mgfgsh.fsf@starship.python.net> "Mark Hammond" writes: > > Michael Hudson : > > > > > My patch means the debugger doesn't stop > > > on the "def f():" line -- unsurprisingly, given that no execution ever > > > takes place on that line. > > [Greg] > > If there is no code there, there shouldn't be any > > need to stop there, should there? > > [Barry in a different message] > > I can't decide whether it would be good to stop on the def or not. > > Not doing so makes pdb act more like gdb, which also only stops on the > > first executable line, so maybe that's a good thing. > > IMO, the Python debugger "interface" should include function entry. There goes the time machine: it does. I just think everyone ignores 'call' messages because they're a bit redundant today (because of the matter under discussion). > The debugger UI (in this case pdb, but any other debugger) may > choose not to break there, but the debugger itself may be able to > implement some useful things by having the hook. bdb.Bdb.user_call(), I believe. Cheers, M. -- One of the great skills in using any language is knowing what not to use, what not to say. ... There's that simplicity thing again. 
-- Ron Jeffries From akim@epita.fr Wed Jul 31 10:11:11 2002 From: akim@epita.fr (Akim Demaille) Date: 31 Jul 2002 11:11:11 +0200 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> <200207301539.g6UFdUS09930@odiug.zope.com> <200207301622.g6UGMBl17143@odiug.zope.com> Message-ID: >>>>> "François" == François Pinard writes: François> [Guido van Rossum] Hi Guido, Hi Francois ! >> Since we don't use this idiom, we can safely remove the >> -DHAVE_CONFIG_H (if we can find where it is set). >> I looked. It's generated by AC_OUTPUT. I don't think I can get >> rid of it. So never mind. :-) François> Maybe AC_OUTPUT, or macros called by AC_OUTPUT, can be François> overridden. If this is not easy to do, you might want to François> discuss the matter with Akim, Cc:ed. Maybe he could tear François> down AC_OUTPUT in parts so the overriding gets easier? François> I know my friend Akim as a good, helping and nice fellow! François> Don't fear him! :-) I'm not sure I completely understand the question here: if HAVE_CONFIG_H is specified, it means config.h is created. So if you use a config.h, why does it matter not to define HAVE_CONFIG_H? From barry@python.org Wed Jul 31 13:15:49 2002 From: barry@python.org (Barry A. Warsaw) Date: Wed, 31 Jul 2002 08:15:49 -0400 Subject: [Python-Dev] seeing off SET_LINENO References: <200207310003.g6V03tjm018993@kuku.cosc.canterbury.ac.nz> Message-ID: <15687.54517.580299.350054@anthem.wooz.org> >>>>> "MH" == Mark Hammond writes: MH> [Barry in a different message] >> I can't decide whether it would be good to stop on the def or >> not. Not doing so makes pdb act more like gdb, which also only >> stops on the first executable line, so maybe that's a good >> thing. MH> IMO, the Python debugger "interface" should include function MH> entry.
The debugger UI (in this case pdb, but any other MH> debugger) may choose not to break there, but the debugger MH> itself may be able to implement some useful things by having MH> the hook. Good point. -Barry From thomas.heller@ion-tof.com Wed Jul 31 13:32:25 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 31 Jul 2002 14:32:25 +0200 Subject: [Python-Dev] PEP 298 - the Fixed Buffer Interface References: <04da01c237ef$c103ac30$e000a8c0@thomasnotebook> Message-ID: <0ac201c2388e$53c0f020$e000a8c0@thomasnotebook> > Additional Notes/Comments > > Python strings, Unicode strings, mmap objects, and maybe other > types would expose the fixed buffer interface, but the array type > would *not*, because its memory block may be reallocated during > its lifetime. > Unfortunately it's impossible to implement the fixed buffer interface on mmap objects - the memory mapped file can be closed at any time. This would leave the pointers unusable. It seems this is another use case for locking - if we want it. Thomas From pinard@iro.umontreal.ca Wed Jul 31 13:41:02 2002 From: pinard@iro.umontreal.ca (François Pinard) Date: 31 Jul 2002 08:41:02 -0400 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> <200207301539.g6UFdUS09930@odiug.zope.com> <200207301622.g6UGMBl17143@odiug.zope.com> Message-ID: [Akim Demaille] > I'm not sure I completely understand the question here: if HAVE_CONFIG_H > is specified, it means config.h is created. So if you use a config.h, > why does it matter not to define HAVE_CONFIG_H? Hi, Akim. I hope life is still good to you! :-) In the beginnings of Autoconf, the `config.h' file did not exist. David MacKenzie added it as a way to reduce the `make' output clutter. Nowadays, I suspect almost all packages of at least moderate size use it.
Our traditional `lib/' modules have to work in many packages, whether `config.h' has been created or not, this being decided on a per package basis, and that is why there is a conditional inclusion of `config.h' in each of these `lib/' modules. It took a good while before we got stabilised on the exact stanza of this inclusion (I especially remember the massive unilateral changes by Roland McGrath introducing the BROKEN_BROKET define, or something like that, and all the work it later took to clean this out.) Python (the distribution, which is what is in question here) does not use any of our `lib/' things, it is not going to use them, and it is not going to provide new such modules, so the distribution includes `config.h' everywhere, by permanent choice, without any need to use `HAVE_CONFIG_H' to decide if that inclusion is needed or not. So, even `-DHAVE_CONFIG_H' is useless `make' clutter in this case, and that's why the Python packagers wanted to get rid of it. In fact, in practice `-DHAVE_CONFIG_H' is only needed for packages using those common `lib/' modules, but many packages do not. Now that Autoconf is used with projects that have a life outside GNU, this is less necessary. Guido found, and got me to remember, that `@DEFS@' is the culprit: people just do not have to use it in their hand-crafted Makefiles, which is the case for Python. For away-from-GNU packages using Automake, some Automake option might exist so `@DEFS@' does not get generated? The only goal here is to get a cleaner `make' output. -- François Pinard http://www.iro.umontreal.ca/~pinard From skip@pobox.com Wed Jul 31 14:39:31 2002 From: skip@pobox.com (Skip Montanaro) Date: Wed, 31 Jul 2002 08:39:31 -0500 Subject: [Python-Dev] Get fame and fortune from mindless editing In-Reply-To: References: Message-ID: <15687.59539.842887.296794@localhost.localdomain> Mark> I have already made a start on this, and only mindless editing Mark> remains. "mindless editing" ==> sed script or Emacs macros...
;-) Skip From skip@pobox.com Wed Jul 31 14:52:52 2002 From: skip@pobox.com (Skip Montanaro) Date: Wed, 31 Jul 2002 08:52:52 -0500 Subject: [Python-Dev] Get fame and fortune from mindless editing In-Reply-To: References: Message-ID: <15687.60340.880139.545471@localhost.localdomain> Mark> We recently deprecated the DL_EXPORT and DL_IMPORT macros, Mark> replacing them with purpose oriented macros. In an effort to Mark> cleanup the source, it would be good to remove all such macros Mark> from the Python source tree. I modified the Modules/*.c and Includes/*.h files. Is there a patch/bug number I should attach the context diffs to for review? Skip From skip@pobox.com Wed Jul 31 14:59:09 2002 From: skip@pobox.com (Skip Montanaro) Date: Wed, 31 Jul 2002 08:59:09 -0500 Subject: [Python-Dev] Get fame and fortune from mindless editing In-Reply-To: References: Message-ID: <15687.60717.533154.63118@localhost.localdomain> What about the references to DL_IMPORT/DL_EXPORT in Includes/Python.h and the two #ifndef DL_EXPORT lines in Modules/{cPickle.c,cStringIO.c}? Skip From guido@python.org Wed Jul 31 15:18:27 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 10:18:27 -0400 Subject: [Python-Dev] Re: HAVE_CONFIG_H In-Reply-To: Your message of "Wed, 31 Jul 2002 11:11:11 +0200." References: <200207291930.g6TJUYi05460@pcp02138704pcs.reston01.va.comcast.net> <200207301539.g6UFdUS09930@odiug.zope.com> <200207301622.g6UGMBl17143@odiug.zope.com> Message-ID: <200207311418.g6VEIRW32518@odiug.zope.com> > I'm not sure I completely understand the question here: if > HAVE_CONFIG_H is specified, it means config.h is created. So if you > use a config.h, why does it matter not to define HAVE_CONFIG_H? It's just clutter on the command line that we don't need. But never mind, I found a way to lose it already. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Jul 31 15:36:37 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 10:36:37 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Tue, 30 Jul 2002 23:29:50 PDT." <20020731062950.59376.qmail@web40105.mail.yahoo.com> References: <20020731062950.59376.qmail@web40105.mail.yahoo.com> Message-ID: <200207311436.g6VEabH32668@odiug.zope.com> Based on the example of mmap (which can be closed at any time) I agree that the fixed buffer interface needs to have "get" and "release" methods (please pick better names). Maybe Thomas can update PEP 298. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Wed Jul 31 16:16:20 2002 From: skip@pobox.com (Skip Montanaro) Date: Wed, 31 Jul 2002 10:16:20 -0500 Subject: [Python-Dev] imaplib test failure Message-ID: <15687.65348.589402.540281@localhost.localdomain> Anyone else seeing this? I doubt it's related to the DL_EXPORT/DL_IMPORT changes I was just testing, and my local copy of Lib/imaplib.py matches what's in CVS. 
Skip test test_imaplib produced unexpected output: ********************************************************************** *** lines 2-3 of actual output doesn't appear in expected output after line 1: + incorrect result when converting (2033, 5, 18, 3, 33, 20, 2, 138, 0) + incorrect result when converting '"18-May-2033 13:33:20 +1000"' ********************************************************************** From jeremy@alum.mit.edu Wed Jul 31 15:56:40 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 31 Jul 2002 10:56:40 -0400 Subject: [Python-Dev] Get fame and fortune from mindless editing In-Reply-To: References: Message-ID: <15687.64168.403730.225372@slothrop.zope.com> >>>>> "MH" == Mark Hammond writes: MH> An offer too good to refuse ;) We recently deprecated the MH> DL_EXPORT and DL_IMPORT macros, replacing them with purpose MH> oriented macros. In an effort to cleanup the source, it would MH> be good to remove all such macros from the Python source tree. Would it make any sense to backport the new macros to the 2.2 branch? It might ease the life of extension writers who want their code to work with either version. The practical problem, however, is that their code would only work with a to-be-released 2.2.2. Jeremy From barry@python.org Wed Jul 31 16:23:29 2002 From: barry@python.org (Barry A. Warsaw) Date: Wed, 31 Jul 2002 11:23:29 -0400 Subject: [Python-Dev] imaplib test failure References: <15687.65348.589402.540281@localhost.localdomain> Message-ID: <15688.241.352958.223156@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> Anyone else seeing this? I doubt it's related to the SM> DL_EXPORT/DL_IMPORT changes I was just testing, and my local SM> copy of Lib/imaplib.py matches what's in CVS. Yes, everyone is: http://mail.python.org/pipermail/python-dev/2002-July/027056.html but no one's stepped up to the plate yet, including pierslauder <1.4 wink>.
-Barry From jeremy@alum.mit.edu Wed Jul 31 16:25:37 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 31 Jul 2002 11:25:37 -0400 Subject: [Python-Dev] Get fame and fortune from mindless editing In-Reply-To: <200207311525.g6VFPRf00831@odiug.zope.com> References: <15687.64168.403730.225372@slothrop.zope.com> <200207311525.g6VFPRf00831@odiug.zope.com> Message-ID: <15688.369.568227.177521@slothrop.zope.com> >>>>> "GvR" == Guido van Rossum writes: MH> An offer too good to refuse ;) We recently deprecated the MH> DL_EXPORT and DL_IMPORT macros, replacing them with purpose MH> oriented macros. In an effort to cleanup the source, it would MH> be good to remove all such macros from the Python source tree. >> >> Would it make any sense to backport the new macros to the 2.2 >> branch? It might ease the life of extension writers who want >> their code to work with either version. The practical problem, >> however, is that their code would only work with a >> too-be-released 2.2.2. GvR> Maybe both the old and the new macros could be supported by GvR> 2.2.2? Yes. That's my suggestion. Jeremy From xscottg@yahoo.com Wed Jul 31 16:28:32 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Wed, 31 Jul 2002 08:28:32 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <200207311436.g6VEabH32668@odiug.zope.com> Message-ID: <20020731152832.99003.qmail@web40106.mail.yahoo.com> --- Guido van Rossum wrote: > > Based on the example of mmap (which can be closed at any time) I > agree that the fixed buffer interface needs to have "get" > and "release" methods (please pick better names). Maybe Thomas can > update PEP 298. > Wow, the tides have turned. Fair enough. I think Neil put forth the names "acquire" and "release". 
So how about typedef struct { getreadbufferproc bf_getreadbuffer; getwritebufferproc bf_getwritebuffer; getsegcountproc bf_getsegcount; getcharbufferproc bf_getcharbuffer; /* fixed buffer interface functions */ acquirereadbufferproc bf_acquirereadbuffer; acquirewritebufferproc bf_acquirewritebuffer; releasebufferproc bf_releasebuffer; } PyBufferProcs; Whatever the actual names, should there be a bf_releasereadbuffer and bf_releasewritebuffer? Or just the one bf_releasebuffer? Could also just have one acquire function that indicates whether it is read-write or read-only via a return parameter. Is write-only ever useful? Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From guido@python.org Wed Jul 31 16:25:27 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 11:25:27 -0400 Subject: [Python-Dev] Get fame and fortune from mindless editing In-Reply-To: Your message of "Wed, 31 Jul 2002 10:56:40 EDT." <15687.64168.403730.225372@slothrop.zope.com> References: <15687.64168.403730.225372@slothrop.zope.com> Message-ID: <200207311525.g6VFPRf00831@odiug.zope.com> > MH> An offer too good to refuse ;) We recently deprecated the > MH> DL_EXPORT and DL_IMPORT macros, replacing them with purpose > MH> oriented macros. In an effort to cleanup the source, it would > MH> be good to remove all such macros from the Python source tree. > > Would it make any sense to backport the new macros to the 2.2 branch? > It might ease the life of extension writers who want their code to > work with either version. The practical problem, however, is that > their code would only work with a too-be-released 2.2.2. Maybe both the old and the new macros could be supported by 2.2.2? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Jul 31 16:37:07 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 11:37:07 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Wed, 31 Jul 2002 08:28:32 PDT." <20020731152832.99003.qmail@web40106.mail.yahoo.com> References: <20020731152832.99003.qmail@web40106.mail.yahoo.com> Message-ID: <200207311537.g6VFb7r01081@odiug.zope.com> > I think Neil put forth the names "acquire" and "release". So how about > > typedef struct { > getreadbufferproc bf_getreadbuffer; > getwritebufferproc bf_getwritebuffer; > getsegcountproc bf_getsegcount; > getcharbufferproc bf_getcharbuffer; > /* fixed buffer interface functions */ > acquirereadbufferproc bf_acquirereadbuffer; > acquirewritebufferproc bf_acquirewritebuffer; > releasebufferproc bf_releasebuffer; > } PyBufferProcs; > > Whatever the actual names, should there be a bf_releasereadbuffer and > bf_releasewritebuffer? Or just the one bf_releasebuffer? Just the one. > Could also just have one acquire function that indicates whether it > is read-write or read-only via a return parameter. That loses the (weak) symmetry with the existing API. > Is write-only ever useful? No, write implies read. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Jul 31 16:47:46 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 11:47:46 -0400 Subject: [Python-Dev] What to do about the Wiki? Message-ID: <200207311547.g6VFlk601129@odiug.zope.com> I don't know what to do about the Moinmoin Wiki on python.org. Lots of useful information was recently moved to the Wiki, like the editors list and Andrew Kuchling's bookstore. But the Wiki brought the website down twice this weekend, by growing without bounds. To prevent this from happening again, we've disabled the Wiki, but that's not a solution. 
Juergen Hermann, Moinmoin's author, said he fixed a few things, but also said that Moinmoin is essentially vulnerable to "recursive wget" (e.g. someone trying to suck up the entire Wiki by following links). Apparently this is what brought the site down this weekend -- if I understand correctly, an in-memory log was growing too fast. There are a lot of links in the Wiki, e.g. for each Wiki page there's the page itself, the edit form, the history, various other actions, etc. I believe that Juergen has fixed the log-growing problem. Should we enable the Wiki again and hope for the best? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas.heller@ion-tof.com Wed Jul 31 16:49:20 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 31 Jul 2002 17:49:20 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020731152832.99003.qmail@web40106.mail.yahoo.com> <200207311537.g6VFb7r01081@odiug.zope.com> Message-ID: <0cd301c238a9$d5e3a690$e000a8c0@thomasnotebook> > > Could also just have one acquire function that indicates whether it > > is read-write or read-only via a return parameter. > > That loses the (weak) symmetry with the existing API. > There's nothing a client expecting a read/write pointer could do with a read only pointer IMO. > > Is write-only ever useful? > > No, write implies read. Should it be named getfixedreadwritebuffer then? Thomas From guido@python.org Wed Jul 31 16:54:41 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 11:54:41 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Wed, 31 Jul 2002 17:49:20 +0200." 
<0cd301c238a9$d5e3a690$e000a8c0@thomasnotebook> References: <20020731152832.99003.qmail@web40106.mail.yahoo.com> <200207311537.g6VFb7r01081@odiug.zope.com> <0cd301c238a9$d5e3a690$e000a8c0@thomasnotebook> Message-ID: <200207311554.g6VFsfO01268@odiug.zope.com> > > > Could also just have one acquire function that indicates whether it > > > is read-write or read-only via a return parameter. > > > > That loses the (weak) symmetry with the existing API. > > There's nothing a client expecting a read/write pointer could > do with a read only pointer IMO. So we agree that it's a bad idea to have one function. :-) > > > Is write-only ever useful? > > > > No, write implies read. > > Should it be named getfixedreadwritebuffer then? No, the existing API also uses getwritebuffer implying read/write. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Wed Jul 31 16:57:08 2002 From: skip@pobox.com (Skip Montanaro) Date: Wed, 31 Jul 2002 10:57:08 -0500 Subject: [Python-Dev] Get fame and fortune from mindless editing In-Reply-To: References: Message-ID: <15688.2260.68645.786641@localhost.localdomain> Mark> * Modules/*.c - all 'DL_EXPORT(void)' references ... Mark> * Include/*.h - all public declarations need to be changed ... Context diff of these changes are attached to http://python.org/sf/566100 Regression tests pass on my Linux box. See my note for a couple caveats. Skip From thomas.heller@ion-tof.com Wed Jul 31 16:58:05 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 31 Jul 2002 17:58:05 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <20020731062950.59376.qmail@web40105.mail.yahoo.com> <200207311436.g6VEabH32668@odiug.zope.com> Message-ID: <0d2b01c238ab$0e892ff0$e000a8c0@thomasnotebook> From: "Guido van Rossum" > Based on the example of mmap (which can be closed at any time) I > agree that the fixed buffer interface needs to have "get" > and "release" methods (please pick better names). 
Maybe Thomas can > update PEP 298. The consequence: mmap objects need a 'buffer lock counter', and cannot be closed while the count is >0. Which exception is raised then? Or do you have something different in mind? The lock counter would not be needed for strings and unicode... Thomas From guido@python.org Wed Jul 31 17:06:13 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 12:06:13 -0400 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: Your message of "Wed, 31 Jul 2002 17:58:05 +0200." <0d2b01c238ab$0e892ff0$e000a8c0@thomasnotebook> References: <20020731062950.59376.qmail@web40105.mail.yahoo.com> <200207311436.g6VEabH32668@odiug.zope.com> <0d2b01c238ab$0e892ff0$e000a8c0@thomasnotebook> Message-ID: <200207311606.g6VG6Ds01363@odiug.zope.com> > The consequence: mmap objects need a 'buffer lock counter', > and cannot be closed while the count is >0. Which exception > is raised then? Pick one -- mmap.error (== EnvironmentError) seems fine to me. Alternately, close() could set a "please close me" flag which causes the mmap file to be closed when the last release is called. Of course, the acquire method should raise an exception when it's already closed. > Or do you have something different in mind? > The lock counter would not be needed for strings and unicode... And the array module could have one. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Wed Jul 31 17:09:13 2002 From: skip@pobox.com (Skip Montanaro) Date: Wed, 31 Jul 2002 11:09:13 -0500 Subject: [Python-Dev] Re: What to do about the Wiki? In-Reply-To: <200207311547.g6VFlk601129@odiug.zope.com> References: <200207311547.g6VFlk601129@odiug.zope.com> Message-ID: <15688.2985.118330.48738@localhost.localdomain> Guido> Juergen Hermann, Moinmoin's author, said he fixed a few things, Guido> but also said that Moinmoin is essentially vulnerable to Guido> "recursive wget" (e.g.
someone trying to suck up the entire Wiki Guido> by following links). Apparently this is what brought the site Guido> down this weekend -- if I understand correctly, an in-memory log Guido> was growing too fast. I'm a bit confused by these statements. MoinMoin is a CGI script. I don't understand where "recursive wget" and "in-memory log" would come into play. I recently fired up two Wikis on the Mojam server. I never see any long-running process which would suggest there's an in-memory log which could grow without bound. The MoinMoin package does generate HTTP redirects, but while they might coax wget into firing off another request, it should be handled by a separate MoinMoin process on the server side. You should see the load grow significantly as the requests pour in, but shouldn't see any one MoinMoin process gobbling up all sorts of resources. Jürgen, can you elaborate on these themes a little more? Guido> I believe that Juergen has fixed the log-growing problem. Should Guido> we enable the Wiki again and hope for the best? With an XS4ALL person at the ready? Perhaps someone can keep a window open on creosote running something like while true ; do ps auxww | egrep python | sort -r -n -k 5,5 | head -1 sleep 15 done I'm running out for the next few hours. I'll be happy to run the while loop when I return. Skip From webmaster@python.org Wed Jul 31 17:21:47 2002 From: webmaster@python.org (webmaster@python.org) Date: Wed, 31 Jul 2002 12:21:47 -0400 Subject: [Python-Dev] Re: What to do about the Wiki? References: <200207311547.g6VFlk601129@odiug.zope.com> <15688.2985.118330.48738@localhost.localdomain> Message-ID: <15688.3739.1719.207581@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: Guido> I believe that Juergen has fixed the log-growing problem. Guido> Should we enable the Wiki again and hope for the best? I just did, by twiddling the +x bits on moinmoin SM> With an XS4ALL person at the ready?
Perhaps someone can keep SM> a window open on creosote running something like | while true ; do | ps auxww | egrep python | sort -r -n -k 5,5 | head -1 | sleep 15 | done SM> I'm running out for the next few hours. I'll be happy to run SM> the while loop when I return. I'm doing this now, but even hitting the wiki it doesn't show up. I'm just going to run top for a while, but it's a fairly old version of top. :/ -Barry From guido@python.org Wed Jul 31 17:16:56 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 12:16:56 -0400 Subject: [Python-Dev] Re: What to do about the Wiki? In-Reply-To: Your message of "Wed, 31 Jul 2002 11:09:13 CDT." <15688.2985.118330.48738@localhost.localdomain> References: <200207311547.g6VFlk601129@odiug.zope.com> <15688.2985.118330.48738@localhost.localdomain> Message-ID: <200207311616.g6VGGuF01886@odiug.zope.com> > Guido> Juergen Hermann, Moinmoin's author, said he fixed a few things, > Guido> but also said that Moinmoin is essentially vulnerable to > Guido> "recursive wget" (e.g. someone trying to suck up the entire Wiki > Guido> by following links). Apparently this is what brought the site > Guido> down this weekend -- if I understand correctly, an in-memory log > Guido> was growing too fast. > > I'm a bit confused by these statements. MoinMoin is a CGI script. I don't > understand where "recursive wget" and "in-memory log" would come into play. > I recently fired up two Wikis on the Mojam server. I never see any > long-running process which would suggest there's an in-memory log which > could grow without bound. The MoinMoin package does generate HTTP > redirects, but while they might coax wget into firing off another request, > it should be handled by a separate MoinMoin process on the server side. You > should see the load grow significantly as the requests pour in, but > shouldn't see any one MoinMoin process gobbling up all sorts of resources. > Jürgen, can you elaborate on these themes a little more? 
Juergen seems offline or too busy to respond. Here's what he wrote on the matter. I guess he's reading the entire log into memory and updating it there. | Subject: [Pydotorg] wiki | From: Juergen Hermann | To: "pydotorg@python.org" | Date: Mon, 29 Jul 2002 20:32:31 +0200 | Hi! | | I looked into the wiki, and two things killed us: | | a) apart from google hits, some $!&%$""$% did a recursive wget. And the | wiki spans a rather wide uri space... | | b) the event log grows much faster than I'm used to, thus some | "simple" algorithms don't hold for this size. | | | Solutions: | | a) I just updated the wiki software, the current cvs contains a | robot/wget filter that forbids any access except to "view page" URIs | (i.e. we remain open to google, but no more open than absolutely | needed). If need be, we can forbid access altogether, or only allow | google. | | b) I'll install a cron job that rotates the logs, to keep them short. | | I shortened the logs manually for now. So if you all agree, we could | activate the wiki again. | | | Ciao, Jürgen Reading this again, I think we should give it a try again. > Guido> I believe that Juergen has fixed the log-growing problem. Should > Guido> we enable the Wiki again and hope for the best? > > With an XS4ALL person at the ready? Perhaps someone can keep a window open > on creosote running something like > > while true ; do > ps auxww | egrep python | sort -r -n -k 5,5 | head -1 > sleep 15 > done > > I'm running out for the next few hours. I'll be happy to run the while loop > when I return. We'll watch it here. I know who to write to have it rebooted. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Wed Jul 31 17:43:40 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 31 Jul 2002 12:43:40 -0400 Subject: [Python-Dev] imaplib test failure In-Reply-To: <15688.241.352958.223156@anthem.wooz.org> Message-ID: > Yes, everyone is: > > http://mail.python.org/pipermail/python-dev/2002-July/027056.html > > but no one's stepped up to the plate yet, including pierslauder <1.4 > wink>. I just reverted test_imaplib to rev 1.3, the last version that worked here. From mal@lemburg.com Wed Jul 31 18:02:51 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jul 2002 19:02:51 +0200 Subject: [Python-Dev] Re: What to do about the Wiki? References: <200207311547.g6VFlk601129@odiug.zope.com> <15688.2985.118330.48738@localhost.localdomain> <200207311616.g6VGGuF01886@odiug.zope.com> Message-ID: <3D48183B.7070306@lemburg.com> Guido van Rossum wrote: >> Guido> Juergen Hermann, Moinmoin's author, said he fixed a few things, >> Guido> but also said that Moinmoin is essentially vulnerable to >> Guido> "recursive wget" (e.g. someone trying to suck up the entire Wiki >> Guido> by following links). Apparently this is what brought the site >> Guido> down this weekend -- if I understand correctly, an in-memory log >> Guido> was growing too fast. >> >>I'm a bit confused by these statements. MoinMoin is a CGI script. I don't >>understand where "recursive wget" and "in-memory log" would come into play. >>I recently fired up two Wikis on the Mojam server. I never see any >>long-running process which would suggest there's an in-memory log which >>could grow without bound.
The MoinMoin package does generate HTTP >>redirects, but while they might coax wget into firing off another request, >>it should be handled by a separate MoinMoin process on the server side. You >>should see the load grow significantly as the requests pour in, but >>shouldn't see any one MoinMoin process gobbling up all sorts of resources. >>Jürgen, can you elaborate on these themes a little more? > > > Juergen seems offline or too busy to respond. Here's what he wrote on > the matter. I guess he's reading the entire log into memory and > updating it there. Jürgen is talking about the file event.log which MoinMoin writes. This is not read into memory. New events are simply appended to the file. Now since the Wiki has recursive links such as the "LikePages" links on all pages and history links like the per page info screen, a recursive wget is likely to run for quite a while (even more because the URL level doesn't change much and thus probably doesn't trigger any depth restrictions on wget-like crawlers) and generate lots of events... What was the cause of the break down ? A full disk or a process claiming all resources ? > | Subject: [Pydotorg] wiki > | From: Juergen Hermann > | To: "pydotorg@python.org" > | Date: Mon, 29 Jul 2002 20:32:31 +0200 > | Hi! > | > | I looked into the wiki, and two things killed us: > | > | a) apart from google hits, some $!&%$""$% did a recursive wget. And the > | wiki spans a rather wide uri space... > | > | b) the event log grows much faster than I'm used to, thus some > | "simple" algorithms don't hold for this size. > | > | > | Solutions: > | > | a) I just updated the wiki software, the current cvs contains a > | robot/wget filter that forbids any access except to "view page" URIs > | (i.e. we remain open to google, but no more open than absolutely > | needed). If need be, we can forbid access altogether, or only allow > | google.
> | > | b) I'll install a cron job that rotates the logs, to keep them short. > | > | I shortened the logs manually for now. So if you all agree, we could > | activate the wiki again. > | > | > | Ciao, Jürgen > > Reading this again, I think we should give it a try again. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From tim.one@comcast.net Wed Jul 31 18:07:46 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 31 Jul 2002 13:07:46 -0400 Subject: [Pydotorg] Re: [Python-Dev] Re: What to do about the Wiki? In-Reply-To: <3D48183B.7070306@lemburg.com> Message-ID: [M.-A. Lemburg] > What was the cause of the breakdown? A full disk or a process > claiming all resources? Thomas Wouters told me the process grew so large that it ran out of swapfile space. swapping-rumors-ly y'rs - tim From tim.one@comcast.net Wed Jul 31 18:16:20 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 31 Jul 2002 13:16:20 -0400 Subject: [Python-Dev] Valgrinding Python In-Reply-To: <3D47491C.B0E9E165@metaslash.com> Message-ID: [Neal Norwitz] > This is good news. I changed ADDRESS_IN_RANGE to a function, > then suppressed it. There were no other uninitialized memory reads. Cool! In if (ADDRESS_IN_RANGE(p, pool->arenaindex)) { it's actually only the pool->arenaindex subexpression that may read uninitialized memory; the ADDRESS_IN_RANGE macro itself doesn't do anything "bad". > Valgrind does report a bunch of problems with pthreads, but > these are likely valgrind's fault. There are some complaints > about memory leaks, but these seem to appear only to occur > when spawning/threading. The leaks are small and short lived. A novel definition for "leak". 
From guido@python.org Wed Jul 31 18:24:12 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 13:24:12 -0400 Subject: [Python-Dev] Re: What to do about the Wiki? In-Reply-To: Your message of "Wed, 31 Jul 2002 19:02:51 +0200." <3D48183B.7070306@lemburg.com> References: <200207311547.g6VFlk601129@odiug.zope.com> <15688.2985.118330.48738@localhost.localdomain> <200207311616.g6VGGuF01886@odiug.zope.com> <3D48183B.7070306@lemburg.com> Message-ID: <200207311724.g6VHOCZ02434@odiug.zope.com> > > Juergen seems offline or too busy to respond. Here's what he wrote on > > the matter. I guess he's reading the entire log into memory and > > updating it there. > > Jürgen is talking about the file event.log which MoinMoin writes. > This is not read into memory. New events are simply appended to > the file. > > Now since the Wiki has recursive links such as the "LikePages" > links on all pages and history links like the per page > info screen, a recursive wget is likely to run for quite a > while (even more because the URL level doesn't change much > and thus probably doesn't trigger any depth restrictions on wget- > like crawlers) and generate lots of events... > > What was the cause of the break down ? A full disk or a process > claiming all resources ? A process running out of memory, AFAIK. I just ran a recursive wget on the Wiki, and it completed without bringing the site down, downloading about 1000 files (several views for each Wiki page). I didn't see the Wiki appear in the "top" display. So either Juergen fixed the problem (as he said he did) or there was a different cause. I do wish Juergen responded to his mail. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Wed Jul 31 18:26:12 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 31 Jul 2002 13:26:12 -0400 Subject: [Python-Dev] Now test_socket fails Message-ID: What's socket.socket() supposed to do without any arguments? 
Can't work on Windows, because socket.py has if (sys.platform.lower().startswith("win") or (hasattr(os, 'uname') and os.uname()[0] == "BeOS") or sys.platform=="riscos"): _realsocketcall = _socket.socket def socket(family, type, proto=0): return _socketobject(_realsocketcall(family, type, proto)) C:\Code\python\PCbuild>python ../lib/test/test_socket.py Testing for mission critical constants. ... ok Testing default timeout. ... ERROR Testing getservbyname(). ... ok Testing getsockopt(). ... ok Testing hostname resolution mechanisms. ... ok Making sure getnameinfo doesn't crash the interpreter. ... ok testNtoH (__main__.GeneralModuleTests) ... ok Testing reference count for getnameinfo. ... ok testing send() after close() with timeout. ... ok Testing setsockopt(). ... ok Testing getsockname(). ... ok Testing that socket module exceptions. ... ok Testing fromfd(). ... ok Testing receive in chunks over TCP. ... ok Testing recvfrom() in chunks over TCP. ... ok Testing large receive over TCP. ... ok Testing large recvfrom() over TCP. ... ok Testing sendall() with a 2048 byte string over TCP. ... ok Testing shutdown(). ... ok Testing recvfrom() over UDP. ... ok Testing sendto() and Recv() over UDP. ... ok Testing non-blocking accept. ... ok Testing non-blocking connect. ... ok Testing non-blocking recv. ... ok Testing whether set blocking works. ... ok Performing file readline test. ... ok Performing small file read test. ... ok Performing unbuffered file read test. ... ok ====================================================================== ERROR: Testing default timeout. 
---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_socket.py", line 273, in testDefaultTimeout s = socket.socket() TypeError: socket() takes at least 2 arguments (0 given) ---------------------------------------------------------------------- Ran 28 tests in 3.190s FAILED (errors=1) Traceback (most recent call last): File "../lib/test/test_socket.py", line 559, in ? test_main() File "../lib/test/test_socket.py", line 556, in test_main test_support.run_suite(suite) File "C:\CODE\PYTHON\lib\test\test_support.py", line 188, in run_suite raise TestFailed(err) test.test_support.TestFailed: Traceback (most recent call last): File "../lib/test/test_socket.py", line 273, in testDefaultTimeout s = socket.socket() TypeError: socket() takes at least 2 arguments (0 given) From tim.one@comcast.net Wed Jul 31 18:33:03 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 31 Jul 2002 13:33:03 -0400 Subject: [Python-Dev] Now test_socket fails In-Reply-To: Message-ID: [me] > What's socket.socket() supposed to do without any arguments? > Can't work on Windows, because socket.py has ... Nevermind; I changed socket.py so this works as intended. From mgilfix@eecs.tufts.edu Wed Jul 31 18:37:11 2002 From: mgilfix@eecs.tufts.edu (Michael Gilfix) Date: Wed, 31 Jul 2002 13:37:11 -0400 Subject: [Python-Dev] Now test_socket fails In-Reply-To: ; from tim.one@comcast.net on Wed, Jul 31, 2002 at 01:26:12PM -0400 References: Message-ID: <20020731133711.H26901@eecs.tufts.edu> I'm pretty sure that qualifies as a bug. The problem exists on linux as well (as a fresh cvs update has shown). In general though, the socket call should always take the two arguments. It seems at one point that the 2.3 version of the socket module accepted erroneously just a socket() call, while 2.2 does not. It seems Guido added these lines to integrate default timeout testing. 
If someone with write privileges can just fix that to read: s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) that should fix the problem. -- Mike On Wed, Jul 31 @ 13:26, Tim Peters wrote: > What's socket.socket() supposed to do without any arguments? Can't work on > Windows, because socket.py has > > if (sys.platform.lower().startswith("win") > or (hasattr(os, 'uname') and os.uname()[0] == "BeOS") > or sys.platform=="riscos"): > > _realsocketcall = _socket.socket > > def socket(family, type, proto=0): > return _socketobject(_realsocketcall(family, type, proto)) > > > C:\Code\python\PCbuild>python ../lib/test/test_socket.py > Testing for mission critical constants. ... ok > Testing default timeout. ... ERROR > Testing getservbyname(). ... ok > Testing getsockopt(). ... ok > Testing hostname resolution mechanisms. ... ok > Making sure getnameinfo doesn't crash the interpreter. ... ok > testNtoH (__main__.GeneralModuleTests) ... ok > Testing reference count for getnameinfo. ... ok > testing send() after close() with timeout. ... ok > Testing setsockopt(). ... ok > Testing getsockname(). ... ok > Testing that socket module exceptions. ... ok > Testing fromfd(). ... ok > Testing receive in chunks over TCP. ... ok > Testing recvfrom() in chunks over TCP. ... ok > Testing large receive over TCP. ... ok > Testing large recvfrom() over TCP. ... ok > Testing sendall() with a 2048 byte string over TCP. ... ok > Testing shutdown(). ... ok > Testing recvfrom() over UDP. ... ok > Testing sendto() and Recv() over UDP. ... ok > Testing non-blocking accept. ... ok > Testing non-blocking connect. ... ok > Testing non-blocking recv. ... ok > Testing whether set blocking works. ... ok > Performing file readline test. ... ok > Performing small file read test. ... ok > Performing unbuffered file read test. ... ok > > ====================================================================== > ERROR: Testing default timeout. 
> ---------------------------------------------------------------------- > Traceback (most recent call last): > File "../lib/test/test_socket.py", line 273, in testDefaultTimeout > s = socket.socket() > TypeError: socket() takes at least 2 arguments (0 given) > > ---------------------------------------------------------------------- > Ran 28 tests in 3.190s > > FAILED (errors=1) > Traceback (most recent call last): > File "../lib/test/test_socket.py", line 559, in ? > test_main() > File "../lib/test/test_socket.py", line 556, in test_main > test_support.run_suite(suite) > File "C:\CODE\PYTHON\lib\test\test_support.py", line 188, in run_suite > raise TestFailed(err) > test.test_support.TestFailed: Traceback (most recent call last): > File "../lib/test/test_socket.py", line 273, in testDefaultTimeout > s = socket.socket() > TypeError: socket() takes at least 2 arguments (0 given) > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev `-> (tim.one) -- Michael Gilfix mgilfix@eecs.tufts.edu For my gpg public key: http://www.eecs.tufts.edu/~mgilfix/contact.html From mgilfix@eecs.tufts.edu Wed Jul 31 18:38:12 2002 From: mgilfix@eecs.tufts.edu (Michael Gilfix) Date: Wed, 31 Jul 2002 13:38:12 -0400 Subject: [Python-Dev] Now test_socket fails In-Reply-To: ; from tim.one@comcast.net on Wed, Jul 31, 2002 at 01:33:03PM -0400 References: Message-ID: <20020731133812.I26901@eecs.tufts.edu> Er, I'm not sure that was such a good idea. This doesn't work on Linux and shouldn't. It never worked that way in 2.2. I'm not sure what happened to make it work in 2.3. That was prior to my adding the timeout socket changes. -- Mike On Wed, Jul 31 @ 13:33, Tim Peters wrote: > [me] > > What's socket.socket() supposed to do without any arguments? > > Can't work on Windows, because socket.py has ... > > Nevermind; I changed socket.py so this works as intended. 
> > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev `-> (tim.one) -- Michael Gilfix mgilfix@eecs.tufts.edu For my gpg public key: http://www.eecs.tufts.edu/~mgilfix/contact.html From guido@python.org Wed Jul 31 18:40:34 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 13:40:34 -0400 Subject: [Python-Dev] Now test_socket fails In-Reply-To: Your message of "Wed, 31 Jul 2002 13:26:12 EDT." References: Message-ID: <200207311740.g6VHeYS02538@odiug.zope.com> > What's socket.socket() supposed to do without any arguments? Can't work on > Windows, because socket.py has > > if (sys.platform.lower().startswith("win") > or (hasattr(os, 'uname') and os.uname()[0] == "BeOS") > or sys.platform=="riscos"): > > _realsocketcall = _socket.socket > > def socket(family, type, proto=0): > return _socketobject(_realsocketcall(family, type, proto)) Oops. It's supposed to default to AF_INET, SOCK_STREAM now. Can you test this patch and check it in if it works?

*** socket.py	18 Jul 2002 17:08:34 -0000	1.22
--- socket.py	31 Jul 2002 17:35:25 -0000
***************
*** 62,68 ****
  _realsocketcall = _socket.socket

! def socket(family, type, proto=0):
      return _socketobject(_realsocketcall(family, type, proto))

  if SSL_EXISTS:
--- 62,68 ----
  _realsocketcall = _socket.socket

! def socket(family=AF_INET, type=SOCK_STREAM, proto=0):
      return _socketobject(_realsocketcall(family, type, proto))

  if SSL_EXISTS:

(There's another change we should really make -- instead of a socket function, there should be a class socket whose constructor does the work. That's necessary so that isinstance(s, socket.socket) works on Windows; this currently works on Unix but not on Windows. But I don't have time for that now; the above patch should do what you need.) 
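The effect of Guido's one-line patch can be sketched without the real `_socket` extension module; the constants and `_realsocketcall` below are stand-ins for illustration, not the real implementation:

```python
# Stand-ins for the real _socket module, for illustration only.
AF_INET, SOCK_STREAM = 2, 1

def _realsocketcall(family, type, proto=0):
    # The real call would create a socket; here we just record the arguments.
    return (family, type, proto)

# After the patch, family and type default to the common TCP/IPv4 case,
# so socket() with no arguments works on the wrapped platforms too.
def socket(family=AF_INET, type=SOCK_STREAM, proto=0):
    return _realsocketcall(family, type, proto)

print(socket() == socket(AF_INET, SOCK_STREAM))  # → True
```

This mirrors what the C-level constructor already did, since its `PyArg_ParseTupleAndKeywords` format string makes all three arguments optional.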
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Jul 31 18:45:14 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 13:45:14 -0400 Subject: [Python-Dev] Now test_socket fails In-Reply-To: Your message of "Wed, 31 Jul 2002 13:37:11 EDT." <20020731133711.H26901@eecs.tufts.edu> References: <20020731133711.H26901@eecs.tufts.edu> Message-ID: <200207311745.g6VHjEC02589@odiug.zope.com> > It seems at one point that the 2.3 version of the socket module accepted > erroneously just a socket() call, while 2.2 does not. I added this intentionally. I am tired of typing (AF_INET, SOCK_STREAM) where those are the 99% case. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Wed Jul 31 18:44:50 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 31 Jul 2002 13:44:50 -0400 Subject: [Python-Dev] Now test_socket fails In-Reply-To: <20020731133711.H26901@eecs.tufts.edu> Message-ID: [Michael Gilfix] > I'm pretty sure that qualifies as a bug. The problem exists on linux > as well (as a fresh cvs update has shown). In general though, the > socket call should always take the two arguments. > > It seems at one point that the 2.3 version of the socket module > accepted erroneously just a socket() call, while 2.2 does not. It seems > Guido added these lines to integrate default timeout testing. If someone > with write priveleges can just fix that to read: > > s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) > > that should fix the problem. I'll leave this to you and Guido. The test works fine on Windows now. The docstring for _socket.socket claims that all arguments are optional. 
The code matches the docs:

sock_initobj(PyObject *self, PyObject *args, PyObject *kwds)
{
    PySocketSockObject *s = (PySocketSockObject *)self;
    SOCKET_T fd;
    int family = AF_INET, type = SOCK_STREAM, proto = 0;
    static char *keywords[] = {"family", "type", "proto", 0};

    ALL ARGS ARE OPTIONAL HERE
    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|iii:socket", keywords,
                                     &family, &type, &proto))
        return -1;
    Py_BEGIN_ALLOW_THREADS
    fd = socket(family, type, proto);
    Py_END_ALLOW_THREADS

From guido@python.org Wed Jul 31 18:47:34 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 31 Jul 2002 13:47:34 -0400 Subject: [Python-Dev] Now test_socket fails In-Reply-To: Your message of "Wed, 31 Jul 2002 13:38:12 EDT." <20020731133812.I26901@eecs.tufts.edu> References: <20020731133812.I26901@eecs.tufts.edu> Message-ID: <200207311747.g6VHlYr02626@odiug.zope.com> > Er, I'm not sure that was such a good idea. This doesn't work on > Linux and shouldn't. It never worked that way in 2.2. I'm not sure what > happened to make it work in 2.3. That was prior to my adding the timeout > socket changes. What do you mean it doesn't work on Linux? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Wed Jul 31 18:52:06 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 31 Jul 2002 13:52:06 -0400 Subject: [Python-Dev] Now test_socket fails In-Reply-To: <200207311740.g6VHeYS02538@odiug.zope.com> Message-ID: [Guido] > (There's another change we should really make -- instead of a socket > function, there should be a class socket whose constructor does the > work. That's necessary so that isinstance(s, socket.socket) works on > Windows; this currently works on Unix but not on Windows. 
http://www.python.org/sf/589262 From mgilfix@eecs.tufts.edu Wed Jul 31 18:57:06 2002 From: mgilfix@eecs.tufts.edu (Michael Gilfix) Date: Wed, 31 Jul 2002 13:57:06 -0400 Subject: [Python-Dev] Now test_socket fails In-Reply-To: <200207311745.g6VHjEC02589@odiug.zope.com>; from guido@python.org on Wed, Jul 31, 2002 at 01:45:14PM -0400 References: <20020731133711.H26901@eecs.tufts.edu> <200207311745.g6VHjEC02589@odiug.zope.com> Message-ID: <20020731135705.J26901@eecs.tufts.edu> Sounds fair. Found it in the docs so I'm happy. On Wed, Jul 31 @ 13:45, Guido van Rossum wrote: > > It seems at one point that the 2.3 version of the socket module accepted > > erroneously just a socket() call, while 2.2 does not. > > I added this intentionally. I am tired of typing > (AF_INET, SOCK_STREAM) where those are the 99% case. -- Michael Gilfix mgilfix@eecs.tufts.edu For my gpg public key: http://www.eecs.tufts.edu/~mgilfix/contact.html From mal@lemburg.com Wed Jul 31 18:56:49 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jul 2002 19:56:49 +0200 Subject: [Python-Dev] Re: What to do about the Wiki? References: <200207311547.g6VFlk601129@odiug.zope.com> <15688.2985.118330.48738@localhost.localdomain> <200207311616.g6VGGuF01886@odiug.zope.com> <3D48183B.7070306@lemburg.com> <200207311724.g6VHOCZ02434@odiug.zope.com> Message-ID: <3D4824E1.1090304@lemburg.com> Guido van Rossum wrote: >>>Juergen seems offline or too busy to respond. Here's what he wrote on >>>the matter. I guess he's reading the entire log into memory and >>>updating it there. >> >>Jürgen is talking about the file event.log which MoinMoin writes. >>This is not read into memory. New events are simply appended to >>the file. 
>> >>Now since the Wiki has recursive links such as the "LikePages" >>links on all pages and history links like the per-page >>info screen, a recursive wget is likely to run for quite a >>while (even more because the URL level doesn't change much >>and thus probably doesn't trigger any depth restrictions on wget- >>like crawlers) and generate lots of events... >> >>What was the cause of the breakdown? A full disk or a process >>claiming all resources? > > > A process running out of memory, AFAIK. In that case, wouldn't it be better to impose a memory-use limit on the user which Apache uses for dealing with CGI scripts? That wouldn't solve any specific Wiki-related problem, but prevents the server from going offline because of memory problems. > I just ran a recursive wget on the Wiki, and it completed without > bringing the site down, downloading about 1000 files (several views > for each Wiki page). I didn't see the Wiki appear in the "top" > display. > > So either Juergen fixed the problem (as he said he did) or there was a > different cause. > > I do wish Juergen responded to his mail. It's vacation time in Germany, so he may well be offline for a while. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... 
Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From mgilfix@eecs.tufts.edu Wed Jul 31 18:58:03 2002 From: mgilfix@eecs.tufts.edu (Michael Gilfix) Date: Wed, 31 Jul 2002 13:58:03 -0400 Subject: [Python-Dev] Now test_socket fails In-Reply-To: <200207311747.g6VHlYr02626@odiug.zope.com>; from guido@python.org on Wed, Jul 31, 2002 at 01:47:34PM -0400 References: <20020731133812.I26901@eecs.tufts.edu> <200207311747.g6VHlYr02626@odiug.zope.com> Message-ID: <20020731135803.K26901@eecs.tufts.edu> On Wed, Jul 31 @ 13:47, Guido van Rossum wrote: > > Er, I'm not sure that was such a good idea. This doesn't work on > > Linux and shouldn't. It never worked that way in 2.2. I'm not sure what > > happened to make it work in 2.3. That was prior to my adding the timeout > > socket changes. > > What do you mean it doesn't work on Linux? My fault. It works. I, uh, didn't set my path correctly :) -- Michael Gilfix mgilfix@eecs.tufts.edu For my gpg public key: http://www.eecs.tufts.edu/~mgilfix/contact.html From mgilfix@eecs.tufts.edu Wed Jul 31 19:00:58 2002 From: mgilfix@eecs.tufts.edu (Michael Gilfix) Date: Wed, 31 Jul 2002 14:00:58 -0400 Subject: [Python-Dev] Now test_socket fails In-Reply-To: <200207311740.g6VHeYS02538@odiug.zope.com>; from guido@python.org on Wed, Jul 31, 2002 at 01:40:34PM -0400 References: <200207311740.g6VHeYS02538@odiug.zope.com> Message-ID: <20020731140057.L26901@eecs.tufts.edu> Would a little trick like this do?

class socket:
    pass

class unix_socket(socket):
    pass

class windows_socket(socket):
    # Old windows stuff
    pass

And then just do the namespace shuffling that's kinda already done in socket.py. -- Mike On Wed, Jul 31 @ 13:40, Guido van Rossum wrote: > (There's another change we should really make -- instead of a socket > function, there should be a class socket whose constructor does the > work. That's necessary so that isinstance(s, socket.socket) works on > Windows; this currently works on Unix but not on Windows. 
But I don't > have time for that now; the above patch should do what you need.) -- Michael Gilfix mgilfix@eecs.tufts.edu For my gpg public key: http://www.eecs.tufts.edu/~mgilfix/contact.html From mal@lemburg.com Wed Jul 31 19:04:36 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 31 Jul 2002 20:04:36 +0200 Subject: [Python-Dev] Re: What to do about the Wiki? References: <200207311547.g6VFlk601129@odiug.zope.com> <15688.2985.118330.48738@localhost.localdomain> <200207311616.g6VGGuF01886@odiug.zope.com> <3D48183B.7070306@lemburg.com> <200207311724.g6VHOCZ02434@odiug.zope.com> <3D4824E1.1090304@lemburg.com> Message-ID: <3D4826B4.4060606@lemburg.com> M.-A. Lemburg wrote: > Guido van Rossum wrote: >>> What was the cause of the break down ? A full disk or a process >>> claiming all resources ? >> A process running out of memory, AFAIK. > > > In that case, wouldn't it be better to impose a memoryuse limit > on the user which Apache uses for dealing with CGI > scripts ? That wouldn't solve any specific Wiki related > problem, but prevents the server from going offline because > of memory problems. Here's how Apache can be configured for this (without having to fiddle with the Apache user account): http://httpd.apache.org/docs/mod/core.html#rlimitmem -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From thomas.heller@ion-tof.com Wed Jul 31 19:53:23 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 31 Jul 2002 20:53:23 +0200 Subject: [Python-Dev] PEP 298 - the Fixed Buffer Interface References: <04da01c237ef$c103ac30$e000a8c0@thomasnotebook> <200207301946.g6UJkf520799@odiug.zope.com> Message-ID: <0fe601c238c3$8bab1b20$e000a8c0@thomasnotebook> I've changed PEP 298 to incorporate the latest changes. 
Barry has not yet run pep2html (and I don't want to bother him too much with this), also I don't know if it makes sense to post it again in its full length. So here is the link to view it online in text format: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/peps/pep-0298.txt?rev=1.4 and this is the checkin message: ----- The model exposed by the fixed buffer interface was changed: Retrieving a buffer from an object puts this in a locked state, and a releasebuffer function must be called to unlock the object again. Added releasefixedbuffer function slot, and renamed the get...fixedbuffer functions to acquire...fixedbuffer functions. Renamed the flag from Py_TPFLAG_HAVE_GETFIXEDBUFFER to Py_TPFLAG_HAVE_FIXEDBUFFER. (Is the 'fixed buffer' name still useful, or should we use 'static buffer' instead?) Added posting date (was posted to c.l.p and python-dev). ----- Thomas From skip@pobox.com Wed Jul 31 22:06:26 2002 From: skip@pobox.com (Skip Montanaro) Date: Wed, 31 Jul 2002 16:06:26 -0500 Subject: [Python-Dev] Re: What to do about the Wiki? In-Reply-To: <15688.3739.1719.207581@anthem.wooz.org> References: <200207311547.g6VFlk601129@odiug.zope.com> <15688.2985.118330.48738@localhost.localdomain> <15688.3739.1719.207581@anthem.wooz.org> Message-ID: <15688.20818.999604.113193@localhost.localdomain> BAW> I'm doing this now, but even hitting the wiki it doesn't show up. This is good. ;-) Skip From greg@cosc.canterbury.ac.nz Wed Jul 31 23:31:37 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 01 Aug 2002 10:31:37 +1200 (NZST) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <20020731063113.74481.qmail@web40110.mail.yahoo.com> Message-ID: <200207312231.g6VMVbt2019712@kuku.cosc.canterbury.ac.nz> > Moreover, if the sensible use cases for locking are few and far > between, then I'm still inclined to leave it out since you can add the > locking semantics at a different level. Are you sure about that? 
Without the locking, only non-resizable objects would be able to implement the protocol. So any higher-level locking would have to be implemented on top of the old, non-safe version. Then you'd have to make sure that all parts of your application accessed the object through the extra layer. The "safe" part would be lost. > Your use of the word *no* is different than mine. :-) I could > similarly claim that the segment count puts no burden on > implementations that don't need it. I think I may have been replying to something other than what was said. But what I said is still true -- it imposes no extra burden on *implementers* of the interface which don't use the extra feature. I acknowledge that it complicates things slightly for *users* of the interface, but not as much as the seg count stuff does (there's no need for any testing or exception raising). > I believe it will be a no-op in enough places that extension writers > will do it wrong without even knowing. Well, there's not much that can be done about extension writers who fail to read the documentation, or wilfully ignore it. > Which exception? Would you introduce a standard exception that should > be raised when the user tries to do an operation that currently isn't > allowed because the buffer is locked? Maybe. It doesn't matter. The important thing is that the interpreter does not crash. > I still believe the locking can be added on top of the simpler > interface as needed. But it can't, since as I pointed out above, resizable objects won't be able to provide the simpler interface! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From ark@research.att.com Wed Jul 31 23:35:21 2002 From: ark@research.att.com (Andrew Koenig) Date: Wed, 31 Jul 2002 18:35:21 -0400 (EDT) Subject: [Python-Dev] split('') revisited Message-ID: <200207312235.g6VMZL218546@europa.research.att.com> Back in February, there was a thread in comp.lang.python (and, I think, also on Python-Dev) that asked whether the following behavior: >>> 'abcde'.split('') Traceback (most recent call last): File "<stdin>", line 1, in ? ValueError: empty separator was a bug or a feature. The prevailing opinion at the time seemed to be that there was not a sensible, unique way of defining this operation, so rejecting it was a feature. That answer didn't bother me particularly at the time, but since then I have learned a new fact (or perhaps an old fact that I didn't notice at the time) that has changed my mind: Section 4.2.4 of the library reference says that the 'split' method of a regular expression object is defined as Identical to the split() function, using the compiled pattern. This claim does not appear to be correct: >>> import re >>> re.compile('').split('abcde') ['abcde'] This result differs from the result of using the string split method. In other words, the documentation doesn't match the actual behavior, so the status quo is broken. It seems to me that there are four reasonable courses of action: 1) Do nothing -- the problem is too trivial to worry about. 2) Change string split (and its documentation) to match regexp split. 3) Change regexp split (and its documentation) to match string split. 
4) Change both string split and regexp split to do something else :-) My first impulse was to argue that (4) is right, and that the behavior should be as follows >>> 'abcde'.split('') ['a', 'b', 'c', 'd', 'e'] >>> import re >>> re.compile('').split('abcde') ['a', 'b', 'c', 'd', 'e'] When this discussion came up last time, I think there was an objection that s.split('') was ambiguous: What argument is there in favor of 'abcde'.split('') being ['a', 'b', 'c', 'd', 'e'] instead of, say, ['', 'a', 'b', 'c', 'd', 'e', ''] or, for that matter, ['', 'a', '', 'b', '', 'c', '', 'd', '', 'e', '']? I made the counterargument that one could disambiguate by adding the rule that no element of the result could be equal to the delimiter. Therefore, if s is a string, s.split('') cannot contain any empty strings. However, looking at the behavior of regular expression splitting more closely, I become more confused. Can someone explain the following behavior to me? >>> re.compile('a|(x?)').split('abracadabra') ['', None, 'br', None, 'c', None, 'd', None, 'br', None, '']
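Andrew's disambiguation rule (no element of the result may equal the delimiter, so splitting on the empty string yields one element per character and never an empty string) can be sketched as follows. `proposed_split` is a hypothetical helper illustrating the proposal, not the behavior of any shipped Python:

```python
def proposed_split(s, sep):
    # Proposed rule: no element of the result may equal the delimiter.
    # For sep == '', that forces one element per character, with no
    # empty strings anywhere in the result.
    if sep == '':
        return list(s)
    return s.split(sep)

print(proposed_split('abcde', ''))   # → ['a', 'b', 'c', 'd', 'e']
print(proposed_split('abcde', 'c'))  # → ['ab', 'de']
```

Under this rule the ambiguity goes away, since the alternatives containing '' elements are ruled out by construction.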