This is a multi-part message in MIME format. --------------D181A42EF954A5F1E101C9DB Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Guido van Rossum wrote: > > Paul Prescod wrote: > > > I don't think of iterators as indexing in terms of numbers. Otherwise I > > could do this: > > > > >>> a={0:"zero",1:"one",2:"two",3:"three"} > > >>> for i in a: > > ... print i > > ... > > > > So from a Python user's point of view, for-looping has nothing to do > > with integers. From a Python class/module creator's point of view it > > does have to do with integers. I wouldn't be either surprised nor > > disappointed if that changed one day. > > Bingo! > > I've long had an idea for generalizing 'for' loops using iterators. This is > more a Python 3000 thing, but I'll explain it here anyway because I think > it's relevant. Perhaps this should become a PEP? (Maybe we should have a > series of PEPs with numbers in the 3000 range for Py3k ideas?) > > The statement > > for <variable> in <object>: <block> > > should translate into this kind of pseudo-code: > > # variant 1 > __temp = <object>.newiterator() > while 1: > try: <variable> = __temp.next() > except ExhaustedIterator: break > <block> > > or perhaps (to avoid the relatively expensive exception handling): > > # variant 2 > __temp = <object>.newiterator() > while 1: > __flag, <variable = __temp.next() > if not __flag: break > <block> > > In variant 1, the next() method returns the next object or raises > ExhaustedIterator. In variant 2, the next() method returns a tuple (<flag>, > <variable>) where <flag> is 1 to indicate that <value> is valid or 0 to > indicate that there are no more items available. I'm not crazy about the > exception, but I'm even less crazy about the more complex next() return > value (careful observers may have noticed that I'm rarely crazy about flag > variables :-). > > Another argument for variant 1 is that variant 2 changes what <variable> is > after the loop is exhausted, compared to current practice: currently, it > keeps the last valid value assigned to it. Most likely, the next() method > returns None when the sequence is exhausted. It doesn't make a lot of sense > to require it to return the last item of the sequence -- there may not *be* > a last item, if the sequence is empty, and not all sequences make it > convenient to keep hanging on to the last item in the iterator, so it's best > to specify that next() returns (0, None) when the sequence is exhausted. > > (It would be tempting to suggeste a variant 1a where instead of raising an > exception, next() returns None when the sequence is exhausted, but this > won't fly: you couldn't iterate over a list containing some items that are > None.) How about a third variant: #3: __iter = <object>.iterator() while __iter: <variable> = __iter.next() <block> This adds a slot call, but removes the malloc overhead introduced by returning a tuple for every iteration (which is likely to be a performance problem). Another possibility would be using an iterator attribute to get at the variable: #4: __iter = <object>.iterator() while 1: if not __iter.next(): break <variable> = __iter.value <block> > Side note: I believe that the iterator approach could actually *speed up* > iteration over lists compared to today's code. This is because currently the > interation index is a Python integer object that is stored on the stack. > This means an integer add with overflow check, allocation, and deallocation > on each iteration! But the iterator for lists (and other basic sequences) > could easily store the index as a C int! (As long as the sequence's length > is stored in an int, the index can't overflow.) You might want to check out the counterobject.c approach I used to speed up the current for-loop in Python 1.5's ceval.c: it's basically a mutable C integer which is used instead of the current Python integer index. The details can be found in my old patch: http://starship.python.net/crew/lemburg/mxPython-1.5.patch.gz > [Warning: thinking aloud ahead!] > > Once we have the concept of iterators, we could support explicit use of them > as well. E.g. we could use a variant of the for statement to iterate over an > existing iterator: > > for <variable> over <iterator>: <block> > > which would (assuming variant 1 above) translate to: > > while 1: > try: <variable> = <iterator>.next() > except ExhaustedIterator: break > <block> > > This could be used in situations where you have a loop iterating over the > first half of a sequence and a second loop that iterates over the remaining > items: > > it = something.newiterator() > for x over it: > if time_to_start_second_loop(): break > do_something() > for x over it: > do_something_else() > > Note that the second for loop doesn't reset the iterator -- it just picks up > where the first one left off! (Detail: the x that caused the break in the > first loop doesn't get dealt with in the second loop.) > > I like the iterator concept because it allows us to do things lazily. There > are lots of other possibilities for iterators. E.g. mappings could have > several iterator variants to loop over keys, values, or both, in sorted or > hash-table order. Sequences could have an iterator for traversing them > backwards, and a few other ones for iterating over their index set (cf. > indices()) and over (index, value) tuples (cf. irange()). Files could be > their own iterator where the iterator is almost the same as readline() > except it raises ExhaustedIterator at EOF instead of returning "". A tree > datastructure class could have an associated iterator class that maintains a > "finger" into the tree. > > Hm, perhaps iterators could be their own iterator? Then if 'it' were an > iterator, it.newiterator() would return a reference to itself (not a copy). > Then we wouldn't even need the 'over' alternative syntax. Maybe the method > should be called iterator() then, not newiterator(), to avoid suggesting > anything about the newness of the returned iterator. > > Other ideas: > > - Iterators could have a backup() method that moves the index back (or > raises an exception if this feature is not supported, e.g. when reading data > from a pipe). > > - Iterators over certain sequences might support operations on the > underlying sequence at the current position of the iterator, so that you > could iterate over a sequence and occasionally insert or delete an item (or > a slice). FYI, I've attached a module which I've been using a while for iteration. The code is very simple and implements the #4 variant described above. > Of course iterators also connect to generators. The basic list iterator > doesn't need coroutines or anything, it can be done as follows: > > class Iterator: > def __init__(self, seq): > self.seq = seq > self.ind = 0 > def next(self): > if self.ind >= len(self.seq): raise ExhaustedIterator > val = self.seq[self.ind] > self.ind += 1 > return val > > so that <list>.iterator() could just return Iterator(<list>) -- at least > conceptually. > > But for other data structures the amount of state needed might be > cumbersome. E.g. a tree iterator needs to maintain a stack, and it's much > easier to code that using a recursive Icon-style generator than by using an > explicit stack. On the other hand, I remember reading an article a while ago > (in Dr. Dobbs?) by someone who argued (in a C++ context) that such recursive > solutions are very inefficient, and that an explicit stack (1) is really not > that hard to code, and (2) gives much more control over the memory and time > consumption of the code. On the third hand, some forms of iteration really > *are* expressed much more clearly using recursion. On the fourth hand, I > disagree with Matthias ("Dr. Scheme") Felleisen about recursion as the root > of all iteration. Anyway, I believe that iterators (as explained above) can > be useful even if we don't have generators (in the Icon sense, which I > believe means coroutine-style). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ --------------D181A42EF954A5F1E101C9DB Content-Type: text/python; charset=us-ascii; name="Iterator.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="Iterator.py" """ Generic object iterators. (c) Copyright Marc-Andre Lemburg; All Rights Reserved. See the documentation for further information on copyrights, or contact the author (mal@lemburg.com). """ import exceptions class Error(exceptions.StandardError): pass class Iterator: value = None # Current value def __init__(self,obj,startpos=0): self.obj = obj self.index = startpos self.startpos = startpos self.step = 1 def set(self,pos): self.index = self.startpos = pos self.value = self.obj[pos] def reset(self): self.index = pos = self.startpos self.value = self.obj[pos] def next(self): self.index = i = self.index + self.step if i < 0: return 0 try: self.value = self.obj[i] return 1 except IndexError: return 0 def prev(self): self.index = i = self.index - self.step if i < 0: return 0 try: self.value = self.obj[i] return 1 except IndexError: return 0 def __getitem__(self,i): if i != 0: self.index = i = self.index + self.step else: i = self.index if i < 0: raise IndexError self.value = v = self.obj[i] return v ForwardIterator = Iterator class BackwardIterator(Iterator): def __init__(self,obj,startpos=None): self.obj = obj if startpos is not None: self.index = startpos else: self.index = startpos = len(obj) - 1 self.startpos = startpos self.value = self.obj[startpos] self.step = -1 class CallIterator(Iterator): def __init__(self,obj): self.obj = obj self.index = 0 def set(self,pos): raise Error,'.set() not supported' def reset(self): raise Error,'.reset() not supported' def next(self): self.index = self.index + 1 try: v = self.obj() if not v: return 0 self.value = v return 1 except IndexError: return 0 def prev(self): raise Error,'.prev() not supported' def __getitem__(self,i): self.index = self.index + 1 v = self.obj() if not v: raise IndexError self.value = v return v def _test(): i = BackwardIterator(range(1,10)) for x in i: print x print i.reset() while 1: print i.value if not i.next(): break print filereader = CallIterator(open('/usr/dict/words').readline) for line in filereader: pass print 'Read %i lines' % filereader.index if __name__ == '__main__': _test() --------------D181A42EF954A5F1E101C9DB--
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4