SYNOPSIS: a slight adjustment to the definition of consume() yields a simple solution that addresses both the destruction issue and the multiple-iteration issue, without introducing any new syntax.

On Mon, 22 Jul 2002, Greg Ewing wrote:
> As someone pointed out, it's pretty rare that you actually *want* to
> consume the sequence. Usually the choice is between "I don't care" and
> "The sequence must NOT be consumed".

Sure, i'll go for that. What i'm after is the ability to say "i would like this sequence not to be consumed."

> Of the two varieties of for-loop in your proposal, for-in
> obviously corresponds to the "must not be consumed" case,
> leading one to suppose that you intend for-from to be used in
> the don't-care case.

Right.

> But now you seem to be suggesting that library routines
> should always use for-in, and that the caller should
> convert an iterator to a sequence if he knows it's okay
> to consume it:

The two are semantically equivalent proposals. I explained them both in the original message that i posted proposing the solution. The 'consume()' library routine is just another way to express 'for-from' without using new syntax.

However, it is true that 'consume()' is more generally useful. It would be good to have, whether or not we had new syntax. I acknowledge that i did not realize this at the time i wrote the earlier message, or i would have stated the 'consume()' (then called 'seq()') proposal first and the for-from proposal second, instead of the opposite. That is why i am sticking to talking about the no-new-syntax version of the proposal for now.

I apologize if it seems that i am asking you to follow a moving target. I would like you to recognize, though, that the underlying concept is the same -- the programmer has to signal when an iterator is being used like a sequence.

> Okay, that seems reasonable -- explicit is better than
> implicit. But... consider the following two library
> routines:
>
> def printout1(s):
>     for x in s:
>         print x
>
> def printout2(s):
>     for x in s:
>         for y in s:
>             print x, y
[...]
> no exception will be raised if you call printout2(consume(s))
> by mistake.

Good point! Clearly my proposal did not take care of this case. (But there are solutions below; read on.)

Upon some reflection, though, it seems to me that this problem is orthogonal to the proposal: forcing the programmer to declare when destruction is allowed neither solves nor exacerbates the problem of printout2(). consume() is about destruction, whereas printout2() is about multiple iteration.

> To get any safety benefit from your proposed arrangement,
> it seems to me that you'd need to write printout1 as
>
> def printout1(s):
>     "s must be an iterator"
>     for x from s:
>         print x

I'm afraid i don't see how this bears on the problem you just described. It still would not be possible to write a safe version of printout2() in either (a) the world of the current Python with iterators or (b) a world where for-in does not accept iterators and consume() has been introduced.

One real solution to this problem is what Oren has been suggesting all along -- raise an IteratorExhausted exception if you try to fetch an element from an iterator that has already thrown StopIteration. In printout2(), this exception would occur on the second time through the inner loop. This works, but we can do even better.

After some thought today, i realized that there is a second solution. Thanks for leading me to it, Greg!

With consume(), the programmer has declared that the iterator is okay to destroy. But my definition of consume() was incomplete. One slight change solves the problem:

    consume(y) returns x such that iter(x) returns y the first time,
    and raises IteratorConsumedException thereafter.

Now we're all set! If consume(it) is passed to printout2(), an exception is raised immediately, before any damage is done.
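The revised definition of consume() can be sketched in present-day Python 3 to show where the refusal happens. This is a hypothetical illustration, not a real built-in: consume, the helper class _Consumed, and IteratorConsumedException are names assumed from the proposal, and printout1/printout2 collect their output into lists instead of printing it so the results can be inspected.

```python
class IteratorConsumedException(Exception):
    """Hypothetical: raised when a consume()-wrapped iterator is started twice."""


class _Consumed:
    """Hypothetical wrapper returned by consume(); permits exactly one iter() call."""

    def __init__(self, iterator):
        self._iterator = iterator
        self._started = False

    def __iter__(self):
        # The *provider* decides whether restarting is allowed: the first
        # call hands out the underlying iterator, later calls are refused.
        if self._started:
            raise IteratorConsumedException("iterator has already been started")
        self._started = True
        return self._iterator


def consume(y):
    """Return x such that iter(x) returns y the first time, then raises."""
    return _Consumed(y)


def printout1(s):
    # A single pass over s.
    return [x for x in s]


def printout2(s):
    # Nested passes over s: only safe for re-iterable containers.
    pairs = []
    for x in s:
        for y in s:
            pairs.append((x, y))
    return pairs


print(printout1(consume(iter("abc"))))  # ['a', 'b', 'c']
print(printout2([1, 2]))                # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(printout2(iter([1, 2, 3])))       # [(1, 2), (1, 3)]: inner loop silently drains the outer one
try:
    printout2(consume(iter([1, 2, 3])))
except IteratorConsumedException as exc:
    print("refused:", exc)
```

In this sketch the exception fires at the inner loop's iter() call, before any pair is recorded, rather than when the end of the iterator is reached a second time.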
This detects whether you attempt to *start* the iterator twice, which makes more sense than detecting whether you hit the *end* of the iterator twice. The insight is that protection against multiple iteration belongs in the implementation of __iter__, not in the iterator itself -- because the iterator doesn't know whether it can be restarted. The *provider* of the iterator does.

> There's no doubt that it's very elegant theoretically,
> but in thinking through the implications, I'm not sure it
> would be all that helpful in practice, and might even
> turn out to be a nuisance if it requires putting in a
> lot of iter(x) and/or consume(x) calls.

It's not so bad. You only have to say iter() or consume() in exceptional cases, where you are specifically writing code to manipulate iterators. Everything else looks the same -- except it's safe. More importantly, neither iter() nor consume() needs to be taught on the first day of Python.

I think it all comes together quite nicely. Here it is in summary:

  - Iterators just implement __next__.
  - Containers, and other things that want to be iterated over, just implement __iter__.
  - The new built-in routine consume(y) returns x such that iter(x) returns y the first time, and raises IteratorConsumedException thereafter.
  - (Other objects that only allow one-shot iteration can also raise IteratorConsumedException when their __iter__ is called twice.)

Advantages:

  1. "for-in" and "in" are safe to use -- no fear of destruction.
  2. One-shot iterators are safe against multiple iteration.
  3. Iterators don't have to implement a dummy __iter__ method returning self.
  4. The implementation of "for" stays exactly as it is now.
  5. Current implementations of iterators continue to work fine, if unsafely (but they're already unsafe).
  6. No new syntax.
  7. For-loops continue to work on containers exactly as they always have.
  8. Iterators don't have to maintain extra state to know that it's time to start throwing IteratorExhausted instead of StopIteration.

Items 1, 2, and 3 are distinct improvements over the current state of affairs.

The only inconvenience is the case where an iterator is being passed to a routine that expects a container. This is still fairly rare, and it is easy to detect (hence, the error message from "for" can explain what to do). In this case, you have to wrap consume() around the iterator to declare it okay to consume. And that's all.

The fact that it takes only a slight adjustment to the earlier proposal to solve *both* the destruction problem and the multiple-iteration problem has led me to be even more convinced that this is the "right answer" -- in the sense that this is how i would design the protocol if we were starting from scratch.

Now, i know we are not starting from scratch. And i know Guido has already said he doesn't want to solve this problem. But, just in case you are wondering, the migration path from here to there seems pretty straightforward to me:

  1. When __next__() is not present, call next() and issue a warning.
  2. In the next version, deprecate next() in favour of __next__().
  3. Add consume() and IteratorConsumedException to built-ins.
  4. Deprecate the dummy __iter__() method on iterators.
  5. Throw a party and consume(mass_quantities).

-- ?!ng

"Most things are, in fact, slippery slopes. And if you start backing off from one thing because it's a slippery slope, who knows where you'll stop?"
    -- Sean M. Burke
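Step 1 of the migration path (prefer __next__(), fall back to a legacy next() with a warning) can be approximated in Python 3 with a small helper. This is only a sketch under assumptions: next_item and drain are hypothetical names standing in for what the interpreter's for-loop would do internally, and DeprecationWarning is an assumed choice of warning category. Countdown also illustrates an iterator that implements only __next__, with no dummy __iter__.

```python
import warnings


def next_item(it):
    """Hypothetical helper: prefer __next__(), fall back to legacy next() with a warning."""
    cls = type(it)
    if hasattr(cls, "__next__"):
        return cls.__next__(it)
    if hasattr(cls, "next"):
        warnings.warn(
            f"{cls.__name__}.next() is deprecated; define __next__() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return cls.next(it)
    raise TypeError(f"{cls.__name__!r} object is not an iterator")


def drain(it):
    """Hypothetical for-loop driver: collect items until StopIteration."""
    items = []
    while True:
        try:
            items.append(next_item(it))
        except StopIteration:
            return items


class Countdown:
    """Proposed style: implements only __next__, no dummy __iter__."""

    def __init__(self, n):
        self.n = n

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1


class LegacyCountdown:
    """Old style: implements only a next() method."""

    def __init__(self, n):
        self.n = n

    def next(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1


print(drain(Countdown(3)))  # [3, 2, 1]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(drain(LegacyCountdown(3)))  # [3, 2, 1]
print(len(caught))  # 4: one warning per next() call, including the one that raises StopIteration
```

Because Countdown has no __iter__, `for x in Countdown(3)` raises TypeError in today's Python; under the proposal, a caller would write `for x in consume(Countdown(3))` instead.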