Here is a summary of the whole iterator picture as i currently see it. This is necessarily subjective, but i will try to be precise so that it's clear where i'm making a value judgement and where i'm trying to state fact, and so we can pinpoint areas where we agree and disagree. In the subjective sections, i have marked with [@] the places where i solicit agreement or disagreement. I would like to know your opinions on the issues listed below, and on the places marked [@]. Definitions (objective) ----------------------- Container: a thing that provides non-destructive access to a varying number of other things. Why "non-destructive"? Because i don't expect that merely looking at the contents will cause a container to be altered. For example, i expect to be able to look inside a container, see that there are five elements; leave it alone for a while, come back to it later and observe once again that there are five elements. Consequently, a file object is not a container in general. Given a file object, you cannot look at it to see if it contains an "A", and then later look at it once again to see if it contains an "A" and get the same result. If you could seek, then you could do this, but not all files support seeking. Even if you could seek, the act of reading the file would still alter the file object. The file object provides no way of getting at the contents without mutating itself. According to my definition, it's fine for a container to have ways of mutating itself; but there has to be *some* way of getting the contents without mutating the container, or it just ain't a container to me. A file object is better described as a stream. Hypothetically one could create an interface to seekable files that offered some non-mutating read operations; this would cause the file to look more like an array of bytes, and i would find it appropriate to call that interface a container. Iterator: a thing that you can poke (i.e. send a no-argument message), where each time you poke it, it either yields something or announces that it is exhausted. For an iterator to mutate itself every time you poke it is not part of my definition. But the only non-mutating iterator would be an iterator that returns the same thing forever, or an iterator that is always exhausted. So most iterators usually mutate. Some iterators are associated with a container, but not all. There can be many kinds of iterators associated with a container. The most natural kind is one that yields the elements of the container, one by one, mutating itself each time it is poked, until it has yielded all of the elements of the container and announces exhaustion. A Container's Natural Iterator: an iterator that yields the elements of the container, one by one, in the order that makes the most sense for the container. If the container has a finite size n, then the iterator can be poked exactly n times, and thereafter it is exhausted. Issues (objective) ------------------ I alluded to a set of issues in an earlier message, and i'll begin there, by defining what i meant more precisely. The Destructive-For Issue: In most languages i can think of, and in Python for the most part, a statement such as "for x in y: print x" is a non-destructive operation on y. Repeating "for x in y: print x" will produce exactly the same results once more. For pre-iterator versions of Python, this fails to be true only if y's __getitem__ method mutates y. The introduction of iterators has caused this to now be untrue when y is any iterator. The issue is, should "for" be non-destructive? The Destructive-In Issue: Notice that the iteration that takes place for the "in" operator is implemented in the same way as "for". So if "for" destroys its second operand, so will "in". The issue is, should "in" be non-destructive? (Similar issues exist for built-ins that iterate, like list().) The __iter__-On-Iterators Issue: Some people have mentioned that the presence of an __iter__() method is a way of signifying that an object supports the iterator protocol. It has been said that this is necessary because the presence of a "next()" method is not sufficiently distinguishing. Some have said that __iter__() is a completely distinct protocol from the iterator protocol. The issue is, what is __iter__() really for? And secondarily, if it is not part of the iterator protocol, then should we require __iter__() on iterators, and why? The __next__-Naming Issue: The iteration method is currently called "next()". Previous candidates for the name of this method were "next", "__next__", and "__call__". After some previous debate, it was pronounced to be "next()". There are concerns that "next()" might collide with existing methods named "next()". There is also a concern that "next()" is inconsistent because it is the only type-slot-method that does not have a __special__ name. The issue is, should it be called "next" or "__next__"? My Positions (subjective) ------------------------- I believe that "for" and "in" and list() should be non-destructive. I believe that __iter__() should not be required on iterators. I believe that __next__() is a better name than next(). Destructive-For, Destructive-In: I think "for" should be non-destructive because that's the way it has almost always behaved, and that's the way it behaves in any other language [@] i can think of. For a container's __getitem__ method to mutate the container is, in my opinion, bad behaviour. In pre-iterator Python, we needed some way to allow the convenience of "for" on user-implemented containers. So "for" supported a special protocol where it would call __getitem__ with increasing integers starting from 0 until it hit an IndexError. This protocol works great for sequence-like containers that were indexable by integers. But other containers had to be hacked somewhat to make them fit. For example, there was no good way to do "for" over a dictionary-like container. If you attempted "for" over a user-implemented dictionary, you got a really weird "KeyError: 0", which only made sense if you understood that the "for" loop was attempting __getitem__(0). (Hey! I just noticed that from UserDict import UserDict for k in UserDict(): print k still produces "KeyError: 0"! This oughta be fixed...) If you wanted to support "for" on something else, sometimes you would have to make __getitem__ mutate the object, like it does in the fileinput module. But then the user has to know that this object is a special case: "for" only works the first time. When iterators were introduced, i believed they were supposed to solve this problem. Currently, they don't. Currently, "in" can even be destructive. This is more serious. While one could argue that it's not so strange for for x in y: ... to alter y (even though i do think it is strange), i believe just about anyone would find it very counterintuitive for if x in y: to alter y. [@] __iter__-On-Iterators: I believe __iter__ is not a type flag. As i argued previously, i think that looking for the presence of methods that don't actually implement a protocol is a poor way to check for protocol support. And as things stand, the presence of __iter__ doesn't even work [@] as a type flag. There are objects with __iter__ that are not iterators (like most containers). And there are objects without __iter__ that work as iterators. I know you can legislate the latter away, but i think such legislation would amount to fighting the programmers -- and it is infeasible [@] to enforce the presence of __iter__ in practice. Based on Guido's positive response, in which he asked me to make an addition to the PEP, i believe Guido agrees with me that __iter__ is distinct from the protocol of an iterator. This surprised me because it runs counter to the philosophy previously expressed in the PEP. Now suppose we agree that __iter__ and next are distinct protocols. Then why require iterators to support both? The only reason we would want __iter__ on iterators is so that we can use "for" [@] with an iterator as the second operand. I have just argued, above, that it's *not* a good idea for "for" and "in" to be destructive. Since most iterators self-mutate, it follows that it's not advisable to use an iterator directly as the second operand of a "for" or "in". I realize this seems radical! This may be the most controversial point i have made. But if you accept that "in" should not destroy its second argument, the conclusion is unavoidable. __next__-Naming: I think the potential for collision, though small, is significant, and this makes "__next__" a better choice than "next". A built-in function next() should be introduced; this function would call the tp_iternext slot, and for instance objects tp_iternext would call the __next__ method implemented in Python. The connection between this issue and the __iter__ issue is that, if next() were renamed to __next__(), the argument that __iter__ is needed as a flag would also go away. The Current PEP (objective) --------------------------- The current PEP takes the position that "for" and "in" can be destructive; that __iter__() and next() represent two distinct protocols, yet iterators are required to support both; and that the name of the method on iterators is called "next()". My Ideal Protocol (subjective) ------------------------------ So by now the biggest question/objection you probably have is "if i can't use an iterator with 'for', then how can i use it?" The answer is that "for" is a great way to iterate over things; it's just that it iterates over containers and i want to preserve that. We need a different way to iterate over iterators. In my ideal world, we would allow a new form of "for", such as for line from file: print line The use if "from" instead of "in" would imply that we were (destructively) pulling things out of the iterator, and would remove any possible parallel to the test "x in y", which should rightly remain non-destructive. Here's the whole deal: - Iterators provide just one method, __next__(). - The built-in next() calls tp_iternext. For instances, tp_iternext calls __next__. - Objects wanting to be iterated over provide just one method, __iter__(). Some of these are containers, but not all. - The built-in iter(foo) calls tp_iter. For instances, tp_iter calls __iter__. - "for x in y" gets iter(y) and uses it as an iterator. - "for x from y" just uses y as the iterator. That's it. Benefits: - We have a nice clean division between containers and iterators. - When you see "for x in y" you know that y is a container. - When you see "for x from y" you know that y is an iterator. - "for x in y" never destroys y. - "if x in y" never destroys y. - If you have an object that is container-like, you can add an __iter__ method that gives its natural iterator. If you want, you can supply more iterators that do different things; no problem. No one using your object is confused about whether it mutates. - If you have an object that is cursor-like or stream-like, you can safely make it into an iterator by adding __next__. No one using your object is confused about whether it mutates. Other notes: - Iterator algebra still works fine, and is still easy to write: def alternate(it): while 1: yield next(it) next(it) - The file problem has a consistent solution. Instead of writing "for line in file" you write for line from file: print line Being forced to write "from" signals to you that the file is eaten up. There is no expectation that "for line from file" will work again. The best would be a convenience function "readlines", to make this even clearer: for line in readlines("foo.txt"): print line Now you can do this as many times as you want, and there is no possibility of confusion; there is no file object on which to call methods that might mess up the reading of lines. My Not-So-Ideal Protocol ------------------------ All right. So new syntax may be hard to swallow. An alternative is to introduce an adapter that turns an iterator into something that "for" will accept -- that is, the opposite of iter(). - The built-in seq(it) returns x such that iter(x) yields it. Then instead of writing for x from it: you would write for x in seq(it): and the rest would be the same. The use of "seq" here is what would flag the fact that "it" will be destroyed. -- ?!ng
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4