> My question was more directed toward non-performance issues. Do we > really have *any* need for more than two iterators running concurrently? > After all, it's already difficult to come-up with good use cases for two > that are not dominated by list() or window(). > > > > With the current tee(), I was thinking that if the two iterators stay > > close, you end up moving the in basket to the out basket rather > > frequently, and the overhead of that might beat the simplicity of the > > linked lists. > > With current tee(), runtime is dominated by calls to Append and Pop > (reverse is super-fast and moves each element only once). Those are > calls are more expensive than a link jump; however append() and pop() > are optimized to avoid calls to the memory manager while every link > would need steps for alloc/initialization/reference/dealloc. Cache > effects are also important because the current tee() uses much less > memory and the two memory blocks are contiguous. > > > > > Also, *if* you need a lot of clones, using multiple > > tee() calls ends up creating several queues, again causing more > > overhead. (These queues end up together containing all the items from > > the oldest to the newest iterator.) > > *If* we want to support multiple clones, there is an alternate > implementation of the current tee that only costs one extra word per > iteration. That would be in there already. I really *wanted* a > multi-way tee but couldn't find a single use case that warranted it. All points well taken. > > I also note that the current tee() doesn't let you use __copy__ easily > > (it would be quite messy I think). > > To __copy__ is to tee. Both make two iterators from one. > They are different names for the same thing. > Right now, they don't seem comparable because the current tee is only a > two way split and you think of copy as being a multi-way split for no > extra cost. Here I respectfully differ. When you tee, you have to stop using the underlying iterator, and replace it with one of the tee'ed copies. When you __copy__, you can continue to use the original. The difference matters if you're tee'ing an iterator owned by another piece of code. > > Maybe Andrew has some use cases? > > I hope so. I can't think of anything that isn't dominated by list(), > window(), or the current tee(). > > And, if needed, the current tee() can easily be made multi-way. It > doubles the unit memory cost from one word to two but that's nothing > compared to the link method (two words for PyHead, another 3 (Linux) or > 4 (Windows) words for GC, and another 3 for the data fields). As you said in your first msg, you could do it with much less overhead if the link cell wasn't made a PyObject. Also, Andrew's suggestion of using a link cell containing an array of values could be explored. But I'll happily back off until we find a use case that needs more than a 2-way tee *and* we find it's a performance bottleneck for your approach. We may never find that. > > The question is, how often does one need this? Have you seen real use > > cases for tee() that aren't better served with list()? > > I'm sure they exist, but they are very few. I was hoping that a simple, > fast, memory efficient two-way tee() would have satisfied all the > requests, but this thing appears to be taking on a life of its own with > folks thinking they need multiple concurrent iterators created by a > magic method (what for?). Well, there was a separate thread about __copy__'ing iterators. --Guido van Rossum (home page: http://www.python.org/~guido/)
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4