> BTW, playing around with some of this it seems to me that the
> inability to just copy.copy (or copy.deepcopy) anything produced by
> iter(sequence) is more of a bother -- quite apart from clonability
> (a similar but separate concept), couldn't those iterators be
> copy'able anyway?  I.e. just expose underlying sequence and index as
> their state for getting and setting?

I'm not sure why you say it's separate from cloning; it seems to me
that copy.copy(iter(range(10))) should return *exactly* what we'd want
the proposed clone operation to return.

> Otherwise to get copyable
> iterators I have to reimplement iter "by hand":
>
> class Iter(object):
>     def __init__(self, seq):
>         self.seq = seq
>         self.idx = 0
>     def __iter__(self): return self
>     def next(self):
>         try: result = self.seq[self.idx]
>         except IndexError: raise StopIteration
>         self.idx += 1
>         return result
>
> and I don't understand the added value of requiring the user to
> code this no-added-value, slow-things-down boilerplate.

I see this as a plea to add __copy__ and __deepcopy__ methods to all
standard iterators for which it makes sense.  (Or maybe only __copy__
-- I'm not sure what value __deepcopy__ would add.)

I find this a reasonable request for the iterators belonging to
standard containers (list, tuple, dict).  I guess that some of the
iterators in itertools might also support this easily.  Perhaps this
would be the road to supporting iterator cloning?

> > > An iterator that knows it's coming from disk or pipe can provide
> > > that disk copy (or reuse the existing file) as part of its
> > > "optimized tee-ability".
> >
> > At considerable cost.
>
> I'm not sure I see that cost, yet.

Mostly complexity of the code to implement it, and things like making
sure that the disk file is deleted (not an easy problem
cross-platform!).

> > lines of a file previously available.
> > Also, on many systems,
> > every call to fseek() drops the stdio buffer, even if the seek
> > position is not actually changed by the call.  It could be done,
> > but would require incredibly hairy code.
>
> The call to fseek probably SHOULD drop the buffer in a typical
> C implementation _on a R/W file_, because it's used as the way
> to signal the file that you're moving from reading to writing or VV
> (that's what the C standard says: you need a seek between an
> input op and an immediately successive output op or viceversa,
> even a seek to the current point, else, undefined behavior -- which
> reminds me, I don't know if the _Python_ wrapper maintains that
> "clever" requirement for ITS R/W files, but I think it does).

Yes it does: file_seek() calls drop_readahead().

> I can well believe that for simplicity a C-library implementor would
> then drop the buffer on a R/O file too, needlessly but
> understandably.

For any stdio implementation supporting fileno(), fseek() is also used
to synch up the seek positions maintained by stdio and by the
underlying OS or file descriptor implementation.

> The deuced "for line in flob:" is so deucedly optimized that trying
> to compete with it, even with something as apparently trivial as
> Lines1, is apparently a lost cause;-).  OK, then I guess that an
> iterator by lines on a textfile can't easily be optimized for teeability
> by these "share the file object" strategies; rather, the best way to
> tee such a disk file would seem to be:
>
> def tee_diskfile(f):
>     result = file(f.name, f.mode)
>     result.seek(f.tell())
>     return f, result

Right, except you might want to change the mode to a read-only mode
(without losing the 'b' or 'U' property).

--Guido van Rossum (home page: http://www.python.org/~guido/)
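
The read-only variant of tee_diskfile mentioned above could look roughly
like the sketch below. It is written for present-day Python, which is an
assumption on my part: the quoted code uses file(), whereas this uses
open(). The helper read_only_mode() and the demo() function are
hypothetical names introduced only for illustration. Only the 'b' flag is
carried over; the 'U' flag is left out because newer Python 3 releases no
longer accept it.

```python
import os
import tempfile


def read_only_mode(mode):
    """Map any file mode to a read-only one, keeping the binary flag.

    Hypothetical helper for illustration; not part of the original thread.
    """
    return "rb" if "b" in mode else "r"


def tee_diskfile(f):
    """Return (f, clone): clone is a second, read-only handle on the same
    disk file, positioned where f currently is."""
    mode = read_only_mode(f.mode)
    if "b" in mode:
        clone = open(f.name, mode)
    else:
        # Assumption: reuse the original's encoding so that the text-mode
        # position reported by tell() means the same thing in the clone.
        clone = open(f.name, mode, encoding=f.encoding)
    clone.seek(f.tell())
    return f, clone


def demo():
    """Write three lines, read one, tee, then read from both handles."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"one\ntwo\nthree\n")
        path = tmp.name
    try:
        with open(path, "r+b") as f:   # original handle is read/write
            f.readline()               # consume b"one\n"
            _, clone = tee_diskfile(f)
            with clone:
                return (clone.mode, f.readline(), clone.readline(),
                        clone.readline(), f.readline())
    finally:
        os.remove(path)
```

Because the two handles keep separate seek positions, reading from one
does not advance the other. Note that the sketch relies on f.tell(),
which text-mode files refuse to answer while a "for line in f" loop is in
progress, so the demo uses readline() on a binary file.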