> I added a test case to Lib/test/string_tests.py that uses a sequence
> that returns the wrong answer from __len__.  I've used this test in a
> number of places to make sure the interpreter doesn't dump core when
> it hits a bad user-defined sequence.
>
> class Sequence:
>     def __init__(self): self.seq = 'wxyz'
>     def __len__(self): return len(self.seq)
>     def __getitem__(self, i): return self.seq[i]
>
> class BadSeq2(Sequence):
>     def __init__(self): self.seq = ['a', 'b', 'c']
>     def __len__(self): return 8
>
> The tests of string.join and " ".join don't dump core, but they do
> raise an IndexError.  I wonder if that's the right thing to do,
> because in the other places where it is handled, no exception is
> raised.
>
> The question boils down to the semantics of the sequence protocol.
>
> The string code definition is:
>     if __len__ returns X, then the length is X
>     thus, __getitem__ should succeed for range(0, X)
>     if it doesn't, raise an IndexError
>
> The other code (e.g. PySequence_Tuple) definition is:
>     if __len__ returns X, then the length is <= X
>     if __getitem__ succeeds for range(0, X), then length is indeed X
>     if it does not, then length is Y + 1 for highest Y
>         where Y is greatest index that actually works
>
> The definition in PySequence_Tuple seemed quite clever when I first
> saw it, but I like it less now.  If a user-defined sequence raises
> IndexError when len indicates it should not, the code is broken.  The
> attempt to continue anyway is masking an error in user code.
>
> I vote for fixing PySequence_Tuple and the like to raise an
> IndexError.

I'm not sure I agree.  When Steve Majewski proposed variable-length
sequences, we ended up conceding that __len__ is just a hint.  The
actual length can be longer or shorter.  The map and filter functions
allow this, and so do min/max and others that go over sequences, and
even (of course) the for loop.  (In fact, the preferred behavior is
not to call __len__ at all but just try x[0], x[1], x[2], ... until
IndexError is hit.)

If I read your description of PySequence_Tuple() correctly, it accepts
a __len__ that overestimates but not one that underestimates.  That's
wrong.  (In Majewski's example, a tar file wrapper would claim to have
0 items, but iterating over it in ascending order would access all the
items in the file.  Claiming some arbitrary integer as __len__ would
be wrong.)

So string.join(BadSeq2(), "") or "".join(BadSeq2()) should return
"abc".

--Guido van Rossum (home page: http://dinsdale.python.org/~guido/)
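
To make the "__len__ is just a hint" rule concrete, here is a minimal sketch in present-day Python of walking a sequence by index until IndexError, without ever calling __len__.  Sequence and BadSeq2 are the classes from the quoted test; ZeroLenSeq, items_until_index_error and join_ignoring_len are hypothetical names invented for this illustration and are not part of string_tests.py or the interpreter.

```python
class Sequence:
    def __init__(self): self.seq = 'wxyz'
    def __len__(self): return len(self.seq)
    def __getitem__(self, i): return self.seq[i]


class BadSeq2(Sequence):
    # __len__ overestimates: claims 8 items but only has 3.
    def __init__(self): self.seq = ['a', 'b', 'c']
    def __len__(self): return 8


class ZeroLenSeq(Sequence):
    # Hypothetical counterpart: __len__ underestimates (claims 0 items),
    # loosely modelled on the tar file wrapper mentioned above.
    def __init__(self): self.seq = ['a', 'b', 'c']
    def __len__(self): return 0


def items_until_index_error(seq):
    """Collect seq[0], seq[1], ... until IndexError; never calls len()."""
    items = []
    i = 0
    while True:
        try:
            item = seq[i]
        except IndexError:
            break
        items.append(item)
        i += 1
    return items


def join_ignoring_len(sep, seq):
    """Join the items actually reachable by index, whatever __len__ says."""
    return sep.join(items_until_index_error(seq))


joined_over = join_ignoring_len("", BadSeq2())
joined_under = join_ignoring_len("", ZeroLenSeq())
```

Under this reading both the overestimating and the underestimating sequence yield the same three items, so both joins produce "abc".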