> > Guido van Rossum wrote:
> > >
> > > The email below is a serious bug report. A quick analysis shows that
> > > UserString.count() calls the count() method on a string object, which
> > > calls PyArg_ParseTuple() with the format string "O|ii". The 'i'
> > > format code truncates integers. It probably should raise an overflow
> > > exception instead. But that would still cause the test to fail --
> > > just in a different way (more explicit). Then the string methods
> > > should be fixed to use long ints instead -- and then something else
> > > would probably break...
>
> MAL wrote:
> > All uses in stringobject.c and unicodeobject.c use INT_MAX
> > together with integers, so there's no problem on that side
> > of the fence ;-)
> >
> > Since strings and Unicode objects use integers to describe the
> > length of the object (as do most, if not all, other builtin
> > sequence types), the correct default value should thus be
> > something like sys.maxlen, which then gets set to INT_MAX.
> >
> > I'd suggest adding sys.maxlen and then modifying UserString.py,
> > re.py and sre_parse.py accordingly.
>
> Guido wrote:
> Hm, I'm not so sure. It would be much better if passing sys.maxint
> would just WORK... since that's what people have been doing so far.

Possible solutions (I give four of them):

1. The 'i' format code could raise an overflow exception, and the
   PyArg_ParseTuple() call in string_count() could catch it and truncate
   to INT_MAX (reasoning that any overflow of the end position of a
   string can be bounded to INT_MAX, because that is the limit for any
   string in Python).

   Pros:
   - This "would just WORK" for existing uses of sys.maxint.

   Cons:
   - This overflow-exception catching would then reasonably have to be
     propagated to other similar functions (string.endswith(), etc.).
   - We have to assume that the exception raised in the
     PyArg_ParseTuple(args, "O|ii:count", &subobj, &i, &last) call is
     for the second integer (i.e. 'last'). This is subtle and ugly.

   Pro or con:
   - Do we want to start raising overflow exceptions for other
     conversion formats (i.e. 'b', 'h' and 'l' -- the latter *can*
     overflow on Win64, where sizeof(long) < sizeof(void*))? I think
     this is a good idea in principle, but it may break code (even if
     it *does* identify bugs in that code).

2. Just change the definitions of the UserString methods to pass a
   variable-length argument list instead of default-value parameters.
   For example, change UserString.count() from:

       def count(self, sub, start=0, end=sys.maxint):
           return self.data.count(sub, start, end)

   to:

       def count(self, *args):
           return self.data.count(*args)

   The result is that the default value for 'end' is now set by
   string_count() rather than by the UserString implementation:

       >>> from UserString import UserString
       >>> s = 'abcabcabc'
       >>> u = UserString('abcabcabc')
       >>> s.count('abc')
       3
       >>> u.count('abc')
       3

   Pros:
   - Easy change.
   - Fixes the immediate bug.
   - This is a safer way to copy the string behaviour in UserString
     anyway (is it not?).

   Cons:
   - Does not fix the general problem of the (common?) use of
     sys.maxint to mean INT_MAX rather than the actual LONG_MAX (this
     matters on 64-bit Unices).
   - The UserString code is no longer really self-documenting.

3. As MAL suggested: add something like sys.maxlen (set to INT_MAX),
   which makes explicit the logical difference from sys.maxint (set to
   LONG_MAX):

   - sys.maxint == "the largest value a Python integer can hold"
   - sys.maxlen == "the largest value for the length of an object in
     Python (e.g. the length of a string, the length of an array)"

   Pros:
   - More explicit, in that it separates two distinct meanings of
     sys.maxint (which now differ on 64-bit Unices).
   - The code changes should be fairly straightforward.

   Cons:
   - Places in the code that still use sys.maxint where they should use
     sys.maxlen will unknowingly overflow ints and bring back this bug.
   - Something else for coders to know about.

4. Add something like sys.maxlen, but set it to SIZET_MAX (cf. the
   ANSI size_t type). It is probably not a biggie, but Python currently
   assumes that strings never exceed INT_MAX in length. While this
   assumption is not likely to be proven false, it technically could be
   on 64-bit systems. As well, when you start compiling on Win64 (where
   sizeof(int) == sizeof(long) < sizeof(size_t)), you are going to be
   annoyed by hundreds of warnings about implicit casts from size_t
   (64 bits) to int (32 bits) for every strlen, str*, fwrite and
   sizeof call that you make.

   Pros:
   - IMHO logically more correct.
   - Might clean up some subtle bugs.
   - Cleans up annoying and disconcerting warnings.
   - Will probably mean less pain down the road as 64-bit systems
     (esp. Win64) become more prevalent.

   Cons:
   - Lots of coding changes.
   - As Guido said: "and then something else would probably break".
     (Though on current 32-bit systems there should be no effective
     change; only 64-bit systems should be affected and, I would hope,
     the effect there would be a clean-up.)

I apologize for not being succinct. Note that I am volunteering here.
Opinions and guidance, please.

Trent
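P.S. For anyone who wants to see the failure mode concretely, the silent
truncation done by the 'i' format code can be mimicked in pure Python. This
is only a sketch: the real conversion happens in C inside PyArg_ParseTuple(),
and the 32-bit int width below is an assumption about the platform.

```python
def to_c_int(value):
    """Mimic the 'i' format code's silent truncation to a 32-bit C int."""
    truncated = value & 0xFFFFFFFF   # keep only the low 32 bits
    if truncated >= 2 ** 31:         # reinterpret the result as signed
        truncated -= 2 ** 32
    return truncated

# On a 64-bit Unix, sys.maxint is 2**63 - 1 (LONG_MAX); squeezing it
# through an 'i' conversion yields -1, so string_count() ends up with
# end == -1 instead of "end of string" -- hence the UserString bug.
print(to_c_int(2 ** 63 - 1))   # -1
print(to_c_int(2 ** 31 - 1))   # 2147483647 (INT_MAX passes through intact)
```

This also shows why option 1's clamp to INT_MAX is safe for string
positions: any end index at or above INT_MAX already means "to the end of
the string", given the current INT_MAX length limit.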