Alexander Belopolsky writes: > Yet finding a bug in a str object method after a 5 min review was a > bit discouraging: > > >>> 'xyz'.center(20, '\U00010140') > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: The fill character must be exactly one character long > > Given the apparent difficulty of writing even basic text processing > algorithms in presence of surrogate pairs, I wonder how wise it is to > expose Python users to them. "Consenting adults" applies here. What to do? Write tests, fix the stdlib. Raise the probability of surrogate pair tests in the fuzzer. But "expose the users to surrogate pairs in an efficient (ie, UCS-2) implementation" is a fundamental design principle of Python. Tightening up the internal implementation is -10 unacceptable IMO YMMV. > Again, given that the str object itself has at least one non-BMP > character bug as we are closing on the third major release of py3k, > how likely are 3rd party developers to get their libraries right as > they port to 3.x? Not our problem, really. We need to fix the stdlib, but 3rd party libraries know what they're doing. I guess we could provide a fuzztest module that generates known nasty data (zero, very big numbers, "\0x00", "\U00010140", etc) that people would be able to plug in as a data source for their own code. Of course that doesn't replace conventional unittests based on analysis of edge cases and tests designed to tickle them, but it would be a start for many projects.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4