> Martin, haven't you read my last post to Guido ?

I've read

  http://www.python.org/pipermail/python-dev/2000-September/016162.html

where you express a preference for disabling the getreadbuf slot, in
addition to special-casing Unicode objects in s#. I've just tested the
effects of your solution 1 on the test suite. Or are you referring to
a different message?

> Completely disabling getreadbuf is not a solution worth considering --
> it breaks far too much code which the test suite doesn't even test,
> e.g. MarkH's win32 stuff produces tons of Unicode object which
> then can get passed to potentially all of the stdlib. The test suite
> doesn't check these cases.

Do you have any specific examples of what else would break? Looking at
all occurrences of 's#' in the standard library, I can't find a single
case where the current behaviour would be right - in all cases raising
an exception would be better. Again, any counter-examples?

> Special case Unicode in getargs.c's code for "s#" only and leave
> getreadbuf enabled. "s#" could then return the default encoded
> value for the Unicode object while SRE et al. could still use
> PyObject_AsReadBuffer() to get at the raw data.

I think your option 2 is acceptable, although I feel that option 1
would expose more potential problems. What if an application
unknowingly passes a Unicode object to md5.update? In testing, it may
always succeed because only ASCII data is used, and then it will
suddenly start breaking when some user enters non-ASCII strings.

Using the internal rep would also be wrong in this case - the md5 hash
would depend on the byte order, which is probably not desired (*).

In any case, your option 2 would be a big improvement over the current
state, so I'll just shut up.

Regards,
Martin

(*) BTW, is there a meaningful way to define md5 for a Unicode string?
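
To illustrate the byte-order point and the footnote, here is a minimal sketch. It assumes present-day Python 3 and its hashlib module, not the md5 module and "s#" behaviour discussed above; the sample string and the helper name md5_of are hypothetical. It shows that the digest of the same text changes with the byte representation chosen, so a hash of a Unicode string is only well defined once an encoding is named explicitly.

```python
import hashlib

# Hypothetical sample text containing one non-ASCII character.
text = "caf\u00e9"

def md5_of(text, encoding):
    """Hash the bytes obtained by encoding `text` with an explicit encoding."""
    return hashlib.md5(text.encode(encoding)).hexdigest()

# Same text, three byte representations. The two UTF-16 variants differ
# only in byte order, which is the internal-representation issue above.
digests = {enc: md5_of(text, enc) for enc in ("utf-8", "utf-16-le", "utf-16-be")}

for enc, digest in digests.items():
    print(enc, digest)

# For ASCII-only text, ASCII and UTF-8 produce identical bytes, which is
# why tests that use only ASCII data would not reveal the problem.
ascii_digest = md5_of("abc", "ascii")
utf8_digest = md5_of("abc", "utf-8")
```

With an explicit encoding the caller, not the hashing code, decides what "the bytes of this string" means, which is one possible answer to the footnote rather than the only one.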