> There has been a bug report about the treatment of Unicode
> objects together with 8-bit format strings. The current
> implementation converts the Unicode object to UTF-8 and then
> inserts this value in place of the %s....
>
> I'm inclined to change this to have '...%s...' % u'abc'
> return u'...abc...' since this is just another case of
> coercing data to the "bigger" type to avoid information loss.
>
> Thoughts ?

Suddenly returning a Unicode string from an operation that was an 8-bit string is likely to give some code extreme fits of despondency.

Converting to UTF-8 didn't lose any data; however, it certainly might be unexpected to now find UTF-8 characters in what the user originally thought was a binary string containing whatever they had wanted it to contain.

Throwing an exception would at the very least force the user to make a decision one way or the other about what they want to do with the data. They might want to do a codepage translation, or something else. (aka Hey, here's a bug I just found for you!)

In what other cases are you suddenly returning a Unicode string object from an operation which previously returned a string object?

Bill
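
For reference, the three behaviours under discussion can be put side by side in a minimal sketch. It is written for modern Python 3, where `bytes` stands in for the 8-bit string and `str` for the Unicode object; that mapping is an assumption made for illustration, not the interpreter code the thread is about. The helper names `format_as_utf8`, `format_as_unicode` and `format_strict` are hypothetical, and the second one additionally assumes the 8-bit template is plain ASCII.

```python
# Hypothetical helpers for illustration only; none of these names
# come from the thread or from the Python implementation.


def format_as_utf8(template: bytes, value: str) -> bytes:
    """Behaviour described as current: encode to UTF-8, result stays 8-bit."""
    return template % value.encode("utf-8")


def format_as_unicode(template: bytes, value: str) -> str:
    """Proposed behaviour: coerce the whole result to the "bigger" type.

    Assumes the 8-bit template contains only ASCII.
    """
    return template.decode("ascii") % value


def format_strict(template: bytes, value) -> bytes:
    """Behaviour suggested in the reply: refuse, so the caller must decide."""
    if isinstance(value, str):
        raise TypeError(
            "Unicode value in an 8-bit format string; encode it explicitly"
        )
    return template % value
```

With the strict variant, a caller who wants a codepage translation has to spell it out, for example `format_strict(b"...%s...", text.encode("cp1252"))`, rather than finding UTF-8 bytes or a changed result type after the fact.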