Bill Tutt wrote:
>
> > There has been a bug report about the treatment of Unicode
> > objects together with 8-bit format strings. The current
> > implementation converts the Unicode object to UTF-8 and then
> > inserts this value in place of the %s....
> >
> > I'm inclined to change this to have '...%s...' % u'abc'
> > return u'...abc...' since this is just another case of
> > coercing data to the "bigger" type to avoid information loss.
> >
> > Thoughts?
>
> Suddenly returning a Unicode string from an operation that was an 8-bit
> string is likely to give some code extreme fits of despondency.
>
> Converting to UTF-8 didn't give you any data loss; however, it certainly
> might be unexpected to now find UTF-8 characters in what the user originally
> thought was a binary string containing whatever they had wanted it to contain.

Well, the design is to always coerce to Unicode when 8-bit string objects and Unicode objects meet. This is done for all string methods, and that's the reason I'm also implementing this for %-formatting (internally this is just another string method).

> Throwing an exception would at the very least force the user to make a
> decision one way or the other about what they want to do with the data.
> They might want to do a codepage translation, or something else. (aka Hey,
> here's a bug I just found for you!)

True; but Guido's intention was to have strings and Unicode interoperate without too much user intervention.

> In what other cases are you suddenly returning a Unicode string object from
> operations which previously returned a string object?

All string methods automatically coerce to Unicode when they see a Unicode argument, e.g. " ".join(("abc", u"def")) will return u"abc def".

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
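The coercion rule discussed in this thread ("when 8-bit strings and Unicode meet, promote to Unicode") can be modeled in a few lines. This is a minimal sketch, not the actual interpreter implementation: it runs on Python 3, where `bytes` stands in for the old 8-bit `str` type and `str` stands in for `unicode`. The helper names `to_unicode`, `coercing_format` and `coercing_join`, and the ASCII default encoding, are all hypothetical assumptions made for illustration.

```python
# Sketch of the "coerce to the bigger type" rule, modeled in Python 3:
# bytes plays the role of the 8-bit string, str plays the role of Unicode.

DEFAULT_ENCODING = "ascii"  # assumption: stand-in for the interpreter's default encoding


def to_unicode(s, encoding=DEFAULT_ENCODING):
    """Decode an 8-bit string; pass Unicode and non-string values through unchanged."""
    return s.decode(encoding) if isinstance(s, bytes) else s


def coercing_format(fmt, *args):
    """Hypothetical %-formatting that returns Unicode if any operand is Unicode."""
    if isinstance(fmt, str) or any(isinstance(a, str) for a in args):
        # Unicode involved: promote every string operand before formatting.
        return to_unicode(fmt) % tuple(to_unicode(a) for a in args)
    # Pure 8-bit case: the result stays 8-bit.
    return fmt % args


def coercing_join(sep, items):
    """Hypothetical join with the same promotion rule."""
    items = list(items)
    if isinstance(sep, str) or any(isinstance(i, str) for i in items):
        return to_unicode(sep).join(to_unicode(i) for i in items)
    return sep.join(items)


print(repr(coercing_format(b"...%s...", "abc")))    # '...abc...'  (Unicode result)
print(repr(coercing_format(b"...%s...", b"abc")))   # b'...abc...' (stays 8-bit)
print(repr(coercing_join(b" ", [b"abc", "def"])))   # 'abc def'    (Unicode result)
```

Under the ASCII assumption, an 8-bit operand containing non-ASCII bytes makes `to_unicode` raise `UnicodeDecodeError` rather than being promoted silently, so the choice of default encoding decides whether mixing "just works" or forces the user to decide, which is the trade-off the two sides of this thread are weighing.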