Bill Tutt wrote:
> > There has been a bug report about the treatment of Unicode
> > objects together with 8-bit format strings. The current
> > implementation converts the Unicode object to UTF-8 and then
> > inserts this value in place of the %s....
> >
> > I'm inclined to change this to have '...%s...' % u'abc'
> > return u'...abc...' since this is just another case of
> > coercing data to the "bigger" type to avoid information loss.
> >
> > Thoughts ?
>
> Suddenly returning a Unicode string from an operation that was an 8-bit
> string is likely to give some code extreme fits of despondency.

why is this different from returning floating point values
from operations involving integers and floats?

> Converting to UTF-8 didn't give you any data loss, however it certainly
> might be unexpected to now find UTF-8 characters in what the user originally
> thought was a binary string containing whatever they had wanted it to contain.

the more I've played with this, the stronger my opinion that
the "now it's an ordinary string, now it's a UTF-8 string, now
it's an ordinary string again" approach doesn't work. more on
this in a later post.

(am I the only one here that has actually tried to write code
that handles both unicode strings and ordinary strings? if not,
can anyone tell me what I'm doing wrong?)

> Throwing an exception would at the very least force the user to make a
> decision one way or the other about what they want to do with the data.
> They might want to do a codepage translation, or something else. (aka Hey,
> here's a bug I just found for you!)

> In what other cases are you suddenly returning a Unicode string object from
> which previously returned a string object?

if unicode is ever to be a real string type in python, and not just a
nifty extension type, it must be okay to return a unicode string from
any operation that involves a unicode argument...

</F>
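
For illustration only, the two behaviours discussed above can be sketched in present-day Python 3, where bytes stands in for the 8-bit string and str for the Unicode object. The function names are hypothetical, and decoding the format string as ASCII is an assumption made for this sketch, not something stated in the thread.

```python
# Rough Python 3 analogue of the thread's two options.
# bytes plays the role of the 8-bit string, str the Unicode object.
# Both helper names are hypothetical, made up for this sketch.

def format_via_utf8(fmt: bytes, arg: str) -> bytes:
    """Encode the Unicode argument as UTF-8 and splice it into the
    8-bit format string; the result stays an 8-bit string."""
    return fmt % (arg.encode("utf-8"),)


def format_via_coercion(fmt: bytes, arg: str) -> str:
    """Coerce the format string to the "bigger" type instead, so the
    result is a Unicode string (ASCII decoding is an assumption)."""
    return fmt.decode("ascii") % (arg,)


# The numeric analogy from the reply: mixing int and float gives a float.
mixed = 1 + 2.5
print(type(mixed).__name__)

# With a non-ASCII argument the two results differ in type and length.
print(format_via_utf8(b"<%s>", "\u00e9"))
print(format_via_coercion(b"<%s>", "\u00e9"))
```

With a pure-ASCII argument such as 'abc' the two results look the same apart from their type; the difference only shows once the argument contains non-ASCII characters, which is where the UTF-8 bytes end up inside the 8-bit result.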