Guido van Rossum wrote:
>
> > There has been a bug report about the treatment of Unicode
> > objects together with 8-bit format strings. The current
> > implementation converts the Unicode object to UTF-8 and then
> > inserts this value in place of the %s....
> >
> > I'm inclined to change this to have '...%s...' % u'abc'
> > return u'...abc...' since this is just another case of
> > coercing data to the "bigger" type to avoid information loss.
> >
> > Thoughts ?
>
> Makes sense. But note that it's going to be difficult to catch all
> cases: you could have
>
>   '...%d...%s...%s...' % (3, "abc", u"abc")
>
> and
>
>   '...%(foo)s...' % {'foo': u'abc'}
>
> and even
>
>   '...%(foo)s...' % {'foo': 'abc', 'bar': u'def'}
>
> (the latter should *not* convert to Unicode).

No problem... :-) It's a simple fix: once %s in an 8-bit string
sees a Unicode object, it will stop processing the string and
restart using the Unicode formatting algorithm.

This will cost performance, of course. Optimization is easy though:
add a small "u" in front of the string ;-)

A sample session:

  >>> '...%(foo)s...' % {'foo':u"abc"}
  u'...abc...'
  >>> '...%(foo)s...' % {'foo':"abc"}
  '...abc...'
  >>> '...%(foo)s...' % {u'foo':"abc"}
  '...abc...'
  >>> '...%(foo)s...' % {u'foo':u"abc"}
  u'...abc...'
  >>> '...%(foo)s...' % {u'foo':u"abc",'def':123}
  u'...abc...'
  >>> '...%(foo)s...' % {u'foo':u"abc",u'def':123}
  u'...abc...'

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
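
The restart strategy described in the message can be sketched roughly as follows. This is only an illustration, not the CPython implementation. It is written for Python 3 so it can be run today, which means `bytes` stands in for the 8-bit string type and `str` for the Unicode type. The names `format_8bit`, `_render`, `_lookup` and `_SawUnicode` are hypothetical, only `%s`, `%d`, `%(name)s` and `%%` are handled, and all 8-bit data is assumed to be ASCII.

```python
import re

# Matches %s, %d, %(name)s, %(name)d and the literal %% escape only.
_SPEC = re.compile(r"%(?:\((\w+)\))?([sd%])")


class _SawUnicode(Exception):
    """Signals that %s in an 8-bit template met a Unicode value."""


def _lookup(mapping, name):
    # Python 2 treated 'foo' and u'foo' as the same dict key; emulate that
    # here by trying the str key first and the bytes key second.
    if name in mapping:
        return mapping[name]
    return mapping[name.encode("ascii")]


def _render(template, args, unicode_mode):
    # Positional arguments come from a tuple, or from a single bare value.
    positional = iter(args) if isinstance(args, tuple) else iter((args,))

    def substitute(match):
        name, kind = match.groups()
        if kind == "%":
            return "%"
        value = _lookup(args, name) if name is not None else next(positional)
        if kind == "d":
            return str(int(value))
        if isinstance(value, bytes):
            return value.decode("ascii")  # assumption: ASCII-only 8-bit data
        if isinstance(value, str) and not unicode_mode:
            raise _SawUnicode  # abandon the 8-bit pass
        return str(value)

    return _SPEC.sub(substitute, template)


def format_8bit(template, args):
    """Format a bytes template; return bytes, or str if Unicode was seen."""
    text = template.decode("ascii")
    try:
        return _render(text, args, unicode_mode=False).encode("ascii")
    except _SawUnicode:
        # Restart from the beginning using the Unicode formatting pass.
        return _render(text, args, unicode_mode=True)
```

In this sketch, `format_8bit(b'...%(foo)s...', {'foo': 'abc'})` gives a `str` result, while `format_8bit(b'...%(foo)s...', {'foo': b'abc', 'bar': 'def'})` stays `bytes`, because only values actually consumed by a `%s` can trigger the restart. The second pass repeats all the work of the first, which is the performance cost mentioned above.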