[MAL]
> I wonder how we could add %-formatting to Unicode strings without
> duplicating the PyString_Format() logic.
>
> First, do we need Unicode object %-formatting at all?

Sure -- in the end, all the world speaks Unicode natively and encodings
become historical baggage. Granted, I won't live that long, but I may last
long enough to see encodings become almost purely an I/O hassle, with all
computation done in Unicode.

> Second, here is an emulation using strings and <default encoding>
> that should give an idea of how one could work with the different
> encodings:
>
>     s = '%s %i abcäöü' # a Latin-1 encoded string
>     t = (u,3)

What's u? A Unicode object? Another Latin-1 string? A default-encoded
string? How does the following know the difference?

>     # Convert Latin-1 s to a <default encoding> string via Unicode
>     s1 = unicode(s,'latin-1').encode()
>
>     # The '%s' will now add u in <default encoding>
>     s2 = s1 % t
>
>     # Finally, convert the <default encoding> encoded string to Unicode
>     u1 = unicode(s2)

I don't expect this actually works: for example, change %s to %4s.
Assuming u is either UTF-8 or Unicode, PyString_Format isn't smart enough
to know that some (or all) characters in u consume multiple bytes, so it
can't extract "the right" number of bytes from u. I think %-formatting has
to know the truth of what you're doing.

> Note that .encode() defaults to the current setting of
> <default encoding>.
>
> Provided u maps to Latin-1, an alternative would be:
>
>     u1 = unicode('%s %i abcäöü' % (u.encode('latin-1'),3), 'latin-1')

More interesting is fmt % tuple where everything is Unicode; people can
muck with Latin-1 directly today using regular strings, so the example
above mostly shows artificial convolution.
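The %4s objection can be checked in modern Python, where bytes and text are separate types. What follows is a minimal sketch under stated assumptions: Python 3.5+ bytes %-formatting (PEP 461) stands in for the byte-oriented PyString_Format, UTF-8 stands in for <default encoding>, and the helper `byte_level_format` is hypothetical, merely mimicking the encode, format, decode round trip quoted above.

```python
# Sketch only: bytes %-formatting stands in for PyString_Format and UTF-8
# stands in for <default encoding>. Neither is the original 1.6-era code.

def byte_level_format(fmt: str, text: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: encode, %-format as bytes, then decode.

    Width and precision are applied by the byte-oriented formatter, so they
    count encoded bytes, not characters. errors="replace" keeps a split
    multi-byte sequence from raising, so the damage stays visible.
    """
    raw = fmt.encode(encoding) % (text.encode(encoding),)
    return raw.decode(encoding, errors="replace")


u = "ä"  # one character, but two bytes in UTF-8

# Width: the byte formatter pads to 4 *bytes*, leaving only 3 characters.
print(repr(byte_level_format("%4s", u)))  # '  ä'
# Formatting the text directly pads to 4 *characters*.
print(repr("%4s" % u))                    # '   ä'

# Precision: the byte formatter keeps 1 byte, cutting 'ä' in half, which
# decodes to a U+FFFD replacement character.
print(byte_level_format("%.1s", u) == "\ufffd")  # True
# Formatting the text directly keeps 1 whole character.
print("%.1s" % u)                                # ä
```

Formatting the text object directly, the "everything is Unicode" case, pads and truncates by character; the byte-level round trip cannot, because by the time the formatter runs it only sees bytes.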