Showing content from https://mail.python.org/pipermail/python-dev/1999-November/001272.html below:

[Python-Dev] Unicode proposal: %-formatting ?

Tim Peters tim_one@email.msn.com
Tue, 16 Nov 1999 00:38:32 -0500

[MAL]
> I wonder how we could add %-formatting to Unicode strings without
> duplicating the PyString_Format() logic.
>
> First, do we need Unicode object %-formatting at all ?

Sure -- in the end, all the world speaks Unicode natively and encodings
become historical baggage.  Granted I won't live that long, but I may last
long enough to see encodings become almost purely an I/O hassle, with all
computation done in Unicode.

> Second, here is an emulation using strings and <default encoding>
> that should give an idea of how one could work with the different
> encodings:
>
>     s = '%s %i abcäöü' # a Latin-1 encoded string
>     t = (u,3)

What's u?  A Unicode object?  Another Latin-1 string?  A default-encoded
string?  How does the following know the difference?

>     # Convert Latin-1 s to a <default encoding> string via Unicode
>     s1 = unicode(s,'latin-1').encode()
>
>     # The '%s' will now add u in <default encoding>
>     s2 = s1 % t
>
>     # Finally, convert the <default encoding> encoded string to Unicode
>     u1 = unicode(s2)

I don't expect this actually works:  for example, change %s to %4s.
Assuming u is either UTF-8 or Unicode, PyString_Format isn't smart enough to
know that some (or all) characters in u consume multiple bytes, so can't
extract "the right" number of bytes from u.  I think % formatting has to know
the truth of what you're doing.
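
The width problem can be reproduced in a modern interpreter. The sketch below assumes Python 3.5 or later, where bytes objects support %-formatting; it is an analogue of the byte-oriented PyString_Format behaviour described above, not the 1999 code itself, and the helper names (pad_bytes, pad_text, truncate_bytes) are made up for illustration.

```python
# Assumption: Python 3.5+ (bytes %-formatting). Helper names are hypothetical.

def pad_bytes(text, encoding="utf-8"):
    """Format with %4s at the byte level, then decode back to text."""
    return (b"%4s" % text.encode(encoding)).decode(encoding)

def pad_text(text):
    """Format with %4s at the character level."""
    return "%4s" % text

def truncate_bytes(text, encoding="utf-8"):
    """Apply a precision of 1 at the byte level; may split a character."""
    return b"%.1s" % text.encode(encoding)

if __name__ == "__main__":
    # Two characters, but four bytes in UTF-8, so byte-level %4s adds no padding.
    print(repr(pad_bytes("äö")))
    # Character-level %4s pads to four characters.
    print(repr(pad_text("äö")))
    # Byte-level precision cuts the two-byte character in half.
    print(repr(truncate_bytes("ä")))
```

The byte-level formatter counts encoded bytes rather than characters, so widths come out short and a precision can leave a fragment that no longer decodes.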

> Note that .encode() defaults to the current setting of
> <default encoding>.
>
> Provided u maps to Latin-1, an alternative would be:
>
>     u1 = unicode('%s %i abcäöü' % (u.encode('latin-1'),3), 'latin-1')

More interesting is fmt % tuple where everything is Unicode; people can muck
with Latin-1 directly today using regular strings, so the example above
mostly shows artificial convolution.
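
For comparison, here is a rough sketch of the two routes side by side, again assuming Python 3.5 or later rather than the interpreter under discussion. The function names and the sample value for u are hypothetical; the Latin-1 route mirrors the quoted alternative, and the other formats everything as Unicode.

```python
# Assumption: Python 3.5+. Function names and sample inputs are hypothetical.

def format_via_latin1(u, n):
    """Encode everything to Latin-1, format as bytes, decode the result."""
    fmt = "%s %i abcäöü".encode("latin-1")
    return (fmt % (u.encode("latin-1"), n)).decode("latin-1")

def format_unicode(u, n):
    """Format directly, with format string and arguments all Unicode."""
    return "%s %i abcäöü" % (u, n)

if __name__ == "__main__":
    # Both routes agree as long as u maps to Latin-1.
    print(format_via_latin1("Grüße", 3) == format_unicode("Grüße", 3))
    # The Unicode route also handles characters outside Latin-1.
    print(format_unicode("€", 3))
```

The Latin-1 detour only works while u is representable in Latin-1; a character such as the euro sign makes the encode step raise UnicodeEncodeError, while the all-Unicode route is unaffected.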
