Showing content from https://mail.python.org/pipermail/python-dev/2017-May/147950.html below:

[Python-Dev] Format strings, Unicode, and Py2.7: need clarification

Steven D'Aprano steve at pearwood.info
Wed May 17 20:41:12 EDT 2017
On Wed, May 17, 2017 at 02:41:29PM -0700, Craig Rodrigues wrote:

> e = "{}".format(u"hi")
[...]
> type(e) == str

> The confusion for me is why is type(e) of type str, and not unicode?

I think that's one of the reasons why the Python 2.7 string model is (1) 
convenient for those using pure ASCII, but (2) ultimately broken.

You can see why it's broken if you do this:

py> "{}".format(u"hiµ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 2: ordinal not in range(128)


So it tries to encode the Unicode string to ASCII, and if that succeeds, 
format returns a byte str. I'm not sure if that was a deliberate design 
choice for format, or just a side-effect of it calling str() on its 
arguments by default.
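
The implicit conversion described above can be made visible by doing the ASCII encode explicitly. This is only a sketch of the behaviour, not the actual code path inside format; the helper name to_ascii_bytes is hypothetical and exists purely for illustration. It runs on both 2.7 and 3.x.

```python
def to_ascii_bytes(text):
    """Hypothetical helper: perform explicitly the ASCII encode that
    Python 2.7 appears to perform implicitly (an assumption, see above)."""
    return text.encode("ascii")

# Pure-ASCII text encodes cleanly, which is why the problem stays hidden.
ascii_ok = to_ascii_bytes(u"hi")

# u"\xb5" is the micro sign; it has no ASCII representation.
try:
    to_ascii_bytes(u"hi\xb5")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    error_position = exc.start  # index of the first character that failed
```

The failure only appears once non-ASCII data arrives, which is what makes the implicit version hard to catch in testing.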

I'm not sure if I've answered your question or not. Are you looking for 
justification of this misfeature, or an explanation of the historical 
reasons why it exists, or something else?


(If you're looking for the same behaviour in Python 3 and 2.7, probably 
the best thing you can do is just religiously use unicode strings u'' in 
both. You might try:

from __future__ import unicode_literals

in 2.7, but I'm not sure that's enough.)
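
As a minimal sketch of the "use u'' everywhere" advice: when the template itself is a unicode string, format returns text on both 2.7 and 3.x, and no ASCII encode is attempted. The variable names here are illustrative only.

```python
# Unicode template, so the result is text rather than a byte str.
template = u"{}"
result = template.format(u"hi\xb5")  # u"\xb5" is the micro sign

# type(u"") is `unicode` on 2.7 and `str` on 3.x, so this check is portable.
text_type = type(u"")
is_text = isinstance(result, text_type)
```

This covers literals written in your own code; values coming from other libraries may still be byte strings on 2.7 and need checking separately.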


-- 
Steve