Wed Aug 24 20:00:45 CEST 2011 · https://mail.python.org/pipermail/python-dev/2011-August/113040.html

On 24/08/2011 11:22, Glenn Linderman wrote:
>>> c) mostly ASCII (utf8) with clever indexing/caching to be efficient
>>> d) UTF-8 with clever indexing/caching to be efficient
>> I see neither a need nor a means to consider these.
>
> The discussion about "mostly ASCII" strings seems convincing that there
> could be a significant space savings if such were implemented.

Antoine's optimization in the UTF-8 decoder has been removed. Removing it
doesn't change the memory footprint; it only makes creating the Unicode
object slower.

When you decode a UTF-8 string:

  - "abc" string uses "latin1" (8 bits) units
  - "aé" string uses "latin1" (8 bits) units <= cool!
  - "a€" string uses UCS2 (16 bits) units
  - "a\U0010FFFF" string uses UCS4 (32 bits) units
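
The unit sizes listed above can be checked from Python itself. The sketch
below is only an illustration and assumes a CPython build that includes the
PEP 393 representation (3.3 or later); bytes_per_char is a hypothetical
helper, not part of any API. It estimates the per-character storage by
comparing sys.getsizeof() for two strings of the same kind, so the fixed
object header cancels out.

```python
import sys


def bytes_per_char(s, repeat=100):
    """Estimate storage per code point for the representation chosen for s.

    Hypothetical helper: relies on CPython's PEP 393 layout, where a string
    and a longer repetition of it use the same unit width.
    """
    grown = s * (repeat + 1)
    # The header size is identical for both strings, so the difference is
    # purely the extra character data.
    return (sys.getsizeof(grown) - sys.getsizeof(s)) // (repeat * len(s))


# The four examples from the list, decoded from UTF-8 bytes.
samples = [
    b"abc",
    "a\u00e9".encode("utf-8"),      # "aé"
    "a\u20ac".encode("utf-8"),      # "a€"
    "a\U0010FFFF".encode("utf-8"),
]

widths = [bytes_per_char(raw.decode("utf-8")) for raw in samples]
print(widths)
```

On such a build the printed widths should be 1, 1, 2 and 4 bytes, matching
the latin1, latin1, UCS2 and UCS4 cases; other implementations may store
strings differently.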

Victor

[Python-Dev] PEP 393 Summer of Code Project