A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://mail.python.org/pipermail/python-dev/2014-June/134732.html below:

[Python-Dev] Internal representation of strings and Micropython

[Python-Dev] Internal representation of strings and MicropythonChris Angelico rosuav at gmail.com
Wed Jun 4 12:51:36 CEST 2014
On Wed, Jun 4, 2014 at 8:38 PM, Paul Sokolovsky <pmiscml at gmail.com> wrote:
> That's another reason why people don't like Unicode enforced upon them
> - all the talk about supporting all languages and scripts is demagogy
> and hypocrisy, given a choice, Unicode zealots would rather limit
> people to Latin script then give up on their arbitrarily chosen,
> one-among-thousands,
> soon-to-be-replaced-by-apples'-and-microsofts'-"exciting-new" encoding.

Wrong. I use and recommend Unicode, with UTF-8 for transmission, and I
do not ever want to limit people to Latin-1 or any other such subset.
Even though English is the only language I speak, I am *frequently*
using non-ASCII characters (eg when I discuss mathematics on a MUD),
and if I could be absolutely sure that everyone in the conversation
correctly comprehended Unicode, I could do this with a lot more
confidence. Unfortunately, the server I use just passes bytes in and
out, and some clients assume CP-1252, others assume Latin-1, and
others (including my Gypsum) try UTF-8 first and fall back on an
eight-bit encoding (currently CP-1252 because of the first group). But
in an ideal world, server and clients would all speak Unicode
everywhere, and transmit and receive UTF-8. This is not hypocrisy,
this is the way to work reliably.

> Once again, my claim is what MicroPython implements now is more correct
> - in a sense wider than technical - handling. We don't provide Unicode
> encoding support, because it's highly bloated, but let people use any
> encoding they like. That comes at some price, like length of strings in
> characters are not know to runtime, only in bytes, but quite a lot of
> applications can be written by having just that.

The current implementation is flat-out lying, actually. It claims that
it's storing Unicode codepoints (as per the Python spec) while
actually storing bytes, and then it transmits those bytes to the
console etc as-is. This is a bug. It needs to be fixed. The only
question is, what form will the fix take? Will it be PEP 393's
flexible fixed-width representation? UTF-8? UTF-16 (I hope not!)? A
hybrid of Latin-1 where possible and UTF-8 otherwise? But something
has to be done.

ChrisA
More information about the Python-Dev mailing list

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4