Showing content from https://mail.python.org/pipermail/python-dev/2004-September/048901.html below:


[Python-Dev] OT: Unicode history (was Alternative Impl. for PEP 292)
François Pinard pinard at iro.umontreal.ca
Tue Sep 14 21:15:28 CEST 2004
[alloydflanagan at comcast.net]
> [François Pinard]

> >>Many people consider that Unicode, or UTF-8 at least, is strongly
> >>favouring English (boldly American) over any other script or
> >>language.  If it has not been so, Americans would never have
> >>promoted it so much, and would have rather shown an infinite and
> >>eternal reluctance...

> To be fair to the developers of Unicode, I'd suggest that the issue
> is not favoring (note spelling! :) ) English, but rather keeping
> compatibility with an enormous amount of existing data which was
> encoded in ASCII.

Of course, this is the standard and official reason.  Yet, the net
effect of that concern and constraint, noticed by many foreigners, is
that Unicode favours English.  (About "favouring" spelling, I find it
amusing to spell-check my out-going email with a British dictionary.)

> Which was an English standard, but you can only do so much in 7
> bits...  As for American reluctance, how are you going to convince
> anyone to double (at least) the storage requirements for their data,
> to support languages they never use?  That would have cost a great
> deal of money.

I would not think money has to be expressed in terms of storage.  Storage
considerations are more likely a justification than an explanation for
the reluctance.  UTF-8 is such that on disk, and for applications using
UTF-8 internally (there are a few), not a single bit is spent on extra
storage for English.  There are cases, and the current Python approach
is one of them, where Unicode may be made fairly unobtrusive on memory
consumption, at least in English contexts.
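
To make the storage point concrete, here is a small sketch in Python.
The helper name `encoded_sizes` and the two sample strings are only
illustrative assumptions, not anything from the thread.  It compares the
byte counts of a pure-ASCII string and of a string with two accented
letters under UTF-8 and under a fixed 16-bit encoding.

```python
def encoded_sizes(text):
    """Return the byte length of `text` under a few encodings (hypothetical helper)."""
    return {
        "characters": len(text),
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so that no 2-byte byte-order mark is counted.
        "utf-16": len(text.encode("utf-16-le")),
    }

english = "Storage considerations"   # ASCII only
french = "François favours café"     # two non-ASCII letters

# Pure ASCII text encodes to exactly the same bytes under UTF-8.
assert english.encode("utf-8") == english.encode("ascii")

for sample in (english, french):
    print(sample, encoded_sizes(sample))
```

For the English sample, the UTF-8 size equals the character count, while
the 16-bit encoding doubles it.  For the French sample, UTF-8 spends one
extra byte on each accented letter.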

The complexity added by Unicode, however, is undoubtedly a concern for
any implementor wanting to really address that standard, that is, to go
further than merely toying with 16-bit characters.  *This* means human
time, and this is where the real cost lies.

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard