[Python-Dev] Internationalization Toolkit

Tim Peters tim_one@email.msn.com
Fri, 12 Nov 1999 00:42:32 -0500
[MAL]
> If HP approves, I'd propose to use UTF-16 as if it were UCS-2 and
> signal failure of this assertion at Unicode object construction time
> via an exception. That way we are within the standard, can use
> reasonably fast code for Unicode manipulation and add those extra 1M
> characters at a later stage.

I think this is reasonable.
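
For concreteness, here is a rough sketch, in present-day Python and with
invented names (nothing below comes from an actual patch), of the
construction-time check MAL describes:

    class UnicodeRangeError(ValueError):
        """Raised when input would need characters outside the UCS-2 range."""

    def make_ucs2_string(code_units):
        """Build a fixed-width "Unicode object" from 16-bit code units, UCS-2 only.

        Surrogate code units (0xD800-0xDFFF) occur only as UTF-16 pairs that
        encode characters above U+FFFF, so seeing one means the UCS-2
        assumption has failed; that is signalled at construction time with an
        exception.
        """
        for pos, unit in enumerate(code_units):
            if not 0 <= unit <= 0xFFFF:
                raise UnicodeRangeError(
                    "code unit 0x%X at %d does not fit in 16 bits" % (unit, pos))
            if 0xD800 <= unit <= 0xDFFF:
                raise UnicodeRangeError(
                    "surrogate 0x%X at %d: character outside UCS-2" % (unit, pos))
        return tuple(code_units)   # stand-in for the real fixed-width buffer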

Using UTF-8 internally is also reasonable, and if it's being rejected on the
grounds of supposed slowness, that deserves a closer look (it's an ingenious
encoding scheme that works correctly with a surprising number of existing
8-bit string routines as-is).  Indexing UTF-8 strings can be greatly sped up
by adding a simple finger (i.e., store along with the string an index+offset
pair identifying the most recently indexed position -- since string indexing
is overwhelmingly sequential, this makes most indexing constant-time; and
UTF-8 can be scanned either forward or backward from a random internal point
because the first byte of each character's encoding is recognizable as
such).
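
A rough sketch of that finger scheme, again in present-day Python with
invented names, purely to illustrate the idea:

    class FingeredUTF8:
        """UTF-8 string with a finger caching the last indexed position."""

        def __init__(self, data):
            self.data = data        # UTF-8 encoded bytes
            self.finger = (0, 0)    # (character index, byte offset) of the last access

        @staticmethod
        def _is_first_byte(b):
            # Continuation bytes look like 0b10xxxxxx; anything else starts a
            # character, which is what lets a scan resynchronize from any point.
            return (b & 0xC0) != 0x80

        def __getitem__(self, i):
            char_idx, byte_off = self.finger
            step = 1 if i >= char_idx else -1
            while char_idx != i:
                byte_off += step
                # skip continuation bytes until the next/previous first byte
                while (0 < byte_off < len(self.data)
                       and not self._is_first_byte(self.data[byte_off])):
                    byte_off += step
                char_idx += step
            if not 0 <= byte_off < len(self.data):
                raise IndexError("character index out of range")
            self.finger = (i, byte_off)          # remember where we stopped
            end = byte_off + 1
            while end < len(self.data) and not self._is_first_byte(self.data[end]):
                end += 1
            return self.data[byte_off:end].decode("utf-8")

Sequential loops then pay O(1) per index, a jump far from the finger falls
back to a linear scan from the cached position, and the first-byte test is
what lets that scan run in either direction.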

I expect either would work well.  It's at least curious that Perl and Tcl
both went with UTF-8 -- does anyone think they know *why*?  I don't.  The
people here saying UCS-2 is the obviously better choice are all from the
Microsoft camp <wink>.  It's not obvious to me, but then neither do I claim
that UTF-8 is obviously better.




