[Greg Stein]
> ...
> Things will be a lot faster if we have a fixed-size character.  Variable
> length formats like UTF-8 are a lot harder to slice, search, etc.

The initial byte of any UTF-8 encoded character never appears in a
*non*-initial position of any UTF-8 encoded character.  Which means
searching is not only tractable in UTF-8, but also that whatever optimized
8-bit clean string searching routines you happen to have sitting around
today can be used as-is on UTF-8 encoded strings.  This is not true of
UCS-2 encoded strings (in which "the first" byte is not distinguished, so
an 8-bit search is vulnerable to finding a hit starting "in the middle" of
a character).

More, to the extent that the bulk of your text is plain ASCII, the UTF-8
search will run much faster than when using a 2-byte encoding, simply
because it has half as many bytes to chew over.

UTF-8 is certainly slower for random-access indexing, including slicing.

I don't know what "etc" means, but if it follows the pattern so far,
sometimes it's faster and sometimes it's slower <wink>.

> (IMO) a big reason for this new type is for interaction with the
> underlying OS/platform.  I don't know of any platforms right now that
> really use UTF-8 as their Unicode string representation (meaning we'd
> have to convert back/forth from our UTF-8 representation to talk to the
> OS).

No argument here.
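A minimal sketch in Python (not part of the original mail) of the point
about 8-bit clean search: a plain byte-level search over UTF-8 data can
only hit at real character boundaries, while the same byte-level search
over UCS-2/UTF-16 data can report a bogus hit that straddles two
characters.  The sample strings are arbitrary illustrations.

    # Byte-level substring search on UTF-8 vs. a 2-byte encoding.
    haystack = "na\u00efve r\u00e9sum\u00e9"   # "naïve résumé"
    needle = "\u00e9"                          # "é" -> bytes C3 A9 in UTF-8

    # UTF-8: a lead byte never occurs in a continuation position, so an
    # 8-bit clean search finds a byte-level hit exactly when there is a
    # real character-level hit.
    assert (needle.encode("utf-8") in haystack.encode("utf-8")) == (needle in haystack)

    # UCS-2/UTF-16 (big-endian, no BOM): no byte value is reserved for
    # "first byte of a character", so a byte-level search can match
    # starting "in the middle" of a character.
    text = "\u0041\u4142"    # encodes to bytes 00 41 41 42
    bogus = "\u4141"         # encodes to bytes 41 41
    assert bogus not in text                                      # no character-level hit...
    assert bogus.encode("utf-16-be") in text.encode("utf-16-be")  # ...but a byte-level one

The false hit in the second case lands at an odd byte offset, splitting
both neighbouring characters, which is exactly the failure an 8-bit
search routine can't rule out for a 2-byte encoding.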