A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://mail.python.org/pipermail/python-dev/2010-June/101117.html below:

[Python-Dev] thoughts on the bytes/string discussion

[Python-Dev] thoughts on the bytes/string discussionStephen J. Turnbull stephen at xemacs.org
Sat Jun 26 19:24:50 CEST 2010
Greg Ewing writes:

 > Would there be any sanity in having an option to compile
 > Python with UTF-8 as the internal string representation?

Losing Py_UNICODE as mentioned by Stefan Behnel (IIRC) is just the
beginning of the pain.

If Emacs's experience is any guide, the cost in speed and complexity
of a variable-width internal representation is high.  There are a
number of tricks you can use, but basically everything becomes O(n)
for the natural implementation of most operations (such as indexing by
character).  You can get around that with a position cache, of course,
but that adds complexity, and really cuts into the space saving (and
worse, adds another chunk that may or may not be paged in when you
need it).

What we're considering is a system where buffers come in 1-, 2-, and
4-octet widechars, with automatic translation depending on content.
But the buffer is the primary random-access structure in Emacsen, so
optimizing it is probably worth our effort.  I doubt it would be worth
it for Python, but my intuitions here are not reliable.
More information about the Python-Dev mailing list

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4