A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://mail.python.org/pipermail/python-dev/2005-May/053508.html below:

[Python-Dev] New Py_UNICODE doc

[Python-Dev] New Py_UNICODE docShane Hathaway shane at hathawaymix.org
Sat May 7 10:05:42 CEST 2005
Martin v. Löwis wrote:
> Define correctly. Python, in ucs2 mode, will allow to address individual
> surrogate codes, e.g. in indexing. So you get
> 
> 
>>>>u"\U00012345"[0]

When Python encodes characters internally in UCS-2, I would expect
u"\U00012345" to produce a UnicodeError("character can not be encoded in
UCS-2").

> u'\ud808'
> 
> This will never work "correctly", and never should, because an efficient
> implementation isn't possible. If you want "safe" indexing and slicing,
> you need ucs4.

I agree that UCS4 is needed.  There is a balancing act here; UTF-16 is
widely used and takes less space, while UCS4 is easier to treat as an
array of characters.  Maybe we can have both: unicode objects start with
an internal representation in UTF-16, but get promoted automatically to
UCS4 when you index or slice them.  The difference will not be visible
to Python code.  A compile-time switch will not be necessary.  What do
you think?

Shane
More information about the Python-Dev mailing list

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4