Showing content from http://mail.python.org/pipermail/python-dev/2005-October/057599.html below:

[Python-Dev] Divorcing str and unicode (no more implicit conversions).

"Martin v. Löwis" martin at v.loewis.de
Mon Oct 24 23:06:38 CEST 2005
M.-A. Lemburg wrote:
> There seems to be a general misunderstanding here: even if you
> have UCS4 storage, it is still possible to slice a Unicode
> string in a way which makes rendering it correctly.
                                                       [impossible?]

> Unicode has the concept of combining code points, e.g. you can
> store an "é" (e with an accent) as "e" + "'". Now if you slice
> off the accent, you'll break the character that you encoded
> using combining code points.

While this is all true, I agree with Neil that it should do
whatever it does consistently across implementations, i.e.
len("\U00010000") should always give the same result, and
I think this result should always be 1.
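
As a minimal sketch of what a consistent result looks like, the snippet
below counts the same string three ways. It assumes Python 3.3 or later,
where len() counts code points on every build; that behaviour postdates
this message, and the variable names are purely illustrative.

```python
# Assumes Python 3.3+; older "narrow" builds could report 2 here.
s = "\U00010000"  # a single code point outside the Basic Multilingual Plane

code_points = len(s)                            # what len() reports
utf16_units = len(s.encode("utf-16-le")) // 2   # surrogate pair: two 16-bit units
utf32_units = len(s.encode("utf-32-le")) // 4   # one 32-bit unit

print(code_points, utf16_units, utf32_units)    # prints: 1 2 1
```

The UTF-16 count of 2 is the value that a 16-bit storage format would
report if len() simply counted storage units, which is the inconsistency
at issue.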

How to best implement this efficiently is an entirely different
question, as is the question whether you can render
arbitrary substrings in a meaningful way.
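
The combining-code-point case quoted above can be illustrated with the
standard unicodedata module. This is only a sketch: it uses U+0301
(COMBINING ACUTE ACCENT) as the accent, and the variable names are
illustrative rather than taken from the thread.

```python
import unicodedata

decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # precomposed U+00E9

# Slicing by code point drops the accent and leaves a bare "e".
sliced = decomposed[:1]

# A non-zero combining class marks a code point that attaches to the
# preceding base character, so cutting just before it splits one
# user-perceived character.
cut_splits_character = unicodedata.combining(decomposed[1]) != 0

print(len(decomposed), len(composed), sliced, cut_splits_character)
# prints: 2 1 e True
```

This happens regardless of the storage width, so it is a separate matter
from what len() should return for a single code point.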

Regards,
Martin