Showing content from https://mail.python.org/pipermail/python-dev/2008-July/080957.html below:

[Python-Dev] UCS2/UCS4 default

Guido van Rossum guido at python.org
Fri Jul 4 05:26:16 CEST 2008
On Thu, Jul 3, 2008 at 4:50 PM, Adam Olsen <rhamph at gmail.com> wrote:
> Clearly, each surrogate is a valid code point, regardless of encoding.
>  A surrogate pair simultaneously represents both one code point (the
> scalar value) and two code points (the surrogate code points).  To be
> unambiguous you must instead use either code units (always 2 for
> UTF-16) or scalar values (always 1 in any encoding).
>
> The OP wanted it to always be 1, so the correct unambiguous term is
> scalar value.
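To make the quoted distinction concrete, here is a small sketch (not from the thread; `combine_surrogates` is a hypothetical helper name) showing how one high/low surrogate pair, i.e. two surrogate code points, maps back to a single scalar value using the standard UTF-16 arithmetic. The example pair 0xD83D/0xDE00 is chosen only for illustration.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Return the single scalar value encoded by a UTF-16 surrogate pair.

    Hypothetical helper for illustration: two surrogate code points in,
    one scalar value out.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # 10 bits from the high surrogate, 10 bits from the low one, offset by 0x10000
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


scalar = combine_surrogates(0xD83D, 0xDE00)
print(hex(scalar))  # 0x1f600
```

The same result can be cross-checked with Python's own codec: `"\ud83d\ude00".encode("utf-16-le", "surrogatepass").decode("utf-16-le")` yields the one-character string `"\U0001F600"`.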

Fine. If you want to be completely unambiguous, apparently you can't
use the term "code point"; you have to use either scalar values
(always Unicode characters) or code units (always part of an encoding,
and 8, 16 or 32 bits wide).
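A minimal sketch of that distinction, assuming a modern Python 3 interpreter and using U+1F600 purely as an example of a character outside the BMP: the character is one scalar value, but it needs a different number of code units in each encoding. The `code_units` helper is hypothetical; the `-le` codecs are used so that no byte order mark is counted.

```python
ch = "\U0001F600"  # one scalar value outside the BMP (example character)


def code_units(s: str, encoding: str, unit_bytes: int) -> int:
    """Hypothetical helper: number of code units s occupies in an encoding."""
    return len(s.encode(encoding)) // unit_bytes


utf8_units = code_units(ch, "utf-8", 1)       # 8-bit code units
utf16_units = code_units(ch, "utf-16-le", 2)  # 16-bit code units (a surrogate pair)
utf32_units = code_units(ch, "utf-32-le", 4)  # 32-bit code units
print(utf8_units, utf16_units, utf32_units)   # 4 2 1
```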

Regardless of what the OP might want, len() of a surrogate pair will
return 2 (since it counts code units), and we'll have to provide
another API to count scalar values / characters that sees a surrogate
pair as one.
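The kind of "other API" described above might look roughly like the sketch below. It is not an API Python provides; `count_scalar_values` is a hypothetical name. Because current Python 3 stores code points rather than UTF-16 code units, the sketch simulates a narrow (UCS2) string by writing the surrogate pair explicitly. Lone surrogates are counted as one each, which is a design choice here, since strictly speaking they are not scalar values.

```python
def count_scalar_values(s: str) -> int:
    """Hypothetical helper: count characters, treating each high+low
    surrogate pair as one (lone surrogates count as one each)."""
    count = 0
    i = 0
    n = len(s)
    while i < n:
        c = ord(s[i])
        if 0xD800 <= c <= 0xDBFF and i + 1 < n and 0xDC00 <= ord(s[i + 1]) <= 0xDFFF:
            i += 2  # surrogate pair: two code units, one scalar value
        else:
            i += 1  # any other code unit stands on its own
        count += 1
    return count


# Simulates what a narrow (UCS2) build stores for "a" + U+1F600 + "b"
narrow = "a\ud83d\ude00b"
print(len(narrow))                  # 4 (code units)
print(count_scalar_values(narrow))  # 3 (scalar values)
```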

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
