Showing content from http://mail.python.org/pipermail/python-dev/2002-September/028721.html below:

[Python-Dev] utf-8 issue thread question

Fredrik Lundh fredrik@pythonware.com
Wed, 11 Sep 2002 02:24:53 +0200
Brett Cannon wrote:

> The following is my current rough summary explanation for what a surrogate
> is.  Can someone please correct it as needed?

needed, indeed.

it's 2.30 am over here, so I'm not going to try to explain this myself,
but some random googling brought up this page:

http://216.239.37.100/search?q=cache:Dk12BZNt6skC:uk.geocities.com/BabelStone1357/Software/surrogates.html

    The code points U+D800 through U+DB7F are reserved as High Surrogates,
    and the code points U+DC00 through U+DFFF are reserved as Low Surrogates.
    Each code point in [the full 20-bit unicode character space] maps to a pair of
    16-bit code points comprising a High Surrogate followed by a Low Surrogate.
    Thus, for example, the Gothic letter AHSA has the UTF-32 value of U+10330,
    which maps to the surrogate pair U+D800 and U+DF30. That is to say, in the
    16-bit encoding of Unicode (UTF-16), the Gothic letter AHSA is represented
    by two consecutive 16-bit code points (U+D800 and U+DF30), whereas in the
    32-bit encoding of Unicode (UTF-32), the same letter is represented by a
    single 32-bit value (U+10330).
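
The mapping described in the quoted passage can be sketched in a few lines of Python. This is only an illustration of the standard UTF-16 arithmetic, not code from the quoted page; the function names to_surrogate_pair and from_surrogate_pair are made up for this example. Note that the sketch accepts high surrogates up to U+DBFF, since the range U+DB80 through U+DBFF (the High Private Use Surrogates) is also used for the first half of a pair.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code_point)
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits select the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Combine a high and a low surrogate back into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The Gothic letter AHSA example from the quoted passage.
high, low = to_surrogate_pair(0x10330)
print("U+%04X U+%04X" % (high, low))     # U+D800 U+DF30
```

The same result can be cross-checked against Python's own codec: encoding the character U+10330 as "utf-16-be" should give the four bytes D8 00 DF 30.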

</F>




