Showing content from http://mail.python.org/pipermail/python-dev/2002-September/028720.html below:

[Python-Dev] utf-8 issue thread question

Brett Cannon drifty@bigfoot.com
Tue, 10 Sep 2002 17:07:58 -0700 (PDT)
So here is the summary question for this thread: what exactly is a
surrogate?  I think I get it (from reading an i18n email from MAL on the
i18n list), but I am not confident enough to stick it in the summary as of
yet.

The following is my current rough summary explanation for what a surrogate
is.  Can someone please correct it as needed?

"""
In Unicode, a surrogate is when you encode from a higher bit total
encoding (such as utf-16) into a smaller bit total encoding by
representing the character as several more bit chunks (such as two utf-8
chunks).  The following line is an example:

	>>> u'\ud800'.encode('utf-8') == '\xed\xa0\x80'

Notice how the initial Unicode character ends up being encoded as three
characters in utf-8.
"""

Also, does anyone know of some good Unicode tutorials, explanations, etc. on
the web, in book form, whatever?  Most of the threads that I don't totally
comprehend are Unicode related and I would like to keep my brain-dead
questions to a minimum.  Don't want my reputation to go down the drain.
=)

-Brett



