On 2008-07-03 19:35, Jeroen Ruigrok van der Werven wrote:
> -On [20080703 19:21], Adam Olsen (rhamph at gmail.com) wrote:
>> On Thu, Jul 3, 2008 at 7:57 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> Please remember that lone surrogate pair code points are perfectly
>>> valid Unicode code points, nevertheless. Just as a lone combining
>>> code point is valid on its own.
>> That is a big part of these problems. For all practical purposes, a
>> surrogate is like a UTF-8 code unit, and must be handled the same way,
>> so why the heck do they confuse everybody by saying "oh, it's a code
>> point too!"?
>
> Because surrogate code points are not Unicode scalar values, isolated UTF-16
> code units in the range 0xd800-0xdfff are ill-formed. (D91 from Unicode
> 5.0/5.1, section 3.9)

True. They are not valid UTF-16 code units, but a code unit
is just a storage byte representation of a Unicode transformation...

"""
Code Unit. The minimal bit combination that can represent a unit of
encoded text for processing or interchange. The Unicode Standard uses
8-bit code units in the UTF-8 encoding form, 16-bit code units in the
UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form.
(See definition D77 in Section 3.9, Unicode Encoding Forms.)
"""

That's not the same thing as a code point which is an assignment
of a slot in the Unicode character set...

"""
Code Point. Any value in the Unicode codespace; that is, the range of
integers from 0 to 10FFFF₁₆. (See definition D10 in Section 3.4,
Characters and Encoding.)
"""

Reference: http://www.unicode.org/glossary/

Also see Chapter 3.4
(http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf#G2212):

"""
Surrogate code points and noncharacters are considered assigned code
points, but not assigned characters.
"""

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jul 03 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-07-07: EuroPython 2008, Vilnius, Lithuania             3 days to go

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
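
The code unit / code point distinction drawn in the message above can be
illustrated with a small sketch. It is not from the original thread: it
assumes a current Python 3 interpreter (3.3 or later, where str is indexed
by code point), and the helper names count_code_units and is_encodable are
made up for this example.

```python
import unicodedata

# Size in bytes of one code unit for each encoding form (glossary: D77).
# The "-le" codec variants are used so that no byte order mark is emitted.
CODE_UNIT_SIZE = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}


def count_code_units(text: str, encoding: str) -> int:
    """Return how many code units `text` occupies in `encoding`."""
    return len(text.encode(encoding)) // CODE_UNIT_SIZE[encoding]


def is_encodable(text: str, encoding: str) -> bool:
    """Return True if the strict codec accepts `text`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# One code point outside the BMP: U+1D11E MUSICAL SYMBOL G CLEF.
clef = "\U0001D11E"
for enc in CODE_UNIT_SIZE:
    print(enc, count_code_units(clef, enc))

# A lone surrogate is a code point: str can hold it, and the Unicode
# database reports its general category as Cs (Surrogate).
lone = "\ud800"
print(hex(ord(lone)), unicodedata.category(lone))

# It is not a scalar value, so the strict codecs refuse to encode it.
print(is_encodable(lone, "utf-16-le"), is_encodable(lone, "utf-8"))
```

The same single code point takes four, two, or one code units depending on
the encoding form, while the lone surrogate exists as a code point but has
no well-formed encoded representation.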