"M.-A. Lemburg" wrote: > >... > > Character > > > > Used by itself, means the addressable units of a Python > > Unicode string. > > Please add: also known as "code unit". I'm not entirely comfortable with that. As you yourself pointed out, the same Python Unicode object can be interpreted as either a series of single-width code points *or* as a UTF-16 string where the characters are code units. You could also interpet it as a BASE64'd region or an XML document... It all depends on how you look at it. > .... > > Surrogate pair > > > > Two physical characters that represent a single logical > > Eeek... two code units (or have you ever seen a physical character > walking around ;-) No, that's sort of my point. The user can decide to adopt the convention of looking at the two characters as code units or they can ignore that interpretation and look at them as two code points. It's all relative, man. Dig it? That's why I use the word "convention" below: > > character. Part of a convention for representing 32-bit > > code points in terms of two 16-bit code points. "Surrogates are all in your head. Python doesn't know or care about them!" I'll change this to: Surrogate pair Two Python Unicode characters that represent a single logical Unicode code point. Part of a convention for representing 32-bit code points in terms of two 16-bit code points. Python has limited support for reading, writing and constructing strings that use this convention (described below). Otherwise Python ignores the convention. > No need to pass this information to the codec: simply write > a new one and give it a clear name, e.g. "ucs-2" will generate > errors while "utf-16-le" converts them to surrogates. That's a good point, but what if I want a UTF-8 codec that doesn't generate surrogates? Or even a UCS4 one? > Plus perhaps the Mark Davis paper at: > > http://www-106.ibm.com/developerworks/unicode/library/utfencodingforms/ Okay. 

> >     Copyright
> >
> >         This document has been placed in the public domain.
>
> Good work, Paul !

Thanks for your help. You did help me to clarify many things even though
I argued with you as I was doing it.

-- 
Take a recipe. Leave a recipe.
Python Cookbook!  http://www.ActiveState.com/pythoncookbook