On 12/09/2014 04:28, Stephen J. Turnbull wrote:
> Jeff Allen writes:
>
>  > A welcome article. One correction should be made, I believe: the area of
>  > code point space used for the smuggling of bytes under PEP-383 is not a
>  > "Unicode Private Use Area", but a portion of the trailing surrogate
>  > range.
>
> Nice catch. Note that the surrogate range was originally part of the
> Private Use Area, but it was carved out with the adoption of UTF-16 in
> about 1993. In practice, I doubt that there are any current
> implementations claiming compatibility with Unicode 1.0 (IIRC, UTF-16
> was made mandatory in Unicode 1.1).

That's a helpful bit of history that explains the uncharacteristic
inaccuracy. It is the most I can do to keep the current position clear
in my head.

> I've always thought that the "right" way to handle the private use
> area for "platforms" like Python and Emacs, which may need to use it
> for their own purposes (such as "undecodable bytes") but want to
> respect its use by applications, is to create an auxiliary table
> mapping the private use area to objects describing the characters
> represented by the private use code points. These objects would have
> attributes such as external representation for text I/O, glyph (for
> GUI display), repr (for TTY display), various Unicode properties, etc.

Simply having a block "for private use" seems to create an unmanaged
space for conflict, reminiscent of the "other 128 characters" in
bilingual programming. I wondered, if only idly, whether the way to
respect use by applications might be to make it private to a particular
sub-class of str.

Jeff Allen
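
For anyone wanting to check the correction at the top of this message, here is a minimal sketch, assuming Python 3 and its standard "surrogateescape" error handler (the mechanism PEP 383 introduced). The sample bytes and the two helper functions are illustrative names of my own, not anything from the article under discussion.

```python
# Bytes that cannot be decoded are "smuggled" through str by the
# surrogateescape error handler (PEP 383). Each undecodable byte
# 0x80..0xFF becomes a lone trailing surrogate U+DC80..U+DCFF.
raw = b"abc\xff\x80"  # illustrative input: two bytes invalid in UTF-8
text = raw.decode("utf-8", errors="surrogateescape")

# Code points produced for the two undecodable bytes.
smuggled = [hex(ord(ch)) for ch in text[3:]]

# Encoding with the same handler restores the original bytes.
round_trip = text.encode("utf-8", errors="surrogateescape")


def in_trailing_surrogate_range(ch):
    """True if ch lies in the trailing (low) surrogate range."""
    return 0xDC00 <= ord(ch) <= 0xDFFF


def in_bmp_private_use_area(ch):
    """True if ch lies in the BMP Private Use Area, U+E000..U+F8FF."""
    return 0xE000 <= ord(ch) <= 0xF8FF


print(smuggled)
print(round_trip == raw)
print([in_trailing_surrogate_range(ch) for ch in text[3:]])
print([in_bmp_private_use_area(ch) for ch in text[3:]])
```

The smuggled code points fall in the trailing surrogate range and not in the Private Use Area, so an application's own private-use characters are not disturbed by this mechanism.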