Guido van Rossum <guido@python.org> writes:

> Paul, we're both just saying the same thing over and over without
> convincing each other. I'll wait till someone who wasn't in this
> debate before chimes in.

OK, I've never contributed to this discussion, but I have a long
history of shipping widely used Python/Tkinter/XML tools (see my
homepage). I care _very_ much that heretofore I have been unable to
support full XML because of the lack of Unicode support in Python.
I've already started playing with 1.6a2 for this reason.

I notice one apparent mis-communication between the various
contributors:

Treating narrow-strings as consisting of UNICODE code points <= 255 is
not necessarily the same thing as making Latin-1 the default encoding.
On Paul and Fredrik's account, I don't think encodings are relevant to
narrow-strings at all.

I'd rather go right away to the coherent position of byte-arrays,
narrow-strings and wide-strings. Encodings are only relevant to
conversion between byte-arrays and strings. Decoding a byte-array
with a UTF-8 encoding into a narrow-string might cause
overflow/truncation, just as decoding a byte-array with a UTF-8
encoding into a wide-string might. The fact that decoding a
byte-array with a Latin-1 encoding into a narrow-string is a memcopy
is just a side-effect of the courtesy of the UNICODE designers wrt the
code points between 128 and 255.

This is effectively the way our C-based XML toolset (which we embed in
Python) works today -- we build an 8-bit version which uses char*
strings, and a 16-bit version which uses unsigned short* strings, and
convert from/to byte-streams in any supported encoding at the margins.

I'd like to keep byte-arrays at the margins in Python as well, for all
the reasons advanced by Paul and Fredrik. I think treating existing
strings as a sort of pun between narrow-strings and byte-arrays is a
recipe for ongoing confusion.

ht
-- 
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
W3C Fellow 1999--2001, part-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
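
The distinction drawn in the message (encodings matter only when converting
byte-arrays to strings, and a narrow-string is just a string whose code
points are all <= 255) can be illustrated with a minimal sketch. It is
written in present-day Python 3, not the 1.6a2 under discussion, and the
helper name decode_narrow is hypothetical, introduced here only for
illustration. It raises on overflow instead of truncating; that is one
possible policy, not something the message prescribes.

```python
def decode_narrow(data: bytes, encoding: str) -> str:
    """Decode a byte-array into a 'narrow' string.

    Assumption for this sketch: a narrow string is one in which every
    code point is <= 255. Anything larger is reported as overflow.
    """
    text = data.decode(encoding)
    for ch in text:
        if ord(ch) > 0xFF:
            raise OverflowError(
                f"U+{ord(ch):04X} does not fit in a narrow string"
            )
    return text


# Latin-1: every byte value maps to the code point with the same number,
# so the decode is effectively a copy.
raw = bytes(range(256))
narrow = decode_narrow(raw, "latin-1")
assert [ord(c) for c in narrow] == list(raw)

# UTF-8: the same target type, but the result depends on the content.
# U+00E9 (two bytes in UTF-8) fits in a narrow string.
assert decode_narrow(b"\xc3\xa9", "utf-8") == "\u00e9"

# U+20AC (three bytes in UTF-8) does not fit and overflows.
try:
    decode_narrow(b"\xe2\x82\xac", "utf-8")
except OverflowError as exc:
    print("overflow:", exc)
```

In this sketch the encoding is a property of the conversion step only; the
resulting string carries no encoding of its own, which is the position the
message attributes to Paul and Fredrik.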