On Thursday, April 25, 2002, at 08:59, Guido van Rossum wrote:

>> I don't know why it is, but Unicode always seems to unnecessarily
>> heat up any discussion involving it. I would really like to know
>> what is causing this: is it a religious issue, does it have to do
>> with the people involved or is Unicode inherently controversial ?
> [...]
> Another issue is that adding Unicode was probably the most invasive
> set of changes ever made to the Python code base. It has complicated
> many parts of the code, and added at least a proportional share of
> bugs. (I found 166 source files in CVS containing some variation on
> the string "unicode", and 110 bug reports mentioning "unicode" in the
> SF bug tracker.)

Another thing that bothers me is that it retroactively changed the
interpretation of other Python objects. For me it's perfectly logical
that a character string is a character string, unless there's a very
good reason to treat it differently (a framebuffer scanline, a binary
blob, etc). And so if I have an API OpenFileWithUnicodeName() that
accepts a unicode filename, I expect that if I pass an 8-bit filename
it will be converted on the fly. Other people focus on different sets
of APIs, however, and think there's nothing more logical than
interpreting the string object as a binary buffer containing UTF-16
values or what-have-you.

Scanlines or binary blobs were hardly ever mixed with filenames, so
there wasn't an issue before unicode raised its pretty/ugly head.

(Of course it could be argued that unicode has exposed a design flaw
in Python, namely that a single data type was used to store both
binary data of unknown interpretation and character arrays, and that
there's now little more to be done about that.)

--
- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -
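
To make the on-the-fly conversion described in the message above concrete, here is a minimal sketch in present-day Python, where bytes and str are separate types. The function open_file_with_unicode_name() is a hypothetical stand-in for the OpenFileWithUnicodeName() API named in the message, and the default encoding of UTF-8 is an assumption; neither comes from the original thread. It only normalizes the name and does not open anything.

```python
def open_file_with_unicode_name(name, encoding="utf-8"):
    """Hypothetical stand-in for OpenFileWithUnicodeName().

    Accepts a text filename, or an 8-bit filename that is converted
    on the fly. The default encoding is an assumption for this sketch.
    Returns the normalized text name instead of opening a file.
    """
    if isinstance(name, bytes):
        # Treat the 8-bit string as characters in the given encoding.
        # Arbitrary binary data may fail here with UnicodeDecodeError.
        name = name.decode(encoding)
    if not isinstance(name, str):
        raise TypeError("filename must be str or bytes")
    return name


# An 8-bit filename is converted; a text filename passes through.
converted = open_file_with_unicode_name(b"caf\xc3\xa9.txt")
unchanged = open_file_with_unicode_name("caf\u00e9.txt")
```

The sketch also shows the other side of the argument: a byte string that is really a binary blob (for example b"\xff\xfe") does not decode as UTF-8, so the implicit conversion only makes sense when the 8-bit string actually holds characters.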