> >Guido van Rossum writes:
> >My suggested criterion is that 1.6 not screw things up in a way that
> >we'll regret when 1.7 rolls around.  UTF-8 probably does back us into
> >a corner that

> Andrew M. Kuchling writes:
> Doh!  To complete that paragraph: Magic conversions assuming UTF-8
> does back us into a corner that is hard to get out of later.  Magic
> conversions assuming Latin1 or ASCII are a bit better, but I'd lean
> toward the draconian solution: we don't know what we're doing, so do
> nothing and require the user to explicitly convert between Unicode and
> 8-bit strings in a user-selected encoding.

GvR responds:

That's what Ping suggested.  My reason for proposing default
conversions from ASCII is that there is much code that deals with
character strings in a fairly abstract sense and that would work out
of the box (or after very small changes) with Unicode strings.  This
code often uses some string literals containing ASCII characters.

An arbitrary example: code to reformat a text paragraph; another: an
XML parser.  These look for certain ASCII characters given as literals
in the code (" ", "<" and so on), but the algorithm is essentially
independent of what encoding is used for non-ASCII characters.

(I realize that the text reformatting example doesn't work for all
Unicode characters, because its assumption that all characters have
equal width is broken -- but at the very least it should work with
Latin-1 or Greek or Cyrillic stored in Unicode strings.)

It's the same as for ints: a function to calculate the GCD works with
ints as well as long ints without change, even though it references
the int constant 0.  In other words, we want string-processing code to
be just as polymorphic as int-processing code.

--Guido van Rossum (home page: http://www.python.org/~guido/)
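
For illustration, here is a minimal sketch of the kind of polymorphic
code being described, written against the 1.6/2.x-era string model
(distinct 8-bit and Unicode string types, with a default ASCII
conversion between them); the function names are made up for the
example and are not from the message above:

    # A GCD that references the int constant 0 works unchanged for
    # plain ints and for long ints.
    def gcd(a, b):
        while b != 0:
            a, b = b, a % b
        return a

    print gcd(12, 18)       # ints
    print gcd(12L, 18L)     # long ints -- same code, no change

    # Likewise, a reformatting helper that only mentions the ASCII
    # literal " " should work on 8-bit strings and on Unicode strings
    # alike, relying on the default ASCII conversion of the literal.
    def collapse_spaces(text):
        return " ".join(text.split())

    print repr(collapse_spaces("hello   world"))
    print repr(collapse_spaces(u"caf\xe9   au   lait"))   # u'caf\xe9 au lait'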