> perfectionist or not, I only want Python's Unicode support to
> be as intuitive as anything else in Python.  as it stands right
> now, Perl and Tcl's Unicode support is intuitive.  Python's not.

I haven't much experience with Perl, but I don't think Tcl is
intuitive in this area. I really think that they got it all wrong.
They use the string type for "plain bytes", just as we do, but then
have the notion of "correct" and "incorrect" UTF-8 (i.e. strings with
violations of the encoding rule). For a "plain bytes" string, the
following might happen:

- the string is scanned for non-UTF-8 characters
- if any are found, the string is converted into UTF-8, essentially
  treating the original string as Latin-1.
- it then continues to use the UTF-8 "version" of the original string,
  and converts it back on demand.

Maybe I got something wrong, but the Unicode support in Tcl makes me
worry very much.

> btw, I thought we'd all agreed on GvR's solution for 1.6?
>
> what did I miss?

I like the 'only ASCII is converted' approach very much, so I'm not
objecting to that solution - just as I wasn't objecting to the
previous one.

> so tell me, if "good enough" is what we're aiming at, why isn't
> my counter-proposal good enough?

Do you mean the one in

http://www.python.org/pipermail/python-dev/2000-April/005218.html

which I suppose is the same one as the "java-like approach"? AFAICT,
all it does is to change the default encoding from UTF-8 to Latin-1.
I can't follow why this should be *better*, but it would be certainly
as good...

In comparison, restricting the "character" interpretation of the
string type (in terms of your proposal) to 7-bit characters has the
advantage that it is less error-prone, as Guido points out.

Regards,
Martin
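
P.S. A rough sketch of the two behaviours discussed above, written with
modern Python `bytes`/`str` purely for illustration. The first function
mirrors the three steps listed for Tcl as I understand them (which may be
wrong, as noted); it is not Tcl's actual implementation, and Python's strict
UTF-8 check stands in for whatever validity test Tcl applies. The second
shows the "only ASCII is converted" rule. Both function names are
hypothetical.

```python
def utf8_or_latin1_fallback(raw: bytes) -> bytes:
    """Return a UTF-8 "version" of raw, following the steps listed above.

    Assumption: "incorrect UTF-8" means "fails a strict UTF-8 decode".
    """
    try:
        raw.decode("utf-8")  # step 1: scan for violations of the encoding rule
        return raw           # already valid UTF-8, keep as-is
    except UnicodeDecodeError:
        # step 2: treat the original bytes as Latin-1 and re-encode as UTF-8
        return raw.decode("latin-1").encode("utf-8")


def ascii_only_to_text(raw: bytes) -> str:
    """Convert only 7-bit data; anything else raises UnicodeDecodeError."""
    return raw.decode("ascii")


# The worrying part: the same Latin-1 text is handled differently depending
# on whether its bytes happen to form valid UTF-8.
lone_e_acute = b"\xe9"       # Latin-1 "e acute": invalid UTF-8, gets converted
a_tilde_copy = b"\xc3\xa9"   # Latin-1 "A tilde" + "copyright": valid UTF-8
print(utf8_or_latin1_fallback(lone_e_acute))
print(utf8_or_latin1_fallback(a_tilde_copy))
```

With the ASCII-only rule there is no such guessing: `ascii_only_to_text`
either succeeds on pure 7-bit input or fails loudly, which is the
"less error-prone" property mentioned above.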