On Mon, 1 May 2000, Guido van Rossum wrote:

> Paul, we're both just saying the same thing over and over without
> convincing each other. I'll wait till someone who wasn't in this
> debate before chimes in.

Well, I'm guessing you had someone specific in mind (Neil?), but I want to
say something too, as the only one here (I think) using ISO-8859-8 natively.
I much prefer the Fredrik-Paul position, also known as the "a character is a
character" position, to UTF-8 as the default encoding. Unicode is
western-centered -- the first 256 characters are Latin 1. UTF-8 is even more
horribly western-centered (or I should say USA centered) -- ASCII documents
are the same. I'd much prefer Python to reflect a fundamental truth about
Unicode, which at least makes sure binary-goop can pass through Unicode and
remain unharmed, than to reflect a nasty problem with UTF-8 (not everything
is legal).

If I'm using Hebrew characters in my source (which I won't for a long while),
I'll use them in Unicode strings only, and make sure I use Unicode. If I'm
reading Hebrew from an ISO-8859-8 file, I'll set a conversion to Unicode on
the fly anyway, since most bidi libraries work on Unicode. So having UTF-8
conversions magically happen won't help me at all, and will only cause
problems when I use "sort-for-uniqueness" on a list with mixed binary-goop
and Unicode strings. In short, this sounds like a recipe for disaster.

internationally y'rs, Z.
--
Moshe Zadka <moshez@math.huji.ac.il>
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com
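
The contrast drawn in the message above, between Latin-1 letting arbitrary bytes through and UTF-8 rejecting some of them, can be sketched in a few lines. This is only an illustration written for modern Python 3, not the Python of May 2000, and the names `goop`, `as_text`, `utf8_ok`, `hebrew_bytes` and `hebrew_text` are made up for the example.

```python
# Illustrative sketch in modern Python 3 (an assumption; not the
# interpreter under discussion in the thread).

# Every possible byte value, standing in for arbitrary "binary goop".
goop = bytes(range(256))

# Latin-1 maps byte N to code point N, so any byte string decodes
# and encodes back to exactly the same bytes.
as_text = goop.decode("latin-1")
assert [ord(ch) for ch in as_text] == list(range(256))
assert as_text.encode("latin-1") == goop

# UTF-8 does not accept every byte sequence, so the same data
# cannot be decoded as UTF-8.
try:
    goop.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Hebrew bytes from an ISO-8859-8 source need an explicit conversion
# to Unicode; neither Latin-1 nor UTF-8 gives the right characters.
hebrew_bytes = b"\xf9\xec\xe5\xed"  # shin, lamed, vav, final mem
hebrew_text = hebrew_bytes.decode("iso8859_8")
```

Under these assumptions the Latin-1 round trip is lossless, the UTF-8 decode fails, and the Hebrew text only comes out right when the conversion names ISO-8859-8 explicitly.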