Fredrik Lundh writes:
> if the programmer wants to convert between a unicode
> string and a buffer containing encoded text, she needs
> to spell it out.  the codecs are never called "under the
> hood"

Watching the successive weekly Unicode patchsets, each one fixing some
obscure corner case that turned out to be buggy -- '%s' % ustr,
concatenating literals, int()/float()/long(), comparisons -- I'm
beginning to agree with Fredrik.  Automatically making Unicode strings
and regular strings interoperate looks like it requires many changes
all over the place, and I worry whether it's possible to catch them
all in time.

Maybe we should consider being more conservative: just provide the
Unicode built-in type, the unicode() built-in function, and the u"..."
notation, and leave all responsibility for conversions up to the user.
On the other hand, *some* default conversion seems needed, because it
seems draconian to make open(u"abcfile") fail with a TypeError.

(While I want to see Python 1.6 expedited, I'd also not like to see it
saddled with a system that proves to have been a mistake, or one
that's a maintenance burden.  If forced to choose between delaying and
getting it right, the latter wins.)

> why not just assume that the *ENTIRE SOURCE FILE* uses a single
> encoding, and let the tokenizer (or more likely, a conversion stage
> before the tokenizer) convert the whole thing to unicode.

To reinforce Fredrik's point here, note that XML only supports
encodings at the level of an entire file (or external entity).  You
can't tell an XML parser that a file is in UTF-8, except for this one
element whose contents are in Latin1.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
Dream casts a human shadow, when it occurs to him to do so.
    -- From SANDMAN: "Season of Mists", episode 0
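[Editorial note: a minimal sketch of the fully explicit style Fredrik
advocates, written in modern Python 3 spelling -- the 1.6-era API
discussed above used unicode(data, encoding) and str.encode(encoding)
instead, but the principle is the same: the codec is named at the call
site, never invoked implicitly.]

```python
# Explicit, spelled-out conversions between a Unicode string and a
# buffer of encoded text: the programmer names the codec each time;
# nothing is converted "under the hood".
text = u"caf\u00e9"

# text -> encoded buffer: the encoding is stated explicitly.
data = text.encode("utf-8")

# encoded buffer -> text: again, explicitly.
roundtrip = data.decode("utf-8")

assert data == b"caf\xc3\xa9"
assert roundtrip == text
```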