M.-A. Lemburg wrote:
> The current need for #pragmas is really very simple: to tell
> the compiler which encoding to assume for the characters
> in u"...strings..." (*not* "...8-bit strings...").

why not?

why keep on pretending that strings and strings are two
different things?  it's an artificial distinction, and it
only causes problems all over the place.

> Could be that we don't need this pragma discussion at all
> if there is a different, more elegant solution to this...

here's one way:

1. standardize on *unicode* as the internal character set.  use
   an encoding marker to specify what *external* encoding you're
   using for the *entire* source file.  output from the tokenizer
   is a stream of *unicode* strings.

2. if the user tries to store a unicode character larger than 255
   in an 8-bit string, raise an OverflowError.

3. the default encoding is "none" (instead of XML's "utf-8").  in
   this case, treat the script as an ascii superset, and store each
   string literal as is (character-wise, not byte-wise).

additional notes:

-- item (3) is for backwards compatibility only.  might be okay to
   change this in Py3K, but not before that.

-- leave the implementation of (1) to 1.7.  for now, assume that
   scripts have the default encoding, which means that (2) cannot
   happen.

-- we still need an encoding marker for ascii supersets (how about
   <?python encoding="utf-8" version="1.6"?> ;-).  however, it's up
   to the tokenizer to detect that one, not the parser.  the parser
   only sees unicode strings.

</F>
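
a rough sketch of items (1) to (3), in present-day Python 3 rather than anything that existed at the time of this post. everything named here is an assumption for illustration: the helpers `decode_source` and `to_8bit` are hypothetical, the marker regex just follows the tongue-in-cheek `<?python ...?>` example above, and reading the "none" default as "one byte becomes one character with the same ordinal" (what the latin-1 codec happens to do) is one interpretation of "character-wise, not byte-wise".

```python
import re

# Hypothetical marker syntax, modelled on the joke example in the post.
MARKER = re.compile(rb'<\?python\s+encoding="([^"]+)"[^?]*\?>')


def decode_source(raw: bytes) -> str:
    """Items 1 and 3: turn raw source bytes into unicode text."""
    match = MARKER.search(raw)
    if match is None:
        # Default encoding "none": assumed to mean each byte maps to the
        # character with the same ordinal (0..255), i.e. an ascii superset.
        return raw.decode("latin-1")
    # A marker was found: decode the *entire* file with the named encoding.
    return raw.decode(match.group(1).decode("ascii"))


def to_8bit(text: str) -> bytes:
    """Item 2: store unicode text in an 8-bit string, or fail loudly."""
    for ch in text:
        if ord(ch) > 255:
            raise OverflowError(
                f"character U+{ord(ch):04X} does not fit in an 8-bit string"
            )
    return bytes(ord(ch) for ch in text)
```

with no marker, `decode_source(b'x = "\xe9"')` keeps the byte 0xE9 as the single character U+00E9, and `to_8bit` round-trips it. with a utf-8 marker, a euro sign in the source decodes to U+20AC, and passing that to `to_8bit` raises OverflowError, as item (2) describes.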