Fredrik Lundh wrote:
>
> M.-A. Lemburg wrote:
> > The current need for #pragmas is really very simple: to tell
> > the compiler which encoding to assume for the characters
> > in u"...strings..." (*not* "...8-bit strings...").
>
> why not?

Because plain old 8-bit strings should work just as before; that
is, existing scripts that use only 8-bit strings must not break.

> why keep on pretending that strings and strings are two
> different things? it's an artificial distinction, and it only
> causes problems all over the place.

Sure. The point is that we can't just drop the old 8-bit
strings... not until Py3K at least (and as Fred already said, all
standard editors will have native Unicode support by then). So for
now we're stuck with Unicode *and* 8-bit strings and have to make
the two meet somehow -- which isn't all that easy, since 8-bit
strings carry no encoding information.

> > Could be that we don't need this pragma discussion at all
> > if there is a different, more elegant solution to this...
>
> here's one way:
>
> 1. standardize on *unicode* as the internal character set. use
> an encoding marker to specify what *external* encoding you're
> using for the *entire* source file. output from the tokenizer is
> a stream of *unicode* strings.

Yep, that would work in Py3K...

> 2. if the user tries to store a unicode character larger than 255
> in an 8-bit string, raise an OverflowError.

There are no 8-bit strings in Py3K -- only 8-bit data buffers,
which don't have string methods ;-)

> 3. the default encoding is "none" (instead of XML's "utf-8"). in
> this case, treat the script as an ascii superset, and store each
> string literal as is (character-wise, not byte-wise).

Uhm, I think UTF-8 will be the standard for text file formats by
then... so why not make it UTF-8?

> additional notes:
>
> -- item (3) is for backwards compatibility only. might be okay to
> change this in Py3K, but not before that.
>
> -- leave the implementation of (1) to 1.7. for now, assume that
> scripts have the default encoding, which means that (2) cannot
> happen.

I'd say leave all this to Py3K.

> -- we still need an encoding marker for ascii supersets (how about
> <?python encoding="utf-8" version="1.6"?> ;-). however, it's up to
> the tokenizer to detect that one, not the parser. the parser only
> sees unicode strings.

Hmm, the tokenizer doesn't do any string -> object conversion;
that's a task done by the parser.

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
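
For concreteness: the per-file encoding marker that Python
eventually standardized (PEP 263, which postdates this thread)
takes the comment form hinted at above -- the tokenizer reads it
and uses it to decode u"..." literals. A minimal sketch in
Python 2 syntax; the particular string contents are just an
example:

    # -*- coding: utf-8 -*-
    # The marker above tells the tokenizer which encoding to assume
    # for the bytes of this source file; the u"..." literal below is
    # then decoded from UTF-8 bytes into Unicode code points.
    s = u"Gr\xfc\xdfe"    # i.e. u"Grüße", written escaped here
    print repr(s)         # -> u'Gr\xfc\xdfe'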
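
Likewise, a sketch of the coercion boundary from item (2), as
released Python 2 actually behaves: coercing a Unicode string
whose code points don't fit the default codec raises a
UnicodeError, not the OverflowError proposed here:

    u = u"\u20ac"          # EURO SIGN, code point 0x20AC > 0xFF
    try:
        s = str(u)         # coerce Unicode into an 8-bit string
    except UnicodeError:   # default 'ascii' codec can't represent it
        print "code point does not fit in an 8-bit string"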