Tim Peters wrote:
>
> [M.-A. Lemburg]
> > Jack had the same question. The simple answer is: we need this
> > in order to maintain backward compatibility when we move to
> > phase two of the implementation.
> >
> > Here's the longer one:
> >
> > ASCII is the standard encoding for Python keywords and identifiers.
> > There is no standard source code encoding for string literals.
>
> But there is:
>
>     Python uses the 7-bit ASCII character set for program text and
>     string literals. 8-bit characters may be used in string literals
>     and comments but their interpretation is platform dependent; the
>     proper way to insert 8-bit characters in string literals is by
>     using octal or hexadecimal escape sequences.
>
> The Ref Man has said "7-bit ASCII" for both "program text and string
> literals" for a long time. The formal grammar in the Ref Man agrees with
> this (including the formal grammar for Unicode literals). It's an
> historical accident that the tokenizer happened to use C isalpha() to
> "enforce" this for identifiers, and that C isalpha() happened to grow
> locale-dependence while Guido was too drunk with power to notice <wink>.

It's a fact of life that users don't read reference manuals, but
simply write programs and feel good if they happen to work :-)

As a result, programs have used string literals in many different
encodings for a long time. Changing this situation will take time.
The proposal aims to clarify the situation and to make the
transition less painful.

> > Unicode literals are interpreted using 'unicode-escape' which
> > is an enhanced Latin-1 with escape semantics.
>
> I'm sure they *do* "act like" Latin-1 on your box, and that identifiers also
> act like Latin-1 was in effect on your box. But the Ref Man explicitly says
> all that is platform dependent; there's no "backward compatibility" to
> preserve here beyond 7-bit ASCII unless you want to preserve that Python
> always rely on what C isalpha() says.

You tell that to the Russians, Japanese or the Europeans writing
Python programs -- it just happens that comments and literals are
bound to end up using local encodings.

Anyway, with the PEP implemented we'll no longer have to restrict
ourselves to 7-bit US-ASCII, so all these problems will go away.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/
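
For readers following the thread, here is a minimal sketch of the two points under discussion: that a raw 8-bit byte means different things under different local encodings, and that the 'unicode-escape' codec treats unescaped bytes as Latin-1 while also honouring escape sequences. It is written for a current Python 3 interpreter, not the interpreter discussed in this thread, and the helper names (interpretations, decode_literal_bytes) are made up for illustration only.

```python
def interpretations(byte_value, encodings=("latin-1", "koi8-r", "cp1252")):
    """Decode one raw 8-bit byte under several local encodings.

    The encoding names are standard Python codec names; the selection
    is an arbitrary illustration, not something taken from the thread.
    """
    raw = bytes([byte_value])
    return {enc: raw.decode(enc) for enc in encodings}


def decode_literal_bytes(raw):
    """Decode bytes with the 'unicode_escape' codec.

    Unescaped bytes are read as Latin-1; backslash escapes such as
    \\xNN and \\uNNNN are expanded to the code points they name.
    """
    return raw.decode("unicode_escape")


if __name__ == "__main__":
    # The same byte 0xE9 yields different characters per encoding,
    # which is why its meaning in a literal is platform dependent.
    for enc, ch in interpretations(0xE9).items():
        print(f"{enc:8s} -> U+{ord(ch):04X}")

    # A raw Latin-1 byte plus an explicit \u escape, decoded together.
    print(ascii(decode_literal_bytes(b"caf\xe9 \\u20ac")))

    # An escape sequence in the source names the code point directly,
    # independent of how the source file itself is encoded.
    print("\xe9" == "\u00e9")
```

The escape-sequence form in the last line is the approach the quoted Ref Man passage recommends; the first loop shows why raw 8-bit bytes in literals are ambiguous without a declared source encoding.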