Tim Peters wrote:
>
> [M.-A. Lemburg]
> > PEP: 0263 (?)
> > Title: Defining Unicode Literal Encodings
> > Version: $Revision: 1.0 $
> > Author: mal@lemburg.com (Marc-André Lemburg)
> > Status: Draft
> > Type: Standards Track
> > Python-Version: 2.3
> > Created: 06-Jun-2001
> > Post-History:
>
> Since this depends on PEP 244, it should also have a
>
>     Requires: 244
>
> header line.

Ok, I'll add that.

> > ...
> > ... can be set using the "directive" statement proposed in PEP 244.
> >
> > The syntax for the directives is as follows:
> >
> > 'directive' WS+ 'unicodeencoding' WS* '=' WS* PYTHONSTRINGLITERAL
> > 'directive' WS+ 'rawunicodeencoding' WS* '=' WS* PYTHONSTRINGLITERAL
>
> PEP 244 doesn't allow these spellings: at most one atom is allowed after
> the directive name, and
>
>     = "whatever"
>
> isn't an atom. Remove the '=' and PEP 244 is happy, though. If you want to
> keep the "=", PEP 244 has to change.

True... would that pose a problem ?

[Paul]
> I think that there should be a single directive for:
>
> * unicode strings
> * 8-bit strings
> * comments
>
> If a user uses UTF-8 for 8-bit strings and Shift-JIS for Unicode, there
> is basically no text editor in the world that is going to do the right
> thing. And it isn't possible for a web server to properly associate an
> encoding. In general, it isn't a useful configuration.

Please don't mix 8-bit strings with Unicode literals: 8-bit
strings don't carry any encoding information, so providing
encoding information cannot be stored anywhere.

Comments, OTOH, are part of the program text, so they have to
be ASCII just like the Python source itself.

Note that it doesn't make sense to use a non-ASCII superset for the
Unicode literal encoding (as you and others have noted).
Since all builtin Python encodings are ASCII-supersets, this
shouldn't pose much of a problem, though ;-)

> Also, no matter what the directive says, I think that \uXXXX should
> continue to work. Just as in 8-bit strings, it should be possible to mix
> and match direct encoded input and backslash-escaped characters.
> Sometimes one is convenient (because of your keyboard setup) and
> sometimes the other is convenient. This proposal exists only to improve
> typing convenience so we should go all the way and allow both.

Hmm, good point, but hard to implement. We'd probably need a two
phase decoding for this to work:

1. decode the given Unicode literal encoding
2. decode any Unicode escapes in the Unicode string

> I strongly think we should restrict the directive to one per file and in
> fact I would say it should be one of the first two lines. It should be
> immediately following the shebang line if there is one. This is to allow
> text editors to detect it as they detect XML encoding declarations.
>
> My opinions are influenced by the fact that I've helped implement
> Unicode support in an Python/XML editor. XML makes it easy to give the
> user a good experience. Python could too if we are careful.

I think that allowing one directive per file is the way to go,
but I'm not sure about the exact position. Basically, I think it
should go "near" the top, but not necessarily before any doc-string
in the file.

> [Guido]
> > Hm, then the directive would syntactically have to *precede* the
> > docstring. That currently doesn't work -- the docstring may only be
> > preceded by blank lines and comments. Lots of tools for processing
> > docstrings already have this built into them. Is it worth breaking
> > them so that editors can remain stupid?
>
> No.

Agreed. Note that the PEP doesn't require the directive to be
placed before the doc-string. That point is still open.
Technically, the compiler will only need to know about the encoding
before the first Unicode literal in the source file.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
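
For illustration only, here is a rough sketch of the two-phase decoding
outlined in the reply above. It is not the PEP's implementation. The
function name decode_unicode_literal and its parameters are hypothetical,
it is written for a current Python 3 interpreter, and it assumes that only
\uXXXX escapes need expanding (it ignores other escapes and the rule about
escaped backslashes).

```python
import re

# Matches a backslash, a 'u', and exactly four hex digits, e.g. \u00df.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_literal(raw_body: bytes, literal_encoding: str) -> str:
    """Hypothetical helper: decode the raw bytes of a Unicode literal body."""
    # Phase 1: decode the bytes using the declared Unicode literal encoding.
    text = raw_body.decode(literal_encoding)
    # Phase 2: expand any \uXXXX escapes left in the decoded text.
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    # The u-umlaut is typed directly in Latin-1, the sharp s as an escape.
    print(decode_unicode_literal(b"Gr\xfc\\u00dfe", "latin-1"))
```

Running phase 1 first means the escape syntax only has to be recognised
in already-decoded text, which works for any literal encoding that is an
ASCII superset.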