> [paul]
> > Also, is it really necessary to allow raw non-ASCII characters in source
> > code though? We know that they aren't portable across editing
> > environments, so one person's happy face will be another person's left
> > double-dagger.

> [me]
> I suppose changing that would break code. maybe it's time
> to reopen the "pragma encoding" thread?
>
> (I'll dig up my old proposal, and post it under a new subject).

as brief as I can make it:

1. add support for "compiler directives". I suggest the following
   syntax, loosely based on XML:

       #?python key=value [, key=value ...]

   (note that "#?python" will be treated as a token after this change.
   if someone happens to use comments that start with #?python, they'll
   get a "SyntaxError: bad #?python compiler directive"...)

2. for now, only accept compiler directives if they appear before the
   first "real" statement.

3. keys are python identifiers (NAME tokens), values are simple
   literals (STRING, NUMBER).

4. key/value pairs are collected in a dictionary (a parsing sketch
   for points 1, 3 and 4 follows at the end of this post).

5. for now, we only support the "encoding" key. it is used to
   determine how string literals (STRING tokens) are converted to
   string or unicode string objects.

6. the encoding value can be any of the following (a decoding sketch
   follows at the end of this post):

   "undefined", or not defined at all:

       plain string: copy source characters as-is
       unicode string: expand 8-bit source characters to unicode
       characters (i.e. treat them as ISO Latin 1)

   "ascii":

       plain string: characters in the 128-255 range give a
       SyntaxError (illegal character in string literal)
       unicode string: same as for plain string

   any other ascii-compatible encoding (the ISO 8859 series, Mac
   Roman, UTF-8, and others):

       plain string: characters in the 128-255 range give a
       SyntaxError (illegal character in string literal)
       unicode string: characters in the 128-255 range are decoded
       according to the given encoding

   any other encoding (UCS-2, UTF-16):

       undefined (or SyntaxError: illegal encoding)

   to be able to flag this as a SyntaxError, I assume we can add an
   "ASCII compatible" flag to the encoding files.

7. only the contents of string literals can be encoded. the tokenizer
   still works on 7-bit ASCII (hopefully, this will change in future
   versions).

8. encoded string literals are decoded before Python looks for
   backslash escape codes (a sketch of this ordering follows at the
   end of this post).

I think that's all. Comments?

I've looked at the current implementation rather carefully, and it
shouldn't be that hard to come up with patches that implement this
scheme.

</F>
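
to make points 1, 3 and 4 concrete, here is a minimal parsing sketch
in modern Python; parse_directive, _PAIR and _COMMA are names invented
for illustration, not part of the proposal or any actual patch:

    import re

    # a NAME token, '=', then a STRING or NUMBER literal
    _PAIR = re.compile(
        r"(?P<key>[A-Za-z_]\w*)\s*=\s*"
        r"(?P<value>'[^']*'|\"[^\"]*\"|\d+)")
    _COMMA = re.compile(r"\s*,\s*")

    def parse_directive(line):
        """Parse '#?python key=value [, key=value ...]' into a dict."""
        if not line.startswith("#?python"):
            return None
        rest = line[len("#?python"):].strip()
        pairs = {}
        pos = 0
        while pos < len(rest):
            m = _PAIR.match(rest, pos)
            if not m:
                raise SyntaxError("bad #?python compiler directive")
            key, value = m.group("key"), m.group("value")
            # strip quotes from STRING values, convert NUMBER values
            pairs[key] = value[1:-1] if value[0] in "'\"" else int(value)
            pos = m.end()
            sep = _COMMA.match(rest, pos)     # pairs separated by commas
            if sep:
                pos = sep.end()
            elif rest[pos:].strip():
                raise SyntaxError("bad #?python compiler directive")
            else:
                break
        return pairs

    print(parse_directive('#?python encoding="iso-8859-1"'))
    # -> {'encoding': 'iso-8859-1'}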
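
and a sketch of the rules in point 6, modelling plain strings as the
raw bytes of the literal body; decode_literal and is_ascii_compatible
are hypothetical helpers, the latter standing in for the proposed
"ASCII compatible" flag on the encoding files:

    def is_ascii_compatible(encoding):
        # stand-in for the proposed "ASCII compatible" flag: true iff
        # the encoding maps the 7-bit ASCII range to itself
        sample = bytes(range(128))
        try:
            return sample.decode(encoding) == sample.decode("ascii")
        except (UnicodeDecodeError, LookupError):
            return False

    def decode_literal(body, encoding, is_unicode):
        # body: raw bytes of the literal, between the quotes
        if encoding is None:                   # "undefined"
            if is_unicode:
                return body.decode("latin-1")  # expand 8-bit chars
            return body                        # copy source bytes as-is
        if not is_ascii_compatible(encoding):  # UCS-2, UTF-16, ...
            raise SyntaxError("illegal encoding")
        if not is_unicode or encoding == "ascii":
            # plain strings (and "ascii") reject 8-bit characters
            if any(b > 127 for b in body):
                raise SyntaxError("illegal character in string literal")
            return body.decode("ascii") if is_unicode else body
        return body.decode(encoding)           # decode per the directive

    print(decode_literal(b"caf\xe9", "iso-8859-1", True))   # -> café
    # decode_literal(b"caf\xe9", "iso-8859-1", False) raises SyntaxError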
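
finally, the ordering in point 8 (apply the source encoding first,
then process backslash escapes) can be illustrated like this; the
unicode_escape codec is used here only to demonstrate the second step:

    import codecs

    raw = b"caf\xe9 \\u00e9"        # bytes as they appear in the source
    text = raw.decode("iso-8859-1")               # step 1: source encoding
    text = codecs.decode(text, "unicode_escape")  # step 2: escape codes
    print(text)                                   # -> café é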