Showing content from http://mail.python.org/pipermail/python-bugs-list/2003-February/016064.html below:

[Python-bugs-list] [ python-Bugs-681960 ] Source encoding rules are extreme.

SourceForge.net noreply@sourceforge.net
Mon, 10 Feb 2003 07:39:48 -0800

Bugs item #681960, was opened at 2003-02-06 22:17
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=681960&group_id=5470

Category: Unicode
Group: Python 2.3
Status: Closed
Resolution: None
Priority: 3
Submitted By: Kirill Simonov (kirill_simonov)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Source encoding rules are extreme.

Initial Comment:
According to PEP 263, source code that contains non-ASCII
characters (ord(ch) > 127) and does not define an encoding
causes a DeprecationWarning. In the future, such code will
cause a SyntaxError.
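
As an illustration of the rule under discussion, here is a minimal Python 3 sketch of the check PEP 263 implies: does a file contain bytes above 127 without a coding declaration on its first or second line? The regular expression is the one the Python language reference gives for encoding declarations; the helper names `declared_encoding` and `violates_pep263` are hypothetical, and the sketch ignores refinements such as the UTF-8 BOM.

```python
import re

# Encoding-declaration pattern from the Python language reference
# (PEP 263 gives an equivalent one); it is tried on line 1 and line 2.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def declared_encoding(source: bytes):
    """Return the encoding named on the first or second line, or None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def violates_pep263(source: bytes) -> bool:
    """True if the file has bytes above 127 but declares no encoding."""
    has_non_ascii = any(byte > 127 for byte in source)
    return has_non_ascii and declared_encoding(source) is None


# The student's script, saved by a windows-1251 editor.
student_script = 'name = raw_input("Как вас зовут? ")\nprint "Hi %s!" % name\n'
plain = student_script.encode("cp1251")
marked = b"# -*- coding: windows-1251 -*-\n" + plain

print(violates_pep263(plain))   # True
print(violates_pep263(marked))  # False
```

Only the first file would trigger the warning described above; the second one names its encoding and passes.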

While I believe that the idea of defining a source code encoding
is very useful, I think that the current solution is
unnecessarily extreme.

It is very unfriendly to beginners. Imagine a student who
types her first script:

name = raw_input("What's your name? ")   # Russian here, of course
print "Hi %s!" % name

Do not even try to convince me that she must define an
encoding here. That requirement would rule out any possibility
of using Python in schools.

Actually, the source code encoding only affects Unicode
literals. The above script works the same way with any
defined encoding, so the warning for this code is unnecessary.

As a solution, I propose issuing the DeprecationWarning (or
SyntaxError) only when a non-ASCII character is contained in
a Unicode literal.
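
The proposed rule can be sketched with Python 3's `tokenize` module: walk the tokens and flag only string literals that carry a `u` prefix and contain non-ASCII characters, so comments and ordinary byte-string literals never trigger the warning. This is a hypothetical helper (`non_ascii_unicode_literals`), not a patch from the tracker, and it tokenizes already-decoded text, which sidesteps the question of how the file's bytes were decoded in the first place.

```python
import io
import tokenize


def non_ascii_unicode_literals(source: str):
    """Yield (row, col, literal) for each u"..." literal holding non-ASCII text.

    Comments and literals without a u prefix are ignored, which is the
    behaviour the bug report asks for.
    """
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.STRING:
            continue
        # The prefix is whatever letters precede the opening quote.
        body = tok.string.lstrip("rRuUbBfF")
        prefix = tok.string[: len(tok.string) - len(body)]
        if "u" in prefix.lower() and any(ord(ch) > 127 for ch in body):
            row, col = tok.start
            yield row, col, tok.string


script = (
    'name = raw_input("Как вас зовут? ")  # not flagged\n'
    'print u"Привет, %s!" % name          # flagged\n'
)
print(list(non_ascii_unicode_literals(script)))  # [(2, 6, 'u"Привет, %s!"')]
```

Tokenizing is purely lexical, so the Python 2 `print` statement is accepted here even though Python 3 would not compile it.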


----------------------------------------------------------------------

>Comment By: Kirill Simonov (kirill_simonov)
Date: 2003-02-10 15:39

Message:
Logged In: YES 
user_id=36553

I like this. Thanks.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-02-10 09:43

Message:
Logged In: YES 
user_id=38388

I've had a private discussion with Guido and Roman Suzi.

We'll add a way to set the default source code encoding via the
site.py/sitecustomize.py files. This should then allow anyone
wishing to customize the default behaviour to do so.

----------------------------------------------------------------------

Comment By: Kirill Simonov (kirill_simonov)
Date: 2003-02-06 23:28

Message:
Logged In: YES 
user_id=36553

Hello,

Yes, I understand that the encoding applies to the whole
source file.

But

1. The current implementation already assumes that one uses an
ASCII-compatible encoding. So we can go a step further and not
use any encoding while reading a source file, and then
translate u"..." literals using the 'ascii' encoding.

2. How do you want to support the UTF-16 encoding? It would
completely break ordinary string literals! "aa" in a source
file would become "a\x00a\x00" after compilation. Or am I
missing something?

3. Do not forget that your change breaks billions of scripts
that use non-ASCII characters, even in comments!

4. I can write a patch. I would have to do this anyway.
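
Point 2 can be made concrete in a few lines of Python 3. The sketch below builds a UTF-16 source file in memory and compares a hypothetical naive reader, which copies the raw bytes between the quotes, with one that decodes the whole file first.

```python
import codecs

# A one-line source file, as a UTF-16 editor might save it (BOM + LE text).
source_text = 's = "aa"\n'
source_bytes = codecs.BOM_UTF16_LE + source_text.encode("utf-16-le")
quote = '"'.encode("utf-16-le")  # b'"\x00'

# Hypothetical naive reader: copy the raw bytes between the quotes.
start = source_bytes.index(quote) + len(quote)
end = source_bytes.index(quote, start)
naive_literal = source_bytes[start:end]
print(naive_literal)  # b'a\x00a\x00'

# Decode-first reader: decode the whole file, then extract the literal.
decoded_text = source_bytes.decode("utf-16")  # the codec consumes the BOM
decoded_literal = decoded_text.split('"')[1]
print(decoded_literal)  # aa
```

The naive reader reproduces exactly the "a\x00a\x00" problem described in point 2; decoding the file first avoids it.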



----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-02-06 22:45

Message:
Logged In: YES 
user_id=38388

Sorry, but the implementation we chose decodes the complete
file, not only the Unicode literals, so if you want to use a
specific encoding in the source code, you have to be explicit
about it.
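
To see what decoding the complete file means in practice, here is a minimal sketch using Python 3's built-in `compile()`, which honours a PEP 263 coding declaration when it is given bytes. Python 3 is an assumption here (the thread concerns Python 2.3), and the file name `<hello.py>` is a placeholder.

```python
# Python 3 sketch: compile() decodes bytes using the file's coding declaration.
source_text = '# -*- coding: windows-1251 -*-\ngreeting = "Привет"\n'
# The bytes a windows-1251 editor would save to disk.
source_bytes = source_text.encode("windows-1251")

namespace = {}
exec(compile(source_bytes, "<hello.py>", "exec"), namespace)
print(namespace["greeting"])  # Привет
```

Without the first line, CPython 3 assumes UTF-8, and these windows-1251 bytes are rejected at compile time.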

Python source code was originally never meant to contain
non-ASCII characters. The PEP implementation now officially
allows this, provided that you use an encoding marker, e.g.

"""
# -*- coding: windows-1251 -*-
name = raw_input("What's your name? ")   # in Russian
print "Hi %s" % name                     # in Russian
"""

Note that this is also needed in order to support UTF-16
file formats, which use two bytes per character. Python
will automatically detect these files, so if you really don't
like the coding marker, simply write the file using a
UTF-16-aware editor that prepends a UTF-16 BOM to the file.
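
BOM detection itself is simple. Below is a minimal Python 3 sketch of a reader that picks a codec from a leading byte order mark and otherwise falls back to ASCII. The helper names `sniff_bom` and `read_source` and the ASCII fallback are assumptions for illustration; the sketch does not reproduce any particular Python version's tokenizer.

```python
import codecs

# Hypothetical BOM table. UTF-32 is deliberately left out: its little-endian
# BOM begins with the UTF-16 little-endian one and would have to be tested first.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return the codec implied by a BOM at the start of data, or None."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec
    return None


def read_source(data: bytes) -> str:
    """Decode source text using its BOM if present, else assume ASCII."""
    codec = sniff_bom(data)
    # Both "utf-8-sig" and "utf-16" strip the BOM while decoding.
    return data.decode(codec if codec else "ascii")


utf16_file = codecs.BOM_UTF16_LE + 'print "hi"\n'.encode("utf-16-le")
print(sniff_bom(utf16_file))          # utf-16
print(repr(read_source(utf16_file)))  # 'print "hi"\n'
```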


----------------------------------------------------------------------
