Bugs item #681960, was opened at 2003-02-06 22:17
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=681960&group_id=5470

Category: Unicode
Group: Python 2.3
Status: Closed
Resolution: None
Priority: 3
Submitted By: Kirill Simonov (kirill_simonov)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Source encoding rules are extreme.

Initial Comment:
According to PEP 0263, source code that contains non-ASCII
characters (ord(ch)>127) and does not define an encoding causes a
DeprecationWarning. In the future, such code will cause a SyntaxError.

While I believe that the idea of defining the source code encoding is
very useful, I think that the current solution is unnecessarily
extreme. It is very unfriendly to beginners. Imagine a student who
types her first script:

name = raw_input("What's your name? ") # russian here, of course
print "Hi %s!" % name

Do not even try to convince me that she must define an encoding here.
That feature would break any possibility of using Python in schools.

Actually, the source code encoding only affects Unicode literals. The
above script works the same way with any defined encoding, so the
warning for this code is unnecessary.

As a solution, I propose to issue a DeprecationWarning (or SyntaxError)
only when a non-ASCII character is contained in a Unicode literal.

----------------------------------------------------------------------

>Comment By: Kirill Simonov (kirill_simonov)
Date: 2003-02-10 15:39

Message:
Logged In: YES
user_id=36553

I like this. Thanks.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-02-10 09:43

Message:
Logged In: YES
user_id=38388

I've had a private discussion with Guido and Roman Suzi:
We'll add a way to set the source code default encoding
via the site.py/sitecustomize.py files. This should then
allow anyone wishing to customize the default behaviour
to do so.
----------------------------------------------------------------------

Comment By: Kirill Simonov (kirill_simonov)
Date: 2003-02-06 23:28

Message:
Logged In: YES
user_id=36553

Hello,

Yes, I understand that the encoding is for the whole source file. But

1. The current implementation already assumes that one uses an
ASCII-compatible encoding. So we can go a step further and not use any
encoding while reading a source file. And then we'll translate u"..."
using the 'ascii' encoding.

2. How do you want to support the UTF-16 encoding? This will completely
break ordinary string literals! "aa" in the source code would become
"a\x00a\x00" after compilation. Or am I missing something?

3. Do not forget that your change breaks billions of scripts that use
non-ASCII characters even in comments!

4. I can write a patch. I would be forced to do this anyway.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-02-06 22:45

Message:
Logged In: YES
user_id=38388

Sorry, but the implementation we chose decodes the complete
file, not only the Unicode literals, so if you want to use a
specific encoding in the source code, you have to be
explicit about it.

Python's source code was originally never meant to contain
non-ASCII characters. The PEP implementation now officially
allows this provided that you use an encoding marker, e.g.

"""
# -*- coding: windows-1251 -*-
name = raw_input(" ? ")
print " %s" % name
"""

Note that this is also needed in order to support UTF-16 file
formats, which use two bytes per character. Python will
automatically detect these files, so if you really don't like
the coding marker, simply write the file using a UTF-16 aware
editor which prepends a UTF-16 BOM mark to the file.

----------------------------------------------------------------------
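To illustrate the rule debated in this thread, here is a minimal sketch in modern Python 3 (not the Python 2.3 interpreter code the report concerns). It checks whether a source file contains bytes above 127 without a coding marker on its first two lines, which is the situation the report says triggers the warning, and it reproduces the UTF-16 byte layout from point 2 of the 2003-02-06 23:28 comment. The helper names `declared_encoding` and `needs_declaration` and the sample scripts are hypothetical. The regular expression is the one given in PEP 263. BOM handling is left out for brevity.

```python
import re

# Encoding-declaration pattern as given in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source: bytes):
    """Return the encoding named on line 1 or 2 of the source, or None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line.decode("ascii", errors="replace"))
        if match:
            return match.group(1)
    return None


def needs_declaration(source: bytes) -> bool:
    """True if the source has bytes above 127 but no coding marker."""
    has_non_ascii = any(byte > 127 for byte in source)
    return has_non_ascii and declared_encoding(source) is None


# Hypothetical sample scripts. The Cyrillic name is written with escapes
# so that this block itself stays pure ASCII.
body = "name = '\u041a\u0438\u0440\u0438\u043b\u043b'\n"
unmarked = body.encode("windows-1251")
marked = ("# -*- coding: windows-1251 -*-\n" + body).encode("windows-1251")

print(needs_declaration(unmarked))   # True: non-ASCII bytes, no marker
print(needs_declaration(marked))     # False: marker present
print(declared_encoding(marked))     # windows-1251

# Point 2 of the thread: in UTF-16 (little-endian, BOM omitted here)
# every ASCII character occupies two bytes.
print("aa".encode("utf-16-le"))      # b'a\x00a\x00'
```

The check is deliberately byte-based: it does not distinguish comments, ordinary string literals, and Unicode literals, which is exactly the distinction the submitter asks the warning to make.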