On 11/25/2010 08:30 AM, Emile Anclin wrote:
>
> hello,
>
> working on Pylint, we have a lot of voluntary corrupted files to test
> Pylint behavior; for instance
>
> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
> # -*- coding: IBO-8859-1 -*-
> """ check correct unknown encoding declaration
> """
>
> __revision__ = 'éééé'
>
>
> and we try to find that module :
> find_module('func_unknown_encoding', None). But python3 raises SyntaxError
> in that case ; it didn't raise SyntaxError on python2 nor does so on our
> func_nonascii_noencoding and func_wrong_encoding modules (with obvious
> names)
>
> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
> [GCC 4.3.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> from imp import find_module
>>>> find_module('func_unknown_encoding', None)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> SyntaxError: encoding problem: with BOM
>>>> find_module('func_wrong_encoding', None)
> (<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py',
> ('.py', 'U', 1))
>>>> find_module('func_nonascii_noencoding', None)
> (<_io.TextIOWrapper name=6 encoding='utf-8'>,
> 'func_nonascii_noencoding.py', ('.py', 'U', 1))
>
>
> So what is the reason of this selective behavior?
> Furthermore, there is BOM in our func_unknown_encoding.py module.

I don't think there is a clear reason by design. Also try importing the same modules directly and note the differences in the errors you get.

For example, here is the problem that brought this to my attention in Python 3.2:
>>> find_module('test/badsyntax_pep3120')
Segmentation fault

>>> from test import badsyntax_pep3120
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/test/badsyntax_pep3120.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xf6' in file /usr/local/lib/python3.2/test/badsyntax_pep3120.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

The import statement uses parser.c (and tokenizer.c indirectly) to import a file, but the imp module uses tokenizer.c directly. The two aren't consistent in how they handle errors, because the error messages are generated in different places depending on what the error is, *and* which code path led to that point, *and* whether or not a filename was set.

For the example above with imp.find_module(), the filename isn't set, so you get a different result than with the import statement, which goes through parser.c and does set the filename.

From what I've seen, it would help if the imp module were rewritten to use parser.c, as the import statement does, rather than calling tokenizer.c directly. The error handling in parser.c is much better than that in tokenizer.c. Possibly tokenizer.c could then be cleaned up and made much simpler.

Ron Adam
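The imp module used above was deprecated and has since been removed (in Python 3.12), so the session cannot be replayed on a current interpreter. The following is a minimal sketch, under the assumption that two modern code paths give a rough feel for the split described above: `tokenize.detect_encoding()`, the pure-Python encoding detector, and the built-in `compile()`, which feeds bytes through the C tokenizer and parser. It does not reproduce the imp code path itself. The sample sources and the helper names `tokenizer_path` and `compiler_path` are hypothetical, and exact messages and exception types vary between Python versions.

```python
import tokenize
from io import BytesIO

# Hypothetical sample sources modelled on the files discussed in this thread.
SAMPLES = {
    # Control case: plain ASCII, no declaration needed.
    "utf8_ok": b"__revision__ = 'x'\n",
    # Unknown codec name in the coding cookie (like func_unknown_encoding.py).
    "unknown_encoding": b"# -*- coding: IBO-8859-1 -*-\n__revision__ = 'x'\n",
    # Latin-1 byte 0xf6 with no encoding declaration (like badsyntax_pep3120.py).
    "non_utf8_no_declaration": b"__revision__ = '\xf6'\n",
}


def tokenizer_path(source: bytes) -> str:
    """Report how the pure-Python tokenize module reacts to the source bytes."""
    try:
        encoding, _ = tokenize.detect_encoding(BytesIO(source).readline)
    except SyntaxError as exc:
        return f"SyntaxError: {exc}"
    return f"ok ({encoding})"


def compiler_path(source: bytes, filename: str) -> str:
    """Report how compile() (C tokenizer + parser) reacts to the same bytes."""
    try:
        compile(source, filename, "exec")
    except (SyntaxError, ValueError, LookupError) as exc:
        # The exception type may differ across interpreter versions,
        # so catch the plausible ones and report which one surfaced.
        return f"{type(exc).__name__}: {exc}"
    return "ok"


if __name__ == "__main__":
    for name, source in SAMPLES.items():
        print(name)
        print("  tokenize:", tokenizer_path(source))
        print("  compile: ", compiler_path(source, name + ".py"))
```

Running it prints, for each sample, how each layer reports the problem. The point is only that the same bytes may be rejected with different wording (or a different exception type) depending on which layer inspects them first and whether a filename is available.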