On Tue, Nov 17, 2015 at 1:59 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 17.11.2015 02:53, Serhiy Storchaka wrote:
>> I'm working on rewriting Python tokenizer (in particular the part that reads and decodes Python
>> source file). The code is complicated. For now there are such cases:
>>
>> * Reading from the string in memory.
>> * Interactive reading from the file.
>> * Reading from the file:
>>   - Raw reading ignoring encoding in parser generator.
>>   - Raw reading UTF-8 encoded file.
>>   - Reading and recoding to UTF-8.
>>
>> The file is read by the line. It makes hard to check correctness of the first line if the encoding
>> is specified in the second line. And it makes very hard problems with null bytes and with
>> desynchronizing buffered C and Python files. All this problems can be easily solved if read all
>> Python source file in memory and then parse it as string. This would allow to drop a large complex
>> and buggy part of code.
>>
>> Are there disadvantages in this solution? As for memory consumption, the source text itself will
>> consume only small part of the memory consumed by AST tree and other structures. As for performance,
>> reading and decoding all file can be faster then by the line.
>
> A problem with this approach is that you can no
> longer fail early and detect indentation errors et al. while
> parsing the data (which may well come from a pipe).

Oh, this use case I had forgotten about. I don't know how common or
important it is though.

But more important is the interactive REPL, which parses your input
fully each time you hit ENTER.

> Another related problem is that you have to wait for the full
> input data before you can start compiling the code.

That's always the case -- we don't start compiling before we have the
full parse tree.

> I don't think these situations are all that common, though,
> so reading in the full source code before compiling it
> sounds like a reasonable approach.
>
> We use the same simplification in eGenix PyRun's emulation of
> the Python command line interface and it has so far not
> caused any problems.

Curious how you do it? I'd actually be quite disappointed if the
amount of parsing done by the standard REPL went down.

>> [1] http://bugs.python.org/issue25643

-- 
--Guido van Rossum (python.org/~guido)
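
For illustration only, the "read the whole file into memory, then parse it as a string" idea from the quoted proposal might look roughly like the sketch below at the Python level. This is not the C tokenizer code under discussion; the helper name decode_source is hypothetical, and the sketch assumes the standard library's tokenize.detect_encoding is an acceptable stand-in for the BOM and coding-cookie handling.

```python
import io
import tokenize


def decode_source(raw: bytes) -> str:
    """Decode a complete Python source buffer held in memory.

    Hypothetical helper: a sketch of the whole-file approach, not the
    tokenizer implementation discussed in the thread.
    """
    # With the whole buffer available, null bytes can be rejected up front.
    if b"\0" in raw:
        raise ValueError("source contains null bytes")
    # detect_encoding() looks at a BOM and at a coding cookie on the first
    # or second line, so a cookie on line two is handled in one place.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return raw.decode(encoding)


# Coding cookie on the second line, with a non-UTF-8 byte further down.
raw = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\ns = '\xe9'\n"
text = decode_source(raw)

# The decoded string is compiled as a whole, as in the proposal.
namespace = {}
exec(compile(text, "<memory>", "exec"), namespace)
```

Because the buffer is complete before decoding starts, the encoding check and the null-byte check each happen once, at the cost of not reporting errors until all input has arrived (the pipe case raised above).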