Greg Ward wrote:
>
> On 20 September 2000, M.-A. Lemburg said:
> > Would it be possible to write a Python syntax checker that doesn't
> > stop processing at the first error it finds but instead tries
> > to continue as far as possible (much like make -k) ?
> >
> > If yes, could the existing Python parser/compiler be reused for
> > such a tool ?
>
> From what I understand of Python's parser and parser generator, no.
> Recovering from errors is indeed highly non-trivial.  If you're really
> interested, I'd look into Terence Parr's ANTLR -- it's a very fancy
> parser generator that's waaay ahead of pgen (or lex/yacc, for that
> matter).  ANTLR 2.x is highly Java-centric, and AFAIK doesn't yet have
> a C backend (grumble) -- just C++ and Java.  (Oh wait, the antlr.org
> web site says it can generate Sather too -- now there's an important
> mainstream language! ;-)

Thanks, I'll have a look.

> Tech notes: like pgen, ANTLR is LL; it generates a recursive-descent
> parser.  Unlike pgen, ANTLR is LL(k) -- it can support arbitrary
> lookahead, although k>2 can make parser generation expensive (not
> parsing itself, just turning your grammar into code), as well as make
> your language harder to understand.  (I have a theory that pgen's k=1
> limitation has been a brick wall in the way of making Python's syntax
> more complex, i.e. it's a *feature*!)
>
> More importantly, ANTLR has good support for error recovery.  My
> BibTeX parser has a lot of fun recovering from syntax errors, and
> (with a little smoke 'n mirrors magic in the lexing stage) does a
> pretty good job of it.  But you're right, it's *not* trivial to get
> this stuff right.  And without support from the parser generator, I
> suspect you would be in a world of hurtin'.

I was actually thinking of extracting the Python tokenizer and parser
from the Python source and tweaking them until they did what I wanted,
i.e. not generate code but produce useful error messages ;-)

From the feedback I got, it seems that this is not the right approach.
I'm not even sure whether using a parser at all is the right way... I
may have to stick to a fairly general tokenizer and then try to solve
the problem in chunks of code (much like what Guido hinted at in his
reply), possibly even by trial and error, using the Python builtin
compiler on these chunks.

Oh well,

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
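For concreteness, here is a minimal sketch of the chunk-based idea
described above: split the source on apparent top-level statement
boundaries and run each chunk through the builtin compile(), so that
one SyntaxError doesn't mask the rest of the file (make -k style).
The chunking heuristic and all names below are illustrative
assumptions, not code from this thread; the heuristic will misfire on
flush-left continuation lines, multi-line strings, decorators, etc.

    import sys

    CONTINUERS = ('else', 'elif', 'except', 'finally')

    def starts_new_chunk(line):
        # Heuristic: a new top-level statement starts flush left, is
        # not blank, not a comment, not a dangling close-bracket, and
        # not a clause keyword continuing an earlier statement.
        if line[:1] in (' ', '\t', '#', ')', ']', '}') or not line.strip():
            return False
        return line.split(None, 1)[0].rstrip(':') not in CONTINUERS

    def iter_chunks(source):
        # Yield (first_lineno, chunk_text) pairs.
        lines = source.splitlines(True)
        start = 0
        for i in range(1, len(lines)):
            if starts_new_chunk(lines[i]):
                yield start + 1, ''.join(lines[start:i])
                start = i
        if lines:
            yield start + 1, ''.join(lines[start:])

    def check(filename):
        # Compile each chunk separately and keep going past errors;
        # return the number of chunks that failed.
        with open(filename) as f:
            source = f.read()
        errors = 0
        for lineno, chunk in iter_chunks(source):
            try:
                compile(chunk, filename, 'exec')
            except SyntaxError as exc:
                errors += 1
                print('%s:%d: %s' % (filename,
                                     lineno + (exc.lineno or 1) - 1,
                                     exc.msg))
        return errors

    if __name__ == '__main__':
        sys.exit(1 if check(sys.argv[1]) else 0)

A real tool would want smarter resynchronization (e.g. driven by the
tokenize module, along the lines Marc-Andre suggests), but even this
naive version reports several independent errors per run instead of
stopping at the first one.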