> On 07 July 2000, Thomas Wouters said: > > Argh, my head hurts ! :P Greg Ward replies: > Yeah, but try writing your *own* parser-generator. >grin< > > > Here's what I did. The current definition of 'atom' is: > > > > atom: '(' [testlist] ')' | '[' [testlist] ']' | << ... >> > > testlist: test (',' test)* [','] > > > > I tried adding it like this: > > > > atom: '(' [testlist] ')' | '[' [listmaker] ']' | << ... >> > > testlist: test (',' test)* [','] > > listmaker: testlist | rangemaker > > rangemaker: [test] ':' test > > I think the problem is this: > * one possible candidate for atom is: '[' listmaker > * two candidates for listmaker: testlist or rangemaker > * but both of these start with a test; the fact that one of them > is optional is probably beside the point > * the parser needs to see ',' or ':' to know if it's got a > testlist or a rangemaker: but because a test could be any number > of tokens, it would need infinite lookahead. (I think.) Since > this is a recursive-descent parser, it is most likely LL(1) -- > at most one token of lookahead. Not good enough. > > Another, entirely hypothetical way of looking at it (based on experience > with a completely different recursive descent parser generator > (PCCTS/ANTLR) and dim recollections of the Dragon Book): > * parser sees a '[' token and enters the 'atom' state > (ie. calls a subroutine 'atom()') > * swallows the '[' and immediately enters 'listmaker' (no other option) > * now it wants to enter either 'testlist' or 'rangemaker': but the > FIRST(testlist) == FIRST(test) (testlist can only start with a > test), and FIRST(rangemaker) == FIRST(test) + ':' (rangemaker > can start with a test or the ':' token) > * I suspect this is the ambiguity: overlap in the FIRST sets of > testlist and rangemaker Greg's got it right. It's a recursive descent a.k.a. LL(1) parser. The solution? Try to rewrite the grammar to avoid the ambiguities. So we have [testlist | rangemaker] for listmaker, where testlist is test(','test)* and rangemaker is [test]':'[test]. (Note that you'll have to add [':'[test]] to support [lo:hi:step], but tht's not the problem here.) Rewrite this as listmaker: [rangetail | test((','test)* | rangetail)] rangetail: ':' [test] [':' [test]] Now the two alternatives have different FIRST sets. The disadvantage is that you have to change the code generator to work withthis mess -- that's why people like LALR(1) and other parsers. But LL(1) was all I could create myself (Python uses a homegrown parser generator). --Guido van Rossum (home page: http://dinsdale.python.org/~guido/)
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4