Blast from the past!

[/F]
> for phrase, action in lexicon:
>     p.append("(?:%s)(?P#%d)" % (phrase, len(p)))

[Tim]
> How about instead enhancing existing (?P<name>pattern) notation, to
> set a new match object attribute to name if & when pattern matches?
> Then arbitrary info associated with a named pattern can be gotten at
> via dicts via the pattern name, & the whole mess should be more
> readable.

[/F
 Sent: Sunday, July 02, 2000 6:35 PM]
> I just added "lastindex" and "lastgroup" attributes to the match object.
>
> "lastindex" is the integer index of the last matched capturing group,
> "lastgroup" the corresponding name (or None, if the group didn't have
> a name).  both attributes are None if no group were matched.

Reviewing this before 2.0 has been on my todo list for 3+ months, and
finally got to it.  Good show!  I converted some of my by-hand scanners to
use lastgroup, and like it a whole lot.  I know you understand why this is
Good, so here's a simple example of an "after" tokenizer for those who don't
(this one happens to tokenize REXX-like PARSE stmts):

    import re
    _token = re.compile(r"""
        (?P<space>      \s+)
    |   (?P<var>        [a-zA-Z_]\w*)
    |   (?P<dontcare>   \.)
    |   (?P<number>     \d+)
    |   (?P<punc>       [-+=()])
    |   (?P<string>     " [^"\\\n]* (?: \\. [^"\\\n]*)* "
        |               ' [^'\\\n]* (?: \\. [^'\\\n]*)* '
        )
    """, re.VERBOSE).match
    del re

    (T_SPACE,
     T_VAR,
     T_DONTCARE,
     T_NUMBER,
     T_PUNC,
     T_STRING,
     T_EOF,
    ) = range(7)

    # For debug output.
    _enum2name = ["T_SPACE",
                  "T_VAR",
                  "T_DONTCARE",
                  "T_NUMBER",
                  "T_PUNC",
                  "T_STRING",
                  "T_EOF",
                 ]

    _group2action = {
        "space":    (T_SPACE, None),
        "var":      (T_VAR, None),
        "dontcare": (T_DONTCARE, None),
        "number":   (T_NUMBER, int),
        "punc":     (T_PUNC, None),
        "string":   (T_STRING, eval),
    }

    def tokenize(s, tokeneater):
        i, n = 0, len(s)
        while i < n:
            m = _token(s, i)
            if not m:
                raise ParseError(s, i)
            group = m.lastgroup
            enum, action = _group2action[group]
            val = m.group(group)
            if action is not None:
                val = action(val)
            tokeneater(enum, val)
            i = m.end()
        tokeneater(T_EOF, None)

The tokenize function here used to be a mass of if/elif stmts trying to
figure out which group had matched.  Now it's all table-driven:  easier to
write, reuse & maintain, and quicker to boot.

+1.

the-aged-may-be-slow-but-they-never-forget<wink>-ly y'rs  - tim
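
For readers who want to try the two attributes in isolation, here is a minimal sketch in current Python. It is not taken from the post: the pattern, the `ACTIONS` table, and the `describe` and `scan` helpers are hypothetical names chosen for illustration, and only `re.compile`, `match`, `lastindex`, `lastgroup`, `group` and `end` are assumed from the standard `re` module.

```python
import re

# Hypothetical pattern: two named alternatives plus an unnamed,
# non-capturing whitespace alternative.
_pat = re.compile(r"(?P<number>\d+)|(?P<word>[a-z]+)|\s+")

# Hypothetical dispatch table keyed by group name.
ACTIONS = {"number": int, "word": str.upper}


def describe(text, pos=0):
    """Return (lastindex, lastgroup) for a match at pos, or None if no match."""
    m = _pat.match(text, pos)
    if m is None:
        return None
    return m.lastindex, m.lastgroup


def scan(text):
    """Table-driven scan: look up the handler by the name of the matched group."""
    out = []
    pos = 0
    while pos < len(text):
        m = _pat.match(text, pos)
        if m is None:
            raise ValueError("no token at offset %d" % pos)
        name = m.lastgroup
        if name is not None:  # whitespace matched without any capturing group
            out.append((name, ACTIONS[name](m.group(name))))
        pos = m.end()
    return out
```

In this sketch `describe("  ")` returns a pair of `None` values because the whitespace alternative contains no capturing group, which is the "both attributes are None" case /F describes. The lookup in `scan` plays the same role as `_group2action` in the tokenizer above.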