FYI, Fredrik's regexp engine also supports two undocumented match-object
attributes that could be used to speed SPARK lexing, and especially when
there are many token types (gives a direct index to the matching alternative
instead of making you do a linear search for it -- that can add up to a
major win).  Simple example below.

Python-Dev, this has been in there since 2.0 (1.6?  unsure).  I've been
using it happily all along.  If Fredrik is agreeable, I'd like to see this
documented for 2.1, i.e. made an officially supported part of Python's
regexp facilities.

-----Original Message-----
From: Tim Peters [mailto:tim.one@home.com]
Sent: Monday, March 12, 2001 6:37 PM
To: python-list@python.org
Subject: RE: Help with Regular Expressions

[Raymond Hettinger]
> Is there an idiom for how to use regular expressions for lexing?
>
> My attempt below is unsatisfactory because it has to filter the
> entire match group dictionary to find-out which token caused
> the match. This approach isn't scalable because every token
> match will require a loop over all possible token types.
>
> I've fiddled with this one for hours and can't seem to find a
> direct way get a group dictionary that contains only matches.

That's because there isn't a direct way; best you can do now is seek to
order your alternatives most-likely first (which is a good idea anyway,
given the way the engine works).

If you peek inside sre.py (2.0 or later), you'll find an undocumented class
Scanner that uses the undocumented .lastindex attribute of match objects.
Someday I hope this will be the basis for solving exactly the problem you're
facing.  There's also an undocumented .lastgroup attribute:

Python 2.1b1 (#11, Mar  2 2001, 11:23:29) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.6 -- press F1 for help
>>> import re
>>> pat = re.compile(r"(?P<a>aa)|(?P<b>bb)")
>>> m = pat.search("baab")
>>> m.lastindex # numeral of group that matched
1
>>> m.lastgroup # name of group that matched
'a'
>>> m = pat.search("ababba")
>>> m.lastindex
2
>>> m.lastgroup
'b'
>>>

They're not documented yet because we're not yet sure whether we want to
make them permanent parts of the language.  So feel free to play, but don't
count on them staying around forever.  If you like them, drop a note to the
effbot saying so.

for-more-docs-read-the-source-code-ly y'rs  - tim
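
For anyone who wants to see how .lastgroup removes the linear search in a lexer, here is a minimal sketch. It assumes a current Python 3 and a made-up token table: TOKEN_SPEC, MASTER and tokenize are hypothetical names for illustration, not anything from sre.py or SPARK. It also assumes no alternative contains a nested capturing group, since a nested group could otherwise be the one reported.

```python
import re

# Hypothetical token table for illustration only; the names and patterns
# are assumptions, not taken from SPARK or sre.py.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]

# One named group per token type, joined into a single alternation.
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (token_type, lexeme) pairs, dropping whitespace."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(
                "unexpected character %r at position %d" % (text[pos], pos)
            )
        # lastgroup names the alternative that matched, so there is no
        # need to loop over m.groupdict() looking for the non-None entry.
        kind = m.lastgroup
        if kind != "SKIP":
            yield kind, m.group()
        pos = m.end()
```

With this table, list(tokenize("x = 42 + y")) gives [('NAME', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('NAME', 'y')]. The ordering advice from the message still applies: the engine tries alternatives left to right, so the likeliest token types belong first.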