Gary Herron wrote:
> On Wednesday 26 February 2003 10:23 am, M.-A. Lemburg wrote:
>>>>>The first glance at the regular expression bug list and the _sre.c
>>>>>code results in the observation that several of the bugs are related
>>>>>to running over the recursion limit. The problem comes from using a
>>>>>pattern containing ".*?" in a situation where it is expected to match
>>>>>many thousands of characters. Each character matched by ".*?" causes
>>>>>one level of recursion, quickly overflowing the recursion limit.
>>>>
>>>>Wouldn't it be possible for the RE compiler to issue a warning in
>>>>case these kind of patterns are used ? This would be much more helpful
>>>>than trying to work-around the user problem.
>>>
>>>I think not. It's not the pattern that's the problem. A pattern
>>>containing ".*?" is perfectly legitimate and useful.
>>
>>Hmm, could you explain where ".*?" is useful ?
>
> Yes, easily. It's the non-greedy version of "match all". The manual
> page for the re module has this nice example:
>
>   *?, +?, ??
>     The "*", "+", and "?" qualifiers are all greedy; they match as much
>     text as possible. Sometimes this behaviour isn't desired; if the RE
>     <.*> is matched against '<H1>title</H1>', it will match the entire
>     string, and not just '<H1>'. Adding "?" after the qualifier makes it
>     perform the match in non-greedy or minimal fashion; as few
>     characters as possible will be matched. Using .*? in the previous
>     expression will match only '<H1>'.

Ah, ok. I usually write "<[^>]+>" for these things, if at all...
I tend to use mxTextTools for parsing :-)

>>>The problem
>>>arises when the pattern is used on a string which has thousands of
>>>characters which match. By that point the RE compiler is right out of
>>>the picture.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Software directly from the Source  (#1, Feb 26 2003)
>>> Python/Zope Products & Consulting ...         http://www.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
Python UK 2003, Oxford:                                     34 days left
EuroPython 2003, Charleroi, Belgium:                       118 days left
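
For reference, the greedy versus non-greedy behaviour quoted from the re manual, and the "<[^>]+>" alternative mentioned in the reply, can be checked with a short Python sketch. The variable names are illustrative only and do not come from the thread; this shows what each pattern matches, not how _sre.c implements the match.

```python
import re

text = '<H1>title</H1>'

# Greedy: ".*" takes as much text as possible, so the match
# runs all the way to the last ">" in the string.
greedy = re.match(r'<.*>', text).group(0)

# Non-greedy: ".*?" takes as little as possible, so the match
# ends at the first ">" that lets the pattern succeed.
minimal = re.match(r'<.*?>', text).group(0)

# Negated character class: "[^>]+" can never cross a ">",
# so it also ends at the first closing bracket.
negated = re.match(r'<[^>]+>', text).group(0)

print(greedy)
print(minimal)
print(negated)
```

The two shorter forms are not fully interchangeable: "<.*?>" also matches an empty "<>", while "<[^>]+>" needs at least one character between the brackets, and "[^>]" matches a newline where "." by default does not.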