On Wednesday 26 February 2003 10:23 am, M.-A. Lemburg wrote: > Gary Herron wrote: > > On Wednesday 26 February 2003 01:08 am, M.-A. Lemburg wrote: > >>Gary Herron wrote: > >>>The first glance at the regular expression bug list and the _sre.c > >>>code results in the observation that several of the bugs are related > >>>to running over the recursion limit. The problem comes from using a > >>>pattern containing ".*?" in a situation where it is expected to match > >>>many thousands of characters. Each character matched by ".*?" causes > >>>one level or recursion, quickly overflowing the recursion limit. > >> > >>Wouldn't it be possible for the RE compiler to issue a warning in > >>case these kind of patterns are used ? This would be much more helpful > >>than trying to work-around the user problem. > > > > I think not. It's not the pattern that's the problem. A pattern > > containing ".*?" is perfectly legitimate and useful. > > Hmm, could you explain where ".*?" is useful ? Yes, easily. It's the non-greedy version of "match all". The manual page for the re module has this nice example: *?, +?, ?? The "*", "+", and "?" qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding "?" after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'. > > The problem > > arises when the pattern is used on a string which has thousands of > > characters which match. By that point the RE compiler is right out of > > the picture.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4