Bugs item #493252, was opened at 2001-12-14 02:54
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=493252&group_id=5470

Category: Regular Expressions
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: P. de Jong (peterdejong)
Assigned to: Fredrik Lundh (effbot)
Summary: maximum recursion limit exceeded in match

Initial Comment:
RuntimeError: maximum recursion limit exceeded in match, while trying to match a string of 16384 bytes. (Python 2.0)

The error does not occur after eliminating some 100 characters from the string. The error does not occur in Python 1.5.2. So I cannot upgrade.

Peter de Jong

----------------------------------------------------------------------

>Comment By: P. de Jong (peterdejong)
Date: 2001-12-17 01:02

Message:
Logged In: YES
user_id=402001

Tim and Guido,

I want to match things like:

"jlkjkl ""kjlklkjkl""ljk ;lkk;l";

or:

"jlkjkl ""kjlklkjkl""ljk ;lkk;l"

Maybe I can trust the embedded quotes to be doubled, but I was not sure at the moment of writing the expression. I do have to include newlines inside the enclosing quotes.

Peter de Jong

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-12-15 20:34

Message:
Logged In: YES
user_id=31435

Oops! I used to make the same mistake in Perl too (which is why I pushed to name the symbolic flag DOTALL instead of SINGLELINE). So the correct regexp is

r'"[^"]*"[;\n]'

Assuming, of course, that Peter does want embedded newlines to get sucked up.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-15 17:30

Message:
Logged In: YES
user_id=6380

Tim, Peter's regex starts with (?s) which is the same as compiling with re.S or re.DOTALL which makes '.' match newline -- the opposite of what you seem to think (?s) means.
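
Guido's point about (?s) can be checked with a short sketch. The sample string below is made up for illustration, and the code is written for a current Python 3 rather than the 2.0 release discussed here, so it shows only the flag semantics, not the recursion error.

```python
import re

# Made-up sample: a quoted span containing a newline, terminated by ';'.
text = 'a "first\nsecond";\n'

# Peter's pattern without the (?s) prefix: '.' stops at '\n', so no match.
no_flag = re.search(r'"(.*?)"(;|\n)', text)

# (?s) is the inline spelling of re.S / re.DOTALL: '.' also matches '\n'.
inline = re.search(r'(?s)"(.*?)"(;|\n)', text)
flagged = re.search(r'"(.*?)"(;|\n)', text, re.DOTALL)

# Tim's corrected pattern: a negated character class matches newlines
# on its own, so no flag is needed.
negated = re.search(r'"[^"]*"[;\n]', text)

print(no_flag)            # None
print(inline.group(0))    # the whole quoted span, including the newline
print(negated.group(0))   # the same span
```

The negated class gives the same result here as the DOTALL version, which is why Tim's corrected pattern drops the flag entirely.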
----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-12-15 11:08

Message:
Logged In: YES
user_id=31435

Peter, assuming Guido is correct that you did not intend to match strings with embedded double quotes, here's a correct (and much more efficient) replacement for your regexp:

r'"[^"\n]*"[;\n]'

Note that (contra Guido's suggestion <wink>), a newline has to be part of the negated character class, because you asked for single-line mode so presumably didn't want your .*? to match any newlines either.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-15 08:03

Message:
Logged In: YES
user_id=6380

Do you really *want* that pattern to match input like this:

"xxx"xxx";

??? If not, replace the (.*?) with ([^"]*) -- this will dramatically reduce the amount of backtracking generated if there are unbalanced quotes in the input.

----------------------------------------------------------------------

Comment By: P. de Jong (peterdejong)
Date: 2001-12-15 05:23

Message:
Logged In: YES
user_id=402001

Hi Guido!

The regular expression used was:

'(?s)"(.*?)"(;|\n)'

Peter

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2001-12-14 08:32

Message:
Logged In: YES
user_id=38376

Three additional comments:

1) The fact that you get this under 2.0 indicates that your regular expression doesn't run very well under 1.5.2 either; it can most likely be rewritten to be much faster, and use much less memory.

2) But if that's okay, you can always work around this by replacing "import re" with "import pre as re" (or "import pre; re = pre")

3) This will be fixed in future versions.

</F>

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-12-14 07:57

Message:
Logged In: YES
user_id=31435

Assigned to /F.
Echo Guido's belief that the regexp can almost certainly be rewritten to avoid this and run much faster at the same time.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-14 05:53

Message:
Logged In: YES
user_id=6380

Hi Peter!

This usually happens when the pattern contains * or + in a way that causes more backtracking than one would naively expect. Can you show us the pattern? There's usually an easy way to rewrite the pattern so that it won't overflow -- and it will be faster too...

BTW I would upgrade to Python 2.1.1 -- that's the most stable release to date.

----------------------------------------------------------------------
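
The rewrite discussed in this thread can be compared against the original pattern with a small sketch. ORIGINAL and REWRITE are the patterns quoted in the comments; DOUBLED and the helper first_match are hypothetical additions that do not appear in the thread, shown only as one possible way to accept the doubled quotes Peter describes. The code targets a current Python 3, so it illustrates what each pattern matches rather than reproducing the 2.0 recursion error.

```python
import re

# Patterns quoted in the thread.
ORIGINAL = re.compile(r'(?s)"(.*?)"(;|\n)')   # Peter's pattern
REWRITE = re.compile(r'"[^"]*"[;\n]')         # Tim's corrected replacement

# Hypothetical extension (an assumption, not from the thread):
# treat "" inside the enclosing quotes as an escaped quote.
DOUBLED = re.compile(r'"(?:[^"]|"")*"[;\n]')

unbalanced = '"xxx"xxx";'                      # Guido's example input
doubled = '"jlkjkl ""kjlklkjkl""ljk ;lkk;l";'  # Peter's example input


def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


# The lazy dot runs across the stray quote; the negated class cannot.
print(first_match(ORIGINAL, unbalanced))  # "xxx"xxx";
print(first_match(REWRITE, unbalanced))   # "xxx";

# On doubled quotes the negated class only finds the last fragment,
# while the hypothetical variant accepts the whole string.
print(first_match(REWRITE, doubled))      # "ljk ;lkk;l";
print(first_match(DOUBLED, doubled))      # the full input
```

Because [^"] and "" can never start at the same character, the alternation in DOUBLED leaves the matcher nothing to retry at each step, which is the property Guido's suggestion relies on.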