Showing content from https://mail.python.org/pipermail/python-dev/2006-August/068324.html below:

[Python-Dev] 2.5: recently introduced sgmllib regexp bug hangs Python
John J Lee jjl at pobox.com
Thu Aug 17 03:58:22 CEST 2006

Looks like revision 47154 introduced a regexp that hangs Python (Ctrl-C 
won't kill the process, CPU usage sits near 100%) under some 
circumstances.  There's a test case here:

http://python.org/sf/1541697


The problem isn't seen if you read the whole file at once (or almost the 
whole file at once).  (But that doesn't make it a non-bug, AFAICS.)

I'm not sure what the problem is, but presumably the relevant part of the 
patch is this:

+starttag = re.compile(r'<[a-zA-Z][-_.:a-zA-Z0-9]*\s*('
+        r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
+        r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]'
+        r'[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*(?=[\s>/<])))?'
+    r')*\s*/?\s*(?=[<>])')
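
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the attribute group begins with an optional \s* and its value part is optional too, so a run of name characters can be split into "attributes" in many different ways. If the buffer ends mid-tag (as it can when the file is fed in small chunks), no '<' or '>' follows and the engine may try every split before giving up. The snippet below copies the pattern from the patch and times it on a truncated tag; the helper name time_unterminated and the test inputs are hypothetical, not taken from the bug report.

```python
import re
import time

# Pattern copied verbatim from the patch quoted above (revision 47154).
starttag = re.compile(r'<[a-zA-Z][-_.:a-zA-Z0-9]*\s*('
        r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
        r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]'
        r'[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*(?=[\s>/<])))?'
    r')*\s*/?\s*(?=[<>])')


def time_unterminated(n):
    """Hypothetical helper: time a match against a tag cut off mid-attribute.

    The input has no closing '>' or following '<', as if the parser's
    buffer ended in the middle of a start tag.
    """
    data = '<a ' + 'b' * n
    start = time.perf_counter()
    result = starttag.match(data)
    return result, time.perf_counter() - start


if __name__ == '__main__':
    # A complete tag with '>' inside a quoted value matches quickly.
    print(starttag.match('<a href="x>y">').group(0))
    # Keep n small: the running time appears to grow very quickly with n.
    for n in (8, 12, 16, 18):
        result, elapsed = time_unterminated(n)
        print(n, result, '%.4f s' % elapsed)
```

If the timings climb steeply as n grows, that would be consistent with the hang only showing up when the input arrives in pieces, but it does not prove that this is what the test case in the tracker triggers.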


The patch attached to bug 1515142 (also from Sam Ruby -- claims to fix a 
regression introduced by his recent sgmllib patches, and has not yet been 
applied) does NOT fix the problem.

If nobody has time to fix this, perhaps rev 47154 should be reverted?


commit message for -r47154:

"""
SF bug #1504333: sgmlib should allow angle brackets in quoted values
(modified patch by Sam Ruby; changed to use separate REs for start and end
  tags to reduce matching cost for end tags; extended tests; updated to avoid
  breaking previous changes to support IPv6 addresses in unquoted attribute
  values)
"""


John
