Showing content from https://mail.python.org/pipermail/python-dev/2003-January/032469.html below:

[Python-Dev] HTMLParser patches

john paulson munch@acm.org
Mon, 27 Jan 2003 14:19:24 -0800

I've submitted two patches for HTMLParser.py and
test_htmlparser.py.  They fix two problems I hit
when lexing some HTML pages I found in the wild.

1. Allow "," in attributes
    A page had the attribute "color=rgb(1,2,3)",
    and the parser choked on the ",".  I added
    "," to the list of allowed characters.

2. More robust <SCRIPT> processing
    The eBay homepage has unprotected javascript,
    including the line 'vb += "</SCR"+"IPT>"'.  The
    parser choked on that line.  I modified the
    source to use a more robust regex for script
    and style end tags.  A side effect of this is that
    any "<!--" .. "-->" within a script/style will
    be parsed as a comment.  If that behavior is
    incorrect, the regex can be modified.
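
To make the two cases above concrete, here is a minimal sketch in
Python.  It is not the code from the patches: the patterns
UNQUOTED_VALUE and CDATA_END and the helpers unquoted_value() and
split_cdata() are hypothetical names invented for illustration, and
the character class is only an assumption about what a list of
allowed attribute characters might look like once "," is in it.

```python
import re

# Hypothetical character class for an unquoted attribute value.
# This is NOT the pattern from the actual patch; it only shows the
# effect of including "," among the allowed characters.
UNQUOTED_VALUE = re.compile(r'[-a-zA-Z0-9./,:;+*%?!&$()_#=~@]+')

# Hypothetical end-tag pattern: matches only a complete </script> or
# </style> tag (case-insensitive), so a fragment such as </SCR" inside
# a javascript string is not mistaken for the end of the element.
CDATA_END = re.compile(r'</\s*(script|style)\s*>', re.IGNORECASE)


def unquoted_value(text):
    """Return the unquoted attribute value at the start of text, or None."""
    m = UNQUOTED_VALUE.match(text)
    return m.group(0) if m else None


def split_cdata(text):
    """Split text at the first complete script/style end tag.

    Returns (content, rest), where rest starts at the end tag, or
    (text, '') if no complete end tag is present.
    """
    m = CDATA_END.search(text)
    if m is None:
        return text, ''
    return text[:m.start()], text[m.start():]


if __name__ == '__main__':
    print(unquoted_value('rgb(1,2,3)>'))
    print(split_cdata('vb += "</SCR"+"IPT>";</script><p>'))
```

With "," in the class, the whole of rgb(1,2,3) is taken as one
value, and the split happens at the real </script> rather than at
the "</SCR" fragment.  This sketch does not touch the comment side
effect mentioned above.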
