> Over in the Web SIG, it was noted that the HTML parser in htmllib has
> handlers for HTML 2.0 elements, and it should really support HTML 4.01, the
> current version.  I'm looking into doing this.
>
> We actually have two HTML parsers: htmllib.py and the more recent
> HTMLParser.py.  The initial check-in comment for 2001/05/18 for
> HTMLParser.py reads:
>
>     A much improved HTML parser -- a replacement for sgmllib.  The API is
>     derived from but not quite compatible with that of sgmllib, so it's a
>     new file.  I suppose it needs documentation, and htmllib needs to be
>     changed to use this instead of sgmllib, and sgmllib needs to be
>     declared obsolete.  But that can all be done later.
>
> sgmllib only handles those bits of SGML needed for HTML, and anyone doing
> serious SGML work is going to have to use a real SGML parser, so deprecating
> sgmllib is reasonable.  HTMLParser needs no changes for HTML 4.01; only
> htmllib needs to get a bunch more handler methods.
>
> Should I try to do this for 2.4?

I'm unclear on what you plan to do -- repeal sgmllib and rewrite htmllib
to use HTMLParser internally for a backwards compatible interface?

> (I can't find an explanation of how the API differs between the two modules
> but can figure it out by inspecting the code, and will try to keep the
> htmllib module backward-compatible.)

That would be required for a few releases, yes.  I'm okay with
deprecating sgmllib faster than htmllib.

--Guido van Rossum (home page: http://www.python.org/~guido/)
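
To make the "backwards compatible interface" idea above concrete, here is a minimal sketch of how sgmllib-style per-tag methods (start_<tag> / end_<tag>) could be layered on top of HTMLParser's generic handle_starttag / handle_endtag callbacks. It is only an illustration, not the planned htmllib rewrite: the class names CompatParser and LinkCollector are hypothetical, and the import uses the module's present-day location (html.parser), which is an assumption relative to the HTMLParser.py discussed in this thread.

```python
from html.parser import HTMLParser  # assumption: modern home of HTMLParser.py


class CompatParser(HTMLParser):
    """Hypothetical adapter: dispatch to sgmllib-style start_<tag>/end_<tag>."""

    def handle_starttag(self, tag, attrs):
        # Look for a per-tag method; fall back to a generic hook if absent.
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            method(attrs)
        else:
            self.unknown_starttag(tag, attrs)

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()
        else:
            self.unknown_endtag(tag)

    # Hooks named after their sgmllib counterparts; no-ops by default.
    def unknown_starttag(self, tag, attrs):
        pass

    def unknown_endtag(self, tag):
        pass


class LinkCollector(CompatParser):
    """Hypothetical subclass written in the old per-tag handler style."""

    def __init__(self):
        super().__init__()
        self.links = []

    def start_a(self, attrs):
        # attrs is a list of (name, value) pairs.
        for name, value in attrs:
            if name == "href":
                self.links.append(value)

    def end_a(self):
        pass


parser = LinkCollector()
parser.feed('<p>See <a href="http://www.python.org/">python.org</a></p>')
parser.close()
print(parser.links)
```

With a dispatch layer along these lines, existing subclasses that define per-tag handlers would keep working while the actual parsing is done by HTMLParser; supporting HTML 4.01 would then mostly be a matter of adding handler methods for the newer elements.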