Over in the Web SIG, it was noted that the HTML parser in htmllib has handlers for HTML 2.0 elements, and it should really support HTML 4.01, the current version. I'm looking into doing this.

We actually have two HTML parsers: htmllib.py and the more recent HTMLParser.py. The initial check-in comment for HTMLParser.py, dated 2001/05/18, reads:

    A much improved HTML parser -- a replacement for sgmllib. The API is
    derived from but not quite compatible with that of sgmllib, so it's a
    new file. I suppose it needs documentation, and htmllib needs to be
    changed to use this instead of sgmllib, and sgmllib needs to be
    declared obsolete. But that can all be done later.

sgmllib only handles those bits of SGML needed for HTML, and anyone doing serious SGML work is going to have to use a real SGML parser, so deprecating sgmllib is reasonable.

HTMLParser needs no changes for HTML 4.01; only htmllib needs to get a bunch more handler methods. Should I try to do this for 2.4?

(I can't find an explanation of how the API differs between the two modules, but I can figure it out by inspecting the code, and I will try to keep the htmllib module backward-compatible.)

--amk
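
For readers wondering why only htmllib would need new methods, here is a minimal sketch of the dispatch difference. It is not the planned patch. It assumes a current Python, where HTMLParser.py is available as html.parser and sgmllib/htmllib are no longer shipped. HTMLParser reports every element through the generic handle_starttag/handle_endtag callbacks, while sgmllib-style parsers look up per-element start_<tag>, end_<tag> and do_<tag> methods, so each new HTML 4.01 element needs its own handler. The names DispatchingParser, HTML4Handlers and parse are hypothetical, made up for this illustration.

```python
from html.parser import HTMLParser


class DispatchingParser(HTMLParser):
    """Hypothetical adapter: routes HTMLParser's generic callbacks to
    sgmllib-style per-element methods (start_<tag>, end_<tag>, do_<tag>)."""

    def __init__(self):
        super().__init__()
        self.events = []  # record of handler calls, kept for inspection

    def handle_starttag(self, tag, attrs):
        # Prefer start_<tag>; fall back to do_<tag> for empty elements.
        method = getattr(self, "start_" + tag, None) or getattr(self, "do_" + tag, None)
        if method is not None:
            method(attrs)
        else:
            self.unknown_starttag(tag, attrs)

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()
        else:
            self.unknown_endtag(tag)

    def unknown_starttag(self, tag, attrs):
        self.events.append(("unknown-start", tag))

    def unknown_endtag(self, tag):
        self.events.append(("unknown-end", tag))


class HTML4Handlers(DispatchingParser):
    """Adds handlers for two elements; ABBR is one that HTML 2.0 lacks."""

    def start_abbr(self, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("start", "abbr", dict(attrs)))

    def end_abbr(self):
        self.events.append(("end", "abbr"))

    def do_br(self, attrs):
        self.events.append(("do", "br"))


def parse(text):
    """Feed text through the sketch parser and return the recorded events."""
    parser = HTML4Handlers()
    parser.feed(text)
    parser.close()
    return parser.events
```

With this arrangement the generic parser stays untouched when the element set grows; only the subclass holding the per-element methods changes, which matches the observation that HTMLParser itself needs no changes for HTML 4.01.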