On Mon, 2 Nov 2009 at 22:06, Guido van Rossum wrote:
> On Mon, Nov 2, 2009 at 9:51 PM, ssteinerX at gmail.com <ssteinerx at gmail.com> wrote:
>> BeautifulSoup, which I use every day, is one such product. Since the crappy
>> old SMGL parser's gone, BeautifulSoup uses the one that's left in Python 3
>> and it makes BeautifulSoup completely useless for my daily work.
>
> This sounds an area where some help might be useful. Perhaps the
> quickest solution would simply be to copy the old crappy "sgml" based
> html parser into a new version of BeautifulSoup. Though I imagine what
> it really needs is a "quirks mode" parser that is compatible with the
> HTML dialect accepted by, say, IE6. Maybe a summer of code project?

It's not a matter of quirks. It's a matter of being able to parse truly
broken html/xml, which browsers unfortunately do too well for everyone
else's sanity. So, call it a "sloppy mode" parser, and then yes,
that would solve the problem.

--David (RDM)