On Fri, Jul 29, 2011 at 13:16, Glyph Lefkowitz <glyph at twistedmatrix.com> wrote:

> On Jul 29, 2011, at 3:00 PM, Matt wrote:
>
> I don't see any real reason to drop a decent piece of code (HTMLParser,
> that is) in favor of a third party library when only relatively minor
> updates are needed to bring it up to speed with the latest spec.
>
>
> I am not really one to throw stones here, as Twisted contains a lenient
> pseudo-XML parser which I still maintain - one which decidedly does *not* agree
> with html5's requirements for dealing with invalid data, but just a bunch of
> ad-hoc guesses of my own.
>
> My impression of HTML5 is that HTMLParser would require significant
> modifications and possibly a drastic re-architecture in order to really do
> HTML5 "right"; especially the parts that the html5lib authors claim makes
> HTML5 streaming-unfriendly, i.e. subtree reordering when encountering
> certain types of invalid data.
>

We could also have the code live side-by-side for a while (or indefinitely
if that was really desired) by bringing html5lib in as either a separate
module or having the relevant classes live in htmllib under different names.

But all of this is just hypothetical until someone decides to do the
legwork to actually make a proposal and get the coding done.

-Brett

>
> But if I'm wrong about that, and there are just a few spec updates and
> bugfixes that need to be applied, by all means, ignore my comment.
>
> -glyph
>