Showing content from https://github.com/html5lib/html5lib-python/issues/6 below:

html5lib.treebuilders.dom.dom2sax crashes on 'xml:lang' attribute · Issue #6 · html5lib/html5lib-python · GitHub

http://code.google.com/p/html5lib/issues/detail?id=200

Reported by vovanec, Mar 6, 2012

A simple test case (my program has a more complex handler implementation, but the problem is reproducible with the default handler):

import xml.sax.handler
import html5lib

def test(html):
    handler = xml.sax.handler.ContentHandler()
    parser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder('dom'))
    dom = parser.parse(html)
    html5lib.treebuilders.dom.dom2sax(dom, handler)

html = '<html xml:lang="en">'
test(html)

With html5lib 0.95 it produces the following traceback:

python test.py 
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    test(html)
  File "test.py", line 10, in test
    html5lib.treebuilders.dom.dom2sax(dom, handler)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 271, in dom2sax
    for child in node.childNodes: dom2sax(child, handler, nsmap)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 256, in dom2sax
    del attributes[(attr.namespaceURI, attr.nodeName)]
KeyError: (None, u'xml:lang')

With previous versions (at least 0.11) there is no error. I assume this attribute may be invalid in the XML namespace, but even so I don't think it is OK for the parser to simply crash. I have seen a lot of real-world HTML documents that have this attribute.
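For anyone hitting the same crash, one possible stopgap is to walk the DOM yourself instead of calling dom2sax. The following is only a sketch under assumptions: `tolerant_dom2sax` and `RecordingHandler` are hypothetical names, not part of html5lib. It uses only the standard library, emits plain (non-namespaced) `startElement`/`endElement` events, and builds the DOM by hand with `xml.dom.minidom` rather than through html5lib, assuming the tree returned by the 'dom' tree builder exposes the same node interface.

```python
import xml.sax.handler
from xml.dom import Node, minidom
from xml.sax.xmlreader import AttributesImpl


class RecordingHandler(xml.sax.handler.ContentHandler):
    """Hypothetical handler that records each start tag and its attributes."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.elements = []

    def startElement(self, name, attrs):
        self.elements.append((name, dict(attrs.items())))


def tolerant_dom2sax(node, handler):
    """Hypothetical helper: emit non-namespaced SAX events for a DOM subtree.

    Attributes are copied by their qualified name (nodeName), so an attribute
    such as 'xml:lang' with no namespace URI is passed through unchanged.
    """
    if node.nodeType == Node.DOCUMENT_NODE:
        handler.startDocument()
        for child in node.childNodes:
            tolerant_dom2sax(child, handler)
        handler.endDocument()
    elif node.nodeType == Node.ELEMENT_NODE:
        # NamedNodeMap.items() yields (qualified name, value) pairs.
        attrs = dict(node.attributes.items())
        handler.startElement(node.nodeName, AttributesImpl(attrs))
        for child in node.childNodes:
            tolerant_dom2sax(child, handler)
        handler.endElement(node.nodeName)
    elif node.nodeType == Node.TEXT_NODE:
        handler.characters(node.data)
    # Other node types (comments, doctype, ...) are skipped in this sketch.


# Build a DOM by hand with an 'xml:lang' attribute that has no namespace URI,
# mirroring the (None, 'xml:lang') key from the traceback.
doc = minidom.Document()
root = doc.createElement('html')
root.setAttribute('xml:lang', 'en')
doc.appendChild(root)

handler = RecordingHandler()
tolerant_dom2sax(doc, handler)
```

This drops namespace information entirely, so it is only suitable for handlers that implement `startElement` rather than `startElementNS`.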

Tested with Python 2.6.5 on Linux.

Please advise.

Thanks,
--Vladimir

