Status: New
Owner: ----

New issue 113 by alexey.kvirc: cannot handle malformed attribute names with html5lib and lxml
http://code.google.com/p/html5lib/issues/detail?id=113
What steps will reproduce the problem?

Run this code:

    # Python 2 syntax (print statement), as in the original report
    import html5lib
    from html5lib import treebuilders

    html_code = "<a 123=456></a>"
    html_parser = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("lxml"))
    print html_parser.parse(html_code)

What is the expected output? What do you see instead?

This raises an exception:

    Traceback (most recent call last):
      File "<... skipped ...>\src\lxlm_test.py", line 12, in <module>
        print html_parser.parse(html_code)
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 196, in parse
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 94, in _parse
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 164, in mainLoop
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 582, in processStartTag
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 617, in processStartTag
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 428, in processStartTag
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 657, in startTagOther
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 428, in processStartTag
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 747, in startTagOther
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 428, in processStartTag
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 819, in startTagOther
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 428, in processStartTag
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 1056, in startTagA
      File "build\bdist.win32\egg\html5lib\html5parser.py", line 907, in addFormattingElement
      File "build\bdist.win32\egg\html5lib\treebuilders\_base.py", line 261, in insertElementNormal
      File "build\bdist.win32\egg\html5lib\treebuilders\etree_lxml.py", line 223, in _setAttributes
      File "build\bdist.win32\egg\html5lib\treebuilders\etree_lxml.py", line 193, in __init__
      File "lxml.etree.pyx", line 1945, in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:42529)
      File "apihelpers.pxi", line 481, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:13687)
      File "apihelpers.pxi", line 1422, in lxml.etree._attributeValidOrRaise (src/lxml/lxml.etree.c:21640)
    ValueError: Invalid attribute name u'123'

Please provide any additional information below.

As a result, parsing fails on any page that has broken attributes.
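The traceback shows lxml rejecting the attribute name `123` because it is not a valid XML name. One possible workaround, sketched here in Python 3, is to rewrite such names before they reach the tree builder. This is only an illustration and not html5lib's actual fix: `coerce_attribute_name`, `sanitize_attributes`, and the `UXXXXX` escape scheme are hypothetical, and the validity check is a simplified ASCII-only approximation of the XML Name rules.

```python
import re

# Simplified, ASCII-only approximation of an XML name (assumption: the
# real XML Name production also allows many non-ASCII characters).
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")
_INVALID_CHAR = re.compile(r"[^A-Za-z0-9_.\-]")
_VALID_START = re.compile(r"[A-Za-z_]")


def _escape(char):
    """Encode one character as 'U' plus five hex digits (hypothetical scheme)."""
    return "U%05X" % ord(char)


def coerce_attribute_name(name):
    """Return a name that a strict XML tree should accept (hypothetical helper)."""
    if not name:
        raise ValueError("attribute name must not be empty")
    if _VALID_NAME.match(name):
        return name
    # Escape every character that may not appear anywhere in a name.
    escaped = _INVALID_CHAR.sub(lambda m: _escape(m.group()), name)
    # Digits, '.' and '-' are fine inside a name but not as its first character.
    if not _VALID_START.match(escaped):
        escaped = _escape(escaped[0]) + escaped[1:]
    return escaped


def sanitize_attributes(attributes):
    """Apply coerce_attribute_name to every key of an attribute dict."""
    return {coerce_attribute_name(k): v for k, v in attributes.items()}


print(sanitize_attributes({"123": "456"}))  # {'U0003123': '456'}
```

The escaping is lossy in the sense that the original name is no longer visible in the tree, so a serializer would need to reverse it to round-trip the markup.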