Showing content from https://www.mail-archive.com/html5lib-discuss@googlegroups.com/msg00386.html below:

cannot handle malformed attribute names with html5lib and lxml

Status: New
Owner: ----

New issue 113 by alexey.kvirc: cannot handle malformed attribute names
with html5lib and lxml
http://code.google.com/p/html5lib/issues/detail?id=113
What steps will reproduce the problem?

Run this code:
  import html5lib
  from html5lib import treebuilders

  html_code = "<a 123=456></a>"
  html_parser = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("lxml"))
  print(html_parser.parse(html_code))

What is the expected output? What do you see instead?
This leads to an exception:
Traceback (most recent call last):
   File "<... skipped ...>\src\lxlm_test.py", line 12, in <module>
     print html_parser.parse(html_code)
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 196, in parse
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 94, in _parse
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 164, in mainLoop
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 582, in processStartTag
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 617, in processStartTag
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 428, in processStartTag
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 657, in startTagOther
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 428, in processStartTag
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 747, in startTagOther
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 428, in processStartTag
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 819, in startTagOther
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 428, in processStartTag
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 1056, in startTagA
   File "build\bdist.win32\egg\html5lib\html5parser.py", line 907, in addFormattingElement
   File "build\bdist.win32\egg\html5lib\treebuilders\_base.py", line 261, in insertElementNormal
   File "build\bdist.win32\egg\html5lib\treebuilders\etree_lxml.py", line 223, in _setAttributes
   File "build\bdist.win32\egg\html5lib\treebuilders\etree_lxml.py", line 193, in __init__
   File "lxml.etree.pyx", line 1945, in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:42529)
   File "apihelpers.pxi", line 481, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:13687)
   File "apihelpers.pxi", line 1422, in lxml.etree._attributeValidOrRaise (src/lxml/lxml.etree.c:21640)
ValueError: Invalid attribute name u'123'

Please provide any additional information below.
This causes parsing to fail on any page that contains such malformed attribute names.
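
For illustration only, here is a rough sketch of why a name like 123 is refused and how a caller might separate such attributes before they reach an XML tree. It assumes the rejection comes from XML naming rules (a name may not start with a digit) and checks only an ASCII subset of the XML NCName production. The helpers is_plausible_xml_name and split_attributes are hypothetical names made up for this example; they are not part of html5lib or lxml, and lxml's real validation may differ in details.

```python
import re

# ASCII-only approximation of an XML NCName (assumption, not lxml's own check):
# the first character must be a letter or underscore; digits, dots, and
# hyphens may only appear after it. Real XML names also allow many
# non-ASCII characters, which this sketch ignores.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")


def is_plausible_xml_name(name):
    """Return True if `name` matches the ASCII subset of an XML NCName."""
    return _NCNAME_ASCII.match(name) is not None


def split_attributes(attrs):
    """Partition an attribute dict into (accepted, rejected) by name validity."""
    accepted, rejected = {}, {}
    for name, value in attrs.items():
        target = accepted if is_plausible_xml_name(name) else rejected
        target[name] = value
    return accepted, rejected


# Attributes as they might come from the tag in the report, plus a valid one.
accepted, rejected = split_attributes({"href": "/x", "123": "456"})
print(accepted)
print(rejected)
```

With the attribute from the report, 123 ends up in the rejected dict while href is accepted. Whether rejected attributes should be dropped, renamed, or escaped is a design decision this sketch leaves open.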


