Showing content from https://www.mail-archive.com/html5lib-discuss@googlegroups.com/msg00637.html below:

Problem with attribute having unclosed quotes (python html5lib)

*For example* (missing closing quotes):
1. <a href="http://www.somewebsite.com> some link </a>
2. <img src=">

It only happens when the input contains nothing valid. If I replace example 2
with either of these:
1. <div> <img src="> </div>
2. <div> </div> <img src=">

the error does not occur. (The wrapper can be a div or any other element, as
long as the malformed tag is not the only element.)
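The pattern described above (failing only when the malformed tag is the entire input) is consistent with how the HTML5 tokenization algorithm handles end-of-file inside a tag. Reaching EOF inside an attribute value is an "eof-in-tag" parse error, and the unfinished tag is discarded rather than emitted. The toy tokenizer below is a hypothetical illustration of that one rule. It is not html5lib code, `toy_tokenize` is an invented name, and it only handles text, start/end tags and double-quoted attribute values. Under it, examples 1 and 2 tokenize to nothing, while the `<div>` variants still leave tokens behind.

```python
def toy_tokenize(html):
    """Hypothetical toy tokenizer (NOT html5lib).

    Handles only text, start/end tags and double-quoted attribute values.
    It mirrors one HTML5 rule: if the input ends while still inside a tag
    (e.g. inside an unclosed quote), that unfinished tag is dropped.
    """
    tokens = []
    i, n = 0, len(html)
    while i < n:
        if html[i] != "<":
            # Plain text runs up to the next '<' (or the end of input).
            j = html.find("<", i)
            j = n if j == -1 else j
            tokens.append(("Characters", html[i:j]))
            i = j
            continue
        # Scan for the '>' that closes the tag, ignoring any '>' that
        # appears inside a double-quoted attribute value.
        j, in_quote = i + 1, False
        while j < n and (in_quote or html[j] != ">"):
            if html[j] == '"':
                in_quote = not in_quote
            j += 1
        if j >= n:
            # EOF inside the tag: discard the unfinished tag and stop.
            break
        body = html[i + 1:j].strip()
        if body.startswith("/"):
            tokens.append(("EndTag", body[1:].strip()))
        else:
            tokens.append(("StartTag", body.split()[0] if body else ""))
        i = j + 1
    return tokens


for sample in ['<img src=">', '<div> <img src="> </div>']:
    print(sample, "->", toy_tokenize(sample))
```

A real HTML5 tokenizer also deals with single-quoted and unquoted values, comments, character references and much more; the sketch isolates only the EOF rule.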


*Code fragment*:

import html5lib
from html5lib import treebuilders, treewalkers, serializer, sanitizer

bad_html = '<img src=">'  # e.g. example 2 above

parser = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder('dom'),
                             tokenizer=sanitizer.HTMLSanitizer)
sometree = parser.parseFragment(bad_html)
walker = treewalkers.getTreeWalker('dom')
stream = walker(sometree)

s = serializer.htmlserializer.HTMLSerializer(quote_attr_values=True)
nice_html = s.render(stream)  # <---- it fails here



*The question*:

I would like to know whether this is the expected behavior or whether I am
doing something wrong.



*Additional info*:

I'm using the lib for sanitizing user input.
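Since the failure only appears when the malformed tag is the whole input, one defensive pattern for sanitizing user input is to skip serialization when the parsed fragment has no child nodes. The sketch below assumes that `parseFragment()` with the 'dom' tree builder returns an `xml.dom.minidom` node (an assumption, not confirmed in this thread); `render_or_empty` is a hypothetical helper name. It is demonstrated with plain minidom objects so it runs without html5lib.

```python
from xml.dom import minidom


def render_or_empty(fragment, render):
    """Hypothetical helper: return an empty string for a fragment with no
    child nodes, otherwise delegate to `render` (e.g. walker + serializer).

    Assumes `fragment` is an xml.dom.minidom node, which is what the 'dom'
    tree builder is assumed to return from parseFragment().
    """
    if not fragment.hasChildNodes():
        return u""
    return render(fragment)


# Stand-alone demonstration with plain minidom objects:
doc = minidom.Document()
empty = doc.createDocumentFragment()
print(repr(render_or_empty(empty, lambda f: "rendered")))
```

In the snippet above, `render` would be something like `lambda t: s.render(walker(t))`, reusing the existing `s` and `walker`.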


*Output*:

  File "somemodule.py", line 20, in somefunction
    nice_html = s.render(stream)
  File "some_env/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/serializer/htmlserializer.py", line 302, in render
    return u"".join(list(self.serialize(treewalker)))
  File "some_env/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/serializer/htmlserializer.py", line 192, in serialize
    for token in treewalker:
  File "some_env/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/filters/optionaltags.py", line 15, in __iter__
    type = token["type"]
TypeError: 'NoneType' object has no attribute '__getitem__'
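The last frame shows the optional-tags filter evaluating `token["type"]` with `token` set to `None`. One plausible mechanism is sketched below. It is a hypothetical reconstruction, not the html5lib 0.95 source, and `slider`, `token_types` and `guard_empty` are invented names. A filter that needs one token of look-behind and look-ahead typically flushes a final window padded with `None`. If it flushes unconditionally, an empty token stream (which a fragment made only of a discarded malformed tag would produce) yields a single window whose middle token is `None`.

```python
def slider(source, guard_empty=True):
    """Yield (previous, token, next) windows over a token stream.

    Hypothetical sketch of a look-behind/look-ahead window; not html5lib code.
    """
    previous1 = previous2 = None
    for token in source:
        if previous1 is not None:
            yield previous2, previous1, token
        previous2 = previous1
        previous1 = token
    # Flush the last token. Without the guard, an EMPTY stream still yields
    # (None, None, None), so the consumer ends up evaluating None["type"].
    if previous1 is not None or not guard_empty:
        yield previous2, previous1, None


def token_types(source, guard_empty=True):
    """Consume the windows the way a filter would, reading token["type"]."""
    return [token["type"] for _prev, token, _next in slider(source, guard_empty)]


print(token_types([{"type": "StartTag"}, {"type": "EndTag"}]))
print(token_types([]))
```

With `guard_empty=False` and an empty list, consuming `token_types` raises `TypeError`, mirroring the traceback; with the guard, the empty stream simply produces no tokens.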

-- 
You received this message because you are subscribed to the Google Groups 
"html5lib-discuss" group.
To view this discussion on the web, visit 
https://groups.google.com/d/msg/html5lib-discuss/-/sSiTs1l1xNcJ.
To post to this group, send an email to html5lib-discuss@googlegroups.com.
To unsubscribe from this group, send email to 
html5lib-discuss+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/html5lib-discuss?hl=en-GB.

