Showing content from https://www.mail-archive.com/html5lib-discuss@googlegroups.com/msg00390.html below:

treewalker escapes from subtree if root of subtree has a next sibling

Status: New
Owner: ----

New issue 116 by gareth.r...@pobox.com: treewalker escapes from subtree if  
root of subtree has a next sibling
http://code.google.com/p/html5lib/issues/detail?id=116
I'm using html5lib 0.11.1 with Python 2.5 on Mac OS X 10.5.

Consider the following interaction with html5lib:

>>> from html5lib import html5parser, serializer, treebuilders, treewalkers
>>> s = serializer.htmlserializer.HTMLSerializer()
>>> walker = treewalkers.getTreeWalker('dom')
>>> def contents(node):
...     """Return the serialized content of 'node'."""
...     return u''.join(s.serialize(walker(node)))
...
>>> doc = html5parser.HTMLParser(tree =
...     treebuilders.getTreeBuilder('dom')).parse(u'<table><tr><td>A</table>B')
>>> contents(doc.getElementsByTagName('table')[0]) # [1]
u'<table><tr><td>A</table>B'
>>> contents(doc.getElementsByTagName('tr')[0]) # [2]
u'<tr><td>A'

The output from [2] is what I expect to see: the serialized content of the
<tr> node and its children.

However, the output from [1] seems wrong to me. I expected to get the
serialized content of the <table> node (only), but instead I get the
serialized content of the <table> node plus the remainder of the document.

I believe the underlying cause of the problem is the __iter__ method of
NonRecursiveTreeWalker in html5lib/treewalkers/_base.py. It aims to walk the
nodes of the subtree of self.tree in prefix order, and is supposed to stop
when it returns to the root of the subtree (see the comparison
"if self.tree is currentNode" on line 153). However, the code for stepping to
the next sibling is executed before this stopping test, causing the traversal
to escape from the subtree (but only if the root of the subtree actually has
a next sibling).

Suggested fix: exchange the step to the next sibling and the stopping test.
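
To illustrate the ordering problem outside html5lib, here is a minimal sketch
of a non-recursive prefix-order walk over a DOM subtree. It is not the code
from _base.py: the function name walk_subtree and the flag
stop_test_first are hypothetical, it is written for Python 3, and it uses
xml.dom.minidom with a well-formed XML stand-in for the example document.
With the stopping test first, the walk stays inside the subtree; with the
sibling step first, it escapes exactly when the root has a next sibling.

```python
from xml.dom import minidom


def walk_subtree(root, stop_test_first=True):
    """Yield the nodes under `root` (inclusive) in prefix order.

    Hypothetical illustration only. `stop_test_first=False` reproduces the
    ordering described in the report (sibling step before stopping test).
    """
    node = root
    while node is not None:
        yield node
        if node.firstChild is not None:
            # Descend to the first child.
            node = node.firstChild
            continue
        # No children: climb until a next sibling is found or the root is hit.
        while node is not None:
            if stop_test_first and node is root:
                return  # back at the subtree root: stop before any sibling step
            if node.nextSibling is not None:
                node = node.nextSibling
                break
            if node is root:
                return  # stopping test placed after the sibling step
            node = node.parentNode


# Well-formed stand-in for u'<table><tr><td>A</table>B' (an assumption, since
# minidom parses XML rather than HTML).
doc = minidom.parseString("<body><table><tr><td>A</td></tr></table>B</body>")
table = doc.getElementsByTagName("table")[0]

fixed = [n.nodeName for n in walk_subtree(table)]
buggy = [n.nodeName for n in walk_subtree(table, stop_test_first=False)]
print(fixed)
print(buggy)
```

In this sketch the second list contains one extra '#text' entry, the
trailing "B" that lies outside the <table> subtree.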


