Status: New
Owner: ----

New issue 116 by gareth.r...@pobox.com: treewalker escapes from subtree if root of subtree has a next sibling
http://code.google.com/p/html5lib/issues/detail?id=116
I'm using html5lib 0.11.1 with Python 2.5 on Mac OS X 10.5. Consider the following interaction with html5lib:

>>> from html5lib import html5parser, serializer, treebuilders, treewalkers
>>> s = serializer.htmlserializer.HTMLSerializer()
>>> walker = treewalkers.getTreeWalker('dom')
>>> def contents(node):
...     """Return the serialized content of 'node'."""
...     return u''.join(s.serialize(walker(node)))
...
>>> doc = html5parser.HTMLParser(tree = treebuilders.getTreeBuilder('dom')).parse(u'<table><tr><td>A</table>B')
>>> contents(doc.getElementsByTagName('table')[0]) # [1]
u'<table><tr><td>A</table>B'
>>> contents(doc.getElementsByTagName('tr')[0]) # [2]
u'<tr><td>A'

The output from [2] is what I expect to see: the serialized content of the <tr> node and its children. However, the output from [1] seems wrong to me. I expected to get the serialized content of the <table> node (only), but instead I get the serialized content of the <table> node plus the remainder of the document.

I believe the underlying cause of the problem is the __iter__ method of NonRecursiveTreeWalker in html5lib/treewalkers/_base.py. It aims to walk the nodes of the subtree of self.tree in prefix order, and is supposed to stop when it returns to the root of the subtree (see the comparison "if self.tree is currentNode" on line 153). However, the code for stepping to the next sibling is executed before this stopping test, causing the traversal to escape from the subtree (but only if the root of the subtree actually has a next sibling).

Suggested fix: exchange the step to the next sibling and the stopping test, so that the walker checks whether it is back at the root before it looks for a next sibling.
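To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It is not the html5lib source: the Node class, the walk function and its stop_test_first flag are hypothetical names made up for this illustration, and the tree is a hand-built stand-in for the parsed document rather than a real DOM. It only aims to show how the order of the sibling step and the stopping test changes which nodes are visited.

```python
# Simplified illustration only; Node and walk are hypothetical, not html5lib code.

class Node(object):
    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def next_sibling(self):
        """Return the following sibling, or None if there is none."""
        if self.parent is None:
            return None
        siblings = self.parent.children
        index = siblings.index(self)
        return siblings[index + 1] if index + 1 < len(siblings) else None


def walk(root, stop_test_first):
    """Return the names of the nodes visited in prefix order from root.

    stop_test_first=False mimics the reported ordering (sibling step
    before the stopping test); True applies the suggested exchange.
    """
    visited = []
    current = root
    while current is not None:
        visited.append(current.name)
        if current.children:
            current = current.children[0]
            continue
        # Leaf reached: climb until some ancestor has a next sibling.
        while current is not None:
            if stop_test_first and current is root:
                current = None  # back at the subtree root: stop here
                break
            sibling = current.next_sibling()
            if sibling is not None:
                current = sibling  # may step outside the subtree
                break
            if current is root:
                current = None  # stopping test reached too late
            else:
                current = current.parent


    return visited


# Stand-in for the parsed '<table><tr><td>A</table>B' document.
doc = Node('body', [
    Node('table', [Node('tr', [Node('td', [Node('A')])])]),
    Node('B'),
])
table = doc.children[0]
tr = table.children[0]

escaped = walk(table, stop_test_first=False)   # ['table', 'tr', 'td', 'A', 'B']
contained = walk(table, stop_test_first=True)  # ['table', 'tr', 'td', 'A']
```

With the sibling step first, walking from the table node also visits 'B', which corresponds to case [1]. Walking from the tr node gives the same result with either ordering, because tr has no next sibling, which corresponds to case [2].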