As you may know, I actually *use* the HTML5 Sanitizer in my branch of Instiki.
Recently, I found I was getting inconsistent results. To track down the problem, I created the following test:

  {
    "name": "quotes_in_attributes",
    "input": "<img src='foo' title='\"foo\" bar' />",
    "rexml": "<img src='foo' title='\"foo\" bar' />",
    "output": "<img title='"foo" bar' src='foo'/>"
  }

sanitize_rexml passes the test, but sanitize_html and sanitize_xhtml fail:

  test_quotes_in_attributes(SanitizeTest)
      [tests/test_sanitizer.rb:35:in `check_sanitization'
       tests/test_sanitizer.rb:134:in `test_quotes_in_attributes']:
  <"<img title='"foo" bar' src='foo'/>"> expected but was
  <"<img title='&quot;foo" bar' src='foo'/>">.

It turns out that this is easily fixed by the following change in tokenizer.rb:

     # This method replaces the need for "entityInAttributeValueState".
     def process_entity_in_attribute
  -    entity = consume_entity(true)
  +    entity = consume_entity()
       if entity
         @current_token[:data][-1][1] += entity

If I make this change, all tests (not just the sanitizer tests) pass.

Unfortunately, I don't really understand the tokenizer logic at this point. I don't understand why changing "from_attribute=true" to "from_attribute=false" (the default) fixes the problem. And I don't understand what will *break* if we make that change. Nothing we currently test for, apparently, but that's probably due to an insufficiency of unit tests.

So ...

Somebody please come up with a unit test that will exercise the "consume_entity(true)" logic and/or come up with a better fix for this regression.

Or I could just commit this change and let people squeal when something breaks later...
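A possible starting point for such a test: in the HTML Standard as it reads today, a named character reference inside an attribute value is left as literal text when it lacks the trailing ";" and is followed by "=" or an ASCII alphanumeric. That may be the case the from_attribute flag is meant to cover, though I have not confirmed it against tokenizer.rb. The sketch below is a toy model of that one rule, not the html5lib code; ENTITIES and decode_entities are hypothetical names invented for illustration.

```ruby
# Toy model of attribute-aware entity decoding. NOT the html5lib tokenizer.
# ENTITIES is a tiny hypothetical table; keys without ";" are legacy forms.
ENTITIES = {
  "quot;" => '"',      "quot" => '"',
  "amp;"  => "&",      "amp"  => "&",
  "copy;" => "\u00A9", "copy" => "\u00A9"
}.freeze

# Decode named entities in text. When from_attribute is true, an entity
# without a trailing ";" that is followed by "=" or an ASCII alphanumeric
# is left untouched (assumed here to be what the flag is for).
def decode_entities(text, from_attribute = false)
  out = +""
  i = 0
  while i < text.length
    if text[i] != "&"
      out << text[i]
      i += 1
      next
    end

    # Longest entity name that matches right after the "&".
    name = ENTITIES.keys.select { |k| text[i + 1, k.length] == k }.max_by(&:length)
    if name.nil?
      out << "&"
      i += 1
      next
    end

    following = text[i + 1 + name.length]
    keep_literal = from_attribute &&
                   !name.end_with?(";") &&
                   !following.nil? &&
                   following.match?(/\A[=A-Za-z0-9]\z/)

    if keep_literal
      out << "&" # emit the "&" only; the name is copied as plain text next
      i += 1
    else
      out << ENTITIES[name]
      i += 1 + name.length
    end
  end
  out
end

puts decode_entities("?a=b&copy=1", true)
puts decode_entities("?a=b&copy=1", false)
puts decode_entities("&quot;foo&quot; bar", true)
```

If the flag does work this way, an input along the lines of <a href='?a=b&copy=1'> might distinguish consume_entity(true) from consume_entity(), since only the attribute-aware path would leave "&copy" alone. Note that properly terminated entities such as &quot; decode the same way in both modes in this model.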