Showing content from https://www.mail-archive.com/html5lib-discuss@googlegroups.com/msg00124.html below:

Tokenizer Regression in Ruby

As you may know, I actually *use* the HTML5 Sanitizer in my branch of
Instiki. Recently, I found I was getting inconsistent results. To track
down the problem, I created the following test:

  {
    "name": "quotes_in_attributes",
    "input": "<img src='foo' title='\"foo\" bar' />",
    "rexml": "<img src='foo' title='\"foo\" bar' />",
    "output": "<img title='&quot;foo&quot; bar' src='foo'/>"
  }
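To see exactly which strings the fixture encodes once the JSON escapes are removed, it can be decoded with Ruby's standard `json` library. This is a small sketch; the heredoc just repeats the fixture above. The decoded input contains literal double quotes and no `&` at all, so every `&quot;` in the expected output is introduced by the sanitizer pipeline itself.

```ruby
require "json"

# The fixture from this message, verbatim. The single-quoted heredoc keeps
# the JSON \" escapes intact so that JSON.parse is what removes them.
fixture = JSON.parse(<<~'JSON')
  {
    "name": "quotes_in_attributes",
    "input": "<img src='foo' title='\"foo\" bar' />",
    "rexml": "<img src='foo' title='\"foo\" bar' />",
    "output": "<img title='&quot;foo&quot; bar' src='foo'/>"
  }
JSON

puts fixture["input"]   # prints <img src='foo' title='"foo" bar' />
puts fixture["output"]  # prints <img title='&quot;foo&quot; bar' src='foo'/>
```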

sanitize_rexml passes the test, but sanitize_html and sanitize_xhtml
fail:

test_quotes_in_attributes(SanitizeTest)
    [tests/test_sanitizer.rb:35:in `check_sanitization'
     tests/test_sanitizer.rb:134:in `test_quotes_in_attributes']:
<"<img title='&quot;foo&quot; bar' src='foo'/>"> expected but was
<"<img title='&amp;quot;foo&quot; bar' src='foo'/>">.
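The failing string is consistent with the first `&quot;` never being decoded: if its `&` reaches the serializer as a literal character, it gets escaped again as `&amp;`. A minimal sketch with a hypothetical `escape_attr` helper, which covers only the two escapes visible in the expected output and is not html5lib's serializer, reproduces both the expected and the actual strings:

```ruby
# Hypothetical attribute escaper (not the html5lib serializer).
# '&' must be escaped first so the '&' in '&quot;' is not re-escaped.
def escape_attr(value)
  value.gsub("&", "&amp;").gsub('"', "&quot;")
end

# Both references decoded: the serializer sees plain double quotes.
puts escape_attr(%q{"foo" bar})       # prints &quot;foo&quot; bar
# First reference left undecoded: its '&' is escaped as ordinary data.
puts escape_attr(%q{&quot;foo" bar})  # prints &amp;quot;foo&quot; bar
```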

It turns out that this is easily fixed by the following change in
tokenizer.rb:

     # This method replaces the need for "entityInAttributeValueState".
     def process_entity_in_attribute
-      entity = consume_entity(true)
+      entity = consume_entity()
       if entity
         @current_token[:data][-1][1] += entity

If I make this change, all tests (not just the sanitizer tests) pass.

Unfortunately, at this point I don't really understand the tokenizer
logic. I don't understand why changing "from_attribute=true" to
"from_attribute=false" (the default) fixes the problem, and I don't
understand what will *break* if we make that change. Apparently nothing
we currently test for, but that is probably only because our unit tests
are insufficient.
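For context, the HTML5 draft treats character references slightly differently inside attribute values. For historical reasons, a named reference that is *not* terminated by `;` and is immediately followed by an alphanumeric character is left as literal text in an attribute, while in ordinary text it would be decoded. Presumably `from_attribute` exists to enforce that rule. The sketch below illustrates the rule only; `decode_refs` and the three-entry `ENTITIES` table are hypothetical and are not the html5lib tokenizer's code.

```ruby
# Hypothetical sketch of the HTML5 "character reference in attribute" rule.
# decode_refs and ENTITIES are illustrative names, not html5lib's API, and
# the entity table is deliberately tiny.
ENTITIES = { "quot" => "\"", "amp" => "&", "not" => "\u00AC" }.freeze

def decode_refs(text, from_attribute = false)
  out = +""
  i = 0
  while i < text.length
    unless text[i] == "&"
      out << text[i]
      i += 1
      next
    end
    # Longest known entity name starting right after the '&'.
    name = ENTITIES.keys.select { |n| text[i + 1, n.length] == n }.max_by(&:length)
    after = name && i + 1 + name.length
    if name.nil?
      out << "&"                     # not a reference: keep the '&'
      i += 1
    elsif text[after] == ";"
      out << ENTITIES[name]          # terminated reference: always decoded
      i = after + 1
    elsif from_attribute && text[after].to_s.match?(/\A[A-Za-z0-9]\z/)
      out << "&"                     # attribute value: leave "&name..." as text
      i += 1
    else
      out << ENTITIES[name]          # unterminated reference, decoded anyway
      i = after
    end
  end
  out
end

puts decode_refs("&quot;foo&quot; bar", true)  # prints "foo" bar
puts decode_refs("&notit", false)              # prints ¬it
puts decode_refs("&notit", true)               # prints &notit
```

Because `&quot;` is terminated by `;`, the sketch decodes it identically whichever flag is passed. That suggests the regression lies in how the Ruby `consume_entity` handles `from_attribute=true`, not in the rule itself. An attribute value containing something like `&notit` is the kind of input a unit test for `consume_entity(true)` would need.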

So ...

Somebody please come up with a unit test that exercises the
"consume_entity(true)" logic and/or a better fix for this
regression.

Or I could just commit this change and let people squeal when
something breaks later...



