Showing content from https://www.mail-archive.com/html5lib-discuss@googlegroups.com/msg00076.html below:

Handling of NUL in the input stream (r825)

> Log:
>  The spec says: "All U+0000 NULL characters in the input must be replaced
> by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such
> characters is a parse error."
>
>
> Modified: trunk/testdata/tokenizer/test2.test
>  ===============================================
>  --- trunk/testdata/tokenizer/test2.test (original)
>  +++ trunk/testdata/tokenizer/test2.test Tue Jun 26 23:58:40 2007
>  @@ -118,7 +118,7 @@
>   {"description":"Null Byte Replacement",
>   "input":"\u0000",
>  -"output":[["Character", "\ufffd"]]}
>  +"output":["ParseError", ["Character", "\ufffd"]]}
Fixing this in html5lib would require a huge refactoring, because the
conversion is done in HTMLInputStream, which doesn't yield tokens
(and parse errors are currently tokens).

I suggest refactoring how parse errors are reported. First, we
probably shouldn't use tokens to represent parse errors. I see two
options instead:

- something along the lines of Python's 'warnings' module, with a
  ParseError class inheriting from Warning and carrying a reference
  to the source object (input stream, tokenizer or parser) and the
  position within the input stream; or
- something resembling the SAX ErrorHandler: the parser registers its
  error handler on the newly created tokenizer, and the tokenizer in
  turn registers its error handler on the newly created input stream.

The default handler would add parse errors to the parser's errors
list, for backwards compatibility (which most probably cannot be
achieved with a warnings.warn-style reporting model).
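To make the SAX-like option more concrete, here is a rough sketch of how the handler chain could look. The class names, the handler signature (source, position, message) and the "null-character" message are all made up for illustration; none of this is html5lib's actual API.

```python
class InputStream:
    """Hypothetical input stream that reports NUL characters itself."""

    def __init__(self, data, error_handler):
        self.error_handler = error_handler
        self.data = self._replace_nul(data)

    def _replace_nul(self, data):
        # Report every U+0000 as a parse error, then replace it with U+FFFD.
        for position, char in enumerate(data):
            if char == "\u0000":
                self.error_handler(self, position, "null-character")
        return data.replace("\u0000", "\ufffd")


class Tokenizer:
    """Hypothetical tokenizer that forwards errors to its own handler."""

    def __init__(self, data, error_handler):
        self.error_handler = error_handler
        # The tokenizer registers its handler on the new input stream.
        self.stream = InputStream(data, self.report_error)

    def report_error(self, source, position, message):
        self.error_handler(source, position, message)

    def __iter__(self):
        # Only character tokens are produced here; no error tokens.
        for char in self.stream.data:
            yield ["Character", char]


class Parser:
    """Hypothetical parser whose default handler fills self.errors."""

    def __init__(self, error_handler=None):
        self.errors = []
        self.error_handler = error_handler or self.collect_error

    def collect_error(self, source, position, message):
        # Default behaviour: keep the errors list for backwards compatibility.
        self.errors.append((position, message))

    def parse(self, data):
        # The parser registers its handler on the new tokenizer.
        tokenizer = Tokenizer(data, self.error_handler)
        return list(tokenizer)
```

With this layout, Parser().parse("a\u0000b") would return three character tokens and leave a single entry in the parser's errors list, without any error ever travelling through the token stream.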

But this means that the tokenizer tests might need to be refactored
as well, because we might not be able to "arrange" the parse error
"pseudo tokens" in the right order for the tests to pass. Or maybe we
should just extract them from the expected output and then check
whether the number of reported parse errors is the same as the number
expected, without checking where they happened.
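The count-only comparison could be sketched roughly as follows. The helper names are hypothetical, and the only thing assumed about the test format is what the diff above shows: "ParseError" appears as a bare string in the expected output list.

```python
def split_expected(expected):
    """Separate real tokens from "ParseError" markers in a test's output."""
    tokens = [item for item in expected if item != "ParseError"]
    error_count = len(expected) - len(tokens)
    return tokens, error_count


def matches(expected, actual_tokens, actual_errors):
    """Compare tokens in order, but parse errors only by count."""
    tokens, error_count = split_expected(expected)
    return tokens == actual_tokens and error_count == len(actual_errors)
```

For the "Null Byte Replacement" test, split_expected would yield one character token and an error count of 1, so the test passes wherever in the stream the error was reported.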

Any thoughts?

-- 
Thomas Broyer


