> Log:
> The spec says: "All U+0000 NULL characters in the input must be replaced
> by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such
> characters is a parse error."
>
>
> Modified: trunk/testdata/tokenizer/test2.test
> ===============================================
> --- trunk/testdata/tokenizer/test2.test (original)
> +++ trunk/testdata/tokenizer/test2.test Tue Jun 26 23:58:40 2007
> @@ -118,7 +118,7 @@
> {"description":"Null Byte Replacement",
> "input":"\u0000",
> -"output":[["Character", "\ufffd"]]}
> +"output":["ParseError", ["Character", "\ufffd"]]}

Fixing this in html5lib would require a huge refactoring, because this conversion is done in HTMLInputStream, which doesn't yield tokens (and parse errors are currently tokens).

I suggest refactoring how parse errors are reported. First, we probably shouldn't use tokens to represent parse errors. I suggest using either:

 - something along the lines of Python's 'warnings' module, with a ParseError class inheriting from Warning and carrying a reference to the source object (input stream, tokenizer or parser) and the position within the input stream; or
 - something resembling the SAX ErrorHandler: the parser registers its error handler on the newly created tokenizer, and the tokenizer in turn registers its error handler on the newly created input stream. The default handler would add parse errors to the parser's errors list, for backwards compatibility (which most probably cannot be achieved with a warnings.warn-style reporting model).

But this means that the tokenizer tests might need to be refactored as well, because we might not be able to arrange the parse error "pseudo tokens" in the right order for the tests to pass. Alternatively, we could just extract them from the expected output and check that the number of reported parse errors equals the number expected, without checking where they happened.

Any thoughts?

-- 
Thomas Broyer
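
To make the ErrorHandler-style option concrete, here is a minimal sketch in Python. It is only an illustration of the chaining idea, not html5lib's actual code: every name in it (ListErrorHandler, InputStream, Tokenizer, Parser, parse_error, split_expected, the "null-character" message) is hypothetical, and the tokenizer is reduced to emitting one Character token per input character.

```python
class ListErrorHandler:
    """Hypothetical default handler: appends errors to a list owned by the parser."""

    def __init__(self, errors):
        self.errors = errors

    def parse_error(self, source, position, message):
        # Record which component reported the error and where it happened.
        self.errors.append((type(source).__name__, position, message))


class InputStream:
    """Toy input stream: replaces U+0000 with U+FFFD and reports a parse error."""

    def __init__(self, data, error_handler):
        self.error_handler = error_handler
        self.chars = []
        for position, char in enumerate(data):
            if char == "\u0000":
                self.error_handler.parse_error(self, position, "null-character")
                char = "\ufffd"
            self.chars.append(char)


class Tokenizer:
    """Toy tokenizer: passes its handler down to the stream it creates."""

    def __init__(self, data, error_handler):
        self.error_handler = error_handler
        self.stream = InputStream(data, error_handler)

    def __iter__(self):
        for char in self.stream.chars:
            yield ["Character", char]


class Parser:
    """Toy parser: keeps the existing 'errors' list for backwards compatibility."""

    def __init__(self):
        self.errors = []

    def parse(self, data):
        tokenizer = Tokenizer(data, ListErrorHandler(self.errors))
        return list(tokenizer)


def split_expected(expected):
    """Separate "ParseError" markers from real tokens in a test's expected output."""
    tokens = [item for item in expected if item != "ParseError"]
    return tokens, len(expected) - len(tokens)


parser = Parser()
tokens = parser.parse("\u0000")
expected_tokens, expected_error_count = split_expected(
    ["ParseError", ["Character", "\ufffd"]]
)
```

With this shape, a tokenizer test would compare `tokens` against `expected_tokens` and `len(parser.errors)` against `expected_error_count`, so the position of the "ParseError" marker in the expected output no longer matters.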