James Graham wrote:
> Simon Willison wrote:
>> Last time I looked at the html5lib sanitizer I'm pretty sure it had an
>> "experimental" label on it. Is this still the case? The unit tests
>> look pretty solid:
>>
>> http://code.google.com/p/html5lib/source/browse/trunk/testdata/sanitizer/tests1.dat
>>
>> But they also haven't been modified since October last year. Is the
>> sanitizing code aggressively maintained?
>
> Sam Ruby originally wrote most of the sanitizer; I don't know if he still has
> time to maintain it. I think that the ruby version is more heavily used than
> the python one but the unit tests should be almost identical.

The Ruby one actually has slightly more unit tests. The sanitizer is
based on the one originally in the feedparser, and now in Venus. Jacques
Distler has extended it occasionally for Instiki (which is written in
Ruby). As both implementations are based on whitelisting, extensions
typically mean "accepts more valid input" as opposed to "rejects more
evil".

> Apart from next month (when I am on holiday), I should have considerably more
> time to work on html5lib than I have had for the past few months and I plan
> to actively maintain the sanitizer if no one else wants to own it instead.
>
>> Basically, I want to know if I would be giving people good advice if I
>> told them to use the html5lib sanitizer. Would the maintainers
>> confidently recommend it to a friend?
>
> I would certainly be happier using the html part of the sanitizer more than
> the typical regexp based solutions. I am less sure if the CSS sanitizer is
> perfect; it may well be OK but it is built on a less strong foundation (since
> it does not actually parse CSS). The documentation could probably be improved
> although for people who ask on irc should get reasonably expedient help. So I
> don't know of a strong reason not to recommend it to a friend.
>
> Having said that there seem to be a small number of open issues, I'll look at
> them some time in the not-too-distant future and check if they are real or
> not.
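
To illustrate the whitelisting point above, here is a minimal sketch in Python. It is not the html5lib sanitizer and does not use its API: it is built on the standard library's html.parser, and the names ALLOWED_TAGS, ALLOWED_ATTRS, WhitelistSanitizer and sanitize, along with the tiny tag and attribute lists, are hypothetical choices made for the example. It also simply drops disallowed tags and attributes and deliberately allows no URL-bearing attributes, which sidesteps scheme checking entirely; a real sanitizer has to deal with that and much more.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists for illustration only (not html5lib's lists).
ALLOWED_TAGS = frozenset({"p", "b", "i", "em"})
ALLOWED_ATTRS = frozenset({"title"})


class WhitelistSanitizer(HTMLParser):
    """Keep only whitelisted tags and attributes; escape all text."""

    def __init__(self, allowed_tags=ALLOWED_TAGS, allowed_attrs=ALLOWED_ATTRS):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = allowed_tags
        self.allowed_attrs = allowed_attrs
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed_tags:
            return  # anything not on the whitelist is dropped
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in self.allowed_attrs
        )
        self.out.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in self.allowed_tags:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is always re-escaped, so it cannot reintroduce markup.
        self.out.append(escape(data, quote=False))


def sanitize(markup, **whitelists):
    parser = WhitelistSanitizer(**whitelists)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="x()">hi <b>there</b></p><script>alert(1)</script>'
print(sanitize(dirty))

# Extending the whitelist accepts more valid input; nothing new is rejected.
print(sanitize("<strong>x</strong>"))
print(sanitize("<strong>x</strong>", allowed_tags=ALLOWED_TAGS | {"strong"}))
```

The last two calls show the asymmetry: with the default lists the strong element is stripped, and the only way the behaviour changes is by adding "strong" to the whitelist, which admits more markup rather than blocking any.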