Some of the input I'd like to use html5lib with is not only malformed; its structure is also very messy. (In some cases, most of the page is on a single line of HTML.) Is there a way to tell the serializer to break up the HTML by inserting whitespace according to a standard set of rules, e.g., progressively indenting when certain tags are encountered, inserting line breaks (\n characters), and so on?
If not, can anyone suggest another library that could? (I'd be passing it the cleaned output from html5lib, so this library wouldn't have to be particularly tolerant of malformed HTML; html5lib would handle that part.)

Thanks!
peppergrower
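
To make the request concrete, here is a rough sketch of the kind of re-indenting pass described above, written against Python's standard-library `html.parser` rather than html5lib itself. It assumes the input is already well-formed (for example, the cleaned output of html5lib). The names `Indenter`, `prettify`, and `VOID` are made up for illustration and are not part of any library. It puts every tag and text run on its own line and indents by nesting depth, which can change how inline content renders, and it does nothing special for whitespace-sensitive elements such as `pre` or `textarea`.

```python
from html import escape
from html.parser import HTMLParser

# Elements with no closing tag; they must not increase the indent level.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Indenter(HTMLParser):
    """Hypothetical helper: re-emits well-formed HTML one node per line."""

    def __init__(self, indent="  "):
        super().__init__()  # convert_charrefs=True: text arrives unescaped
        self.indent = indent
        self.depth = 0
        self.raw = False  # True while inside <script> or <style>
        self.lines = []

    def _emit(self, text):
        self.lines.append(self.indent * self.depth + text)

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written in the input.
        self._emit(self.get_starttag_text())
        if tag not in VOID:
            self.depth += 1
            self.raw = tag in ("script", "style")

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit once, no depth change.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        self.raw = False
        self.depth = max(self.depth - 1, 0)
        self._emit("</%s>" % tag)

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Script and style content is passed through without escaping.
            self._emit(text if self.raw else escape(text, quote=False))

    def handle_comment(self, data):
        self._emit("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._emit("<!%s>" % decl)


def prettify(html_text, indent="  "):
    parser = Indenter(indent)
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "\n".join(parser.lines)


if __name__ == "__main__":
    print(prettify("<html><body><p>Hi &amp; bye<br>there</p></body></html>"))
```

A real pretty-printer would probably want a list of inline elements to leave unbroken, since adding line breaks around tags like `a` or `em` introduces whitespace that browsers render.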