33.30 Parsing HTML and XML

Emacs can be compiled with built-in libxml2 support.
Function: libxml-available-p
This function returns non-nil if built-in libxml2 support is available in this Emacs session.
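Code that depends on libxml2 usually checks for it before calling the parsing functions. The following is a minimal sketch, assuming the availability predicate is libxml-available-p (present since Emacs 27.1); the helper name my-require-libxml is hypothetical and used only for illustration.

```elisp
;; Hypothetical helper: fail early with a clear message when libxml2
;; support is missing from this Emacs build.
(defun my-require-libxml ()
  "Signal a `user-error' unless this Emacs has libxml2 support."
  ;; On Emacs versions older than 27.1 `libxml-available-p' does not
  ;; exist, so fall back to checking that the parser is defined.
  (unless (if (fboundp 'libxml-available-p)
              (libxml-available-p)
            (fboundp 'libxml-parse-html-region))
    (user-error "This Emacs was built without libxml2 support")))
```

Calling (my-require-libxml) at the start of a command keeps the failure close to its cause instead of surfacing later as a void-function error.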
When libxml2 support is available, the following functions can be used to parse HTML or XML text into Lisp object trees.
Function: libxml-parse-html-region &optional start end base-url discard-comments
This function parses the text between start and end as HTML, and returns a list representing the HTML parse tree. It attempts to handle real-world HTML by robustly coping with syntax mistakes.
If start or end are nil, they default to the values from point-min and point-max, respectively.
The optional argument base-url, if non-nil, should be used for warnings and errors reported by the libxml2 library, but Emacs currently calls the library with errors and warnings disabled, so this argument is not used.
If the optional argument discard-comments is non-nil, any top-level comment is discarded. (This argument is obsolete and will be removed in future Emacs versions. To remove comments, use the xml-remove-comments utility function on the data before you call the parsing function.)
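The recommended replacement for discard-comments can be sketched as follows. This assumes xml-remove-comments takes the region boundaries as its two arguments and is provided by xml.el, as in recent Emacs versions; the function name my-parse-html-without-comments is hypothetical.

```elisp
(require 'xml)  ; provides `xml-remove-comments'

;; Hypothetical helper: strip comments from the text, then parse it.
(defun my-parse-html-without-comments (html)
  "Parse the string HTML and return its tree, with comments removed first.
Return nil when libxml2 support is unavailable."
  (when (and (fboundp 'libxml-available-p) (libxml-available-p))
    (with-temp-buffer
      (insert html)
      ;; Delete every <!-- ... --> span from the buffer text.
      (xml-remove-comments (point-min) (point-max))
      ;; Parse what is left of the buffer as HTML.
      (libxml-parse-html-region (point-min) (point-max)))))
```

Working in a temporary buffer matters here, because xml-remove-comments modifies the buffer text rather than returning a copy.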
In the parse tree, each HTML node is represented by a list in which the first element is a symbol representing the node name, the second element is an alist of node attributes, and the remaining elements are the subnodes.
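Because each node is an ordinary list, it can be taken apart with car, cadr, and cddr. The two helpers below are an illustrative sketch with hypothetical names; they assume only the node layout described above, with text subnodes represented as strings.

```elisp
;; Hypothetical helper: gather all text below a node.
(defun my-node-text (node)
  "Return the concatenated text of NODE and all of its subnodes."
  (if (stringp node)
      node                      ; text subnodes are plain strings
    ;; NODE is (NAME ATTRIBUTES . SUBNODES): skip the name and the
    ;; attribute alist, then recurse into the subnodes.
    (mapconcat #'my-node-text (cddr node) "")))

;; Hypothetical helper: look up one attribute in the node's alist.
(defun my-node-attribute (node name)
  "Return the value of attribute NAME (a symbol) of NODE, or nil."
  (cdr (assq name (cadr node))))
```

In practice, the dom.el library that ships with Emacs offers ready-made accessors such as dom-tag, dom-attr, and dom-texts for the same structure.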
The following example demonstrates this. Given this (malformed) HTML document:
<html><head></head><body width=101><div class=thing>Foo<div>Yes
A call to libxml-parse-html-region returns this DOM (document object model):
(html nil (head nil) (body ((width . "101")) (div ((class . "thing")) "Foo" (div nil "Yes"))))
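To try this example from Lisp, the document can be inserted into a temporary buffer and parsed there. This is a sketch under the assumption that libxml2 support is present; my-parse-html-string is a hypothetical name, and the function returns nil when the support is missing.

```elisp
;; Hypothetical helper: parse an HTML string instead of a buffer region.
(defun my-parse-html-string (html)
  "Parse the string HTML with libxml2 and return the parse tree.
Return nil when libxml2 support is unavailable."
  (when (and (fboundp 'libxml-available-p) (libxml-available-p))
    (with-temp-buffer
      (insert html)
      (libxml-parse-html-region (point-min) (point-max)))))

;; The malformed document from the example above.
(my-parse-html-string
 "<html><head></head><body width=101><div class=thing>Foo<div>Yes")
```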
Function: shr-insert-document dom
This function renders the parsed HTML in dom into the current buffer. The argument dom should be a list as generated by libxml-parse-html-region. This function is used, for example, by EWW (see The Emacs Web Wowser Manual).
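Parsing and rendering can be combined to turn an HTML string into rendered text. The sketch below assumes the rendering function is shr-insert-document from shr.el and that libxml2 support is present; my-render-html-string is a hypothetical name. The exact layout of the result depends on shr's settings, such as the fill width.

```elisp
(require 'shr)  ; provides `shr-insert-document'

;; Hypothetical helper: parse HTML, render it, and return the text.
(defun my-render-html-string (html)
  "Return HTML rendered as text by shr, or nil without libxml2 support."
  (when (and (fboundp 'libxml-available-p) (libxml-available-p))
    (let ((dom (with-temp-buffer
                 (insert html)
                 (libxml-parse-html-region (point-min) (point-max)))))
      ;; Render into a second temporary buffer, since the rendered
      ;; output is inserted into the current buffer.
      (with-temp-buffer
        (shr-insert-document dom)
        (buffer-string)))))
```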
Function: libxml-parse-xml-region &optional start end base-url discard-comments
This function is the same as libxml-parse-html-region, except that it parses the text as XML rather than HTML (so it is stricter about syntax).
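The XML parser is called the same way. The following sketch assumes the function is libxml-parse-xml-region and that libxml2 support is present; my-parse-xml-string and the sample document are hypothetical.

```elisp
;; Hypothetical helper: parse an XML string instead of a buffer region.
(defun my-parse-xml-string (xml)
  "Parse the string XML with libxml2 and return the parse tree.
Return nil when libxml2 support is unavailable."
  (when (and (fboundp 'libxml-available-p) (libxml-available-p))
    (with-temp-buffer
      (insert xml)
      (libxml-parse-xml-region (point-min) (point-max)))))

;; A small well-formed document; the root node of the result
;; is named after the document's root element.
(my-parse-xml-string "<root><item id=\"1\">a</item></root>")
```

Unlike the HTML parser, the XML parser does not invent missing html, head, or body elements, so the tree mirrors the document as written.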