This specification defines various APIs for programmatic access to HTML and generic XML parsers by web applications for use in parsing and serializing DOM nodes.
Status of This Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This specification is based on the original work of the DOM Parsing and Serialization Living Specification, though it has diverged in terms of supported features, normative requirements, and algorithm specificity. As appropriate, relevant fixes from the living specification are incorporated into this document.
This document was published by the Web Platform Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to www-dom@w3.org (subscribe, archives) with DOM-Parsing at the start of your email's subject. All comments are welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 September 2015 W3C Process Document.
Table of Contents
DOMParser interface
XMLSerializer interface
Element interface
Range interface
This specification will not advance to Proposed Recommendation before the spec's test suite is completed and two or more independent implementations pass each test, although no single implementation must pass each test. We expect to meet these criteria no sooner than 24 October 2014. The group will also create an Implementation Report.
1. Conformance
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and terminate these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)
User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.
When a method or an attribute is said to call another method or attribute, the user agent must invoke its internal API for that attribute or method so that e.g. the author can't change the behavior by overriding attributes or methods with custom properties or functions in ECMAScript.
Unless otherwise stated, string comparisons are done in a case-sensitive manner.
If an algorithm calls into another algorithm, any exception that is thrown by the latter (unless it is explicitly caught), must cause the former to terminate, and the exception to be propagated up to its caller.
1.1 Dependencies
The IDL fragments in this specification must be interpreted as required for conforming IDL fragments, as described in the Web IDL specification. [WEBIDL]
Some of the terms used in this specification are defined in [DOM4], [HTML5], and [XML10].
1.2 Extensibility
Vendor-specific proprietary extensions to this specification are strongly discouraged. Authors must not use such extensions, as doing so reduces interoperability and fragments the user base, allowing only users of specific user agents to access the content in question.
If vendor-specific extensions are needed, the members should be prefixed by vendor-specific strings to prevent clashes with future versions of this specification. Extensions must be defined so that the use of extensions neither contradicts nor causes the non-conformance of functionality defined in the specification.
When vendor-neutral extensions to this specification are needed, either this specification can be updated accordingly, or an extension specification can be written that overrides the requirements in this specification. When someone applying this specification to their activities decides that they will recognise the requirements of such an extension specification, it becomes an applicable specification for the purposes of conformance requirements in this specification.
2. Terminology
The term context object means the object on which the method or attribute being discussed was called.
3. Namespaces
The HTML namespace is http://www.w3.org/1999/xhtml.
The XML namespace is http://www.w3.org/XML/1998/namespace.
The XMLNS namespace is http://www.w3.org/2000/xmlns/.
The following steps form the fragment parsing algorithm, whose arguments are a markup string and a context element:
If the context element's node document is an HTML document: let algorithm be the HTML fragment parsing algorithm.
If the context element's node document is an XML document: let algorithm be the XML fragment parsing algorithm.
DocumentFragment whose node document is context element's node document.
The following steps form the fragment serializing algorithm, whose arguments are a Node node and a flag require well-formed:
To produce an HTML serialization of a Node node, the user agent must run the HTML fragment serialization algorithm [HTML5] on node and return the string produced.
To produce an XML serialization of a Node node given a flag require well-formed, run the following steps:
Let context namespace be null. The context namespace is changed when a node serializes a different default namespace definition from its parent. The algorithm assumes no namespace to start.
Let namespace prefix map be a new map for associating namespaceURI and namespace prefix pairs, where namespaceURI values are the map's keys, and prefix values are the map's key values. The namespace prefix map will be populated by previously seen namespaceURIs and their most recent prefix associations for a subtree. Note: the namespace prefix map only associates a single prefix value with a given namespaceURI. During serialization, if different namespace prefixes are found that map to the same namespaceURI, the last one encountered "wins" by replacing the existing key value in the map with the new prefix value.
Initialize the namespace prefix map with the XML namespace as the key and the string "xml" as the key value.
Let generated namespace prefix index be an integer with a value of 1. The generated namespace prefix index is used to generate a new unique prefix value when no suitable existing namespace prefix is available to serialize a node's namespaceURI (or the namespaceURI of one of node's attributes). See the generate a prefix algorithm.
DOMException with name "InvalidStateError".
An XML serialization differs from an HTML serialization in the following ways:
EmptyElemTag production of [XML10]).
Otherwise, the algorithm for producing an XML serialization is designed to produce a serialization that is compatible with the HTML parser. For example, elements in the HTML namespace that contain no child nodes are serialized with an explicit begin and end tag rather than using the self-closing tag syntax [XML10].
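The namespace prefix map and the generated namespace prefix index described for XML serialization can be pictured with a short Python sketch. It is illustrative only: the class and method names are hypothetical, the map is modeled as a plain dict from namespaceURI to its most recent prefix, and incrementing the index after each generated prefix is an assumption about how generated prefixes stay unique.

```python
XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"


class NamespacePrefixState:
    """Hypothetical holder for the prefix map and the generated prefix index."""

    def __init__(self):
        # Keys are namespaceURIs, values are the most recently seen prefix.
        self.prefix_map = {XML_NAMESPACE: "xml"}
        # The generated namespace prefix index starts at 1.
        self.prefix_index = 1

    def record(self, namespace_uri, prefix):
        # A later prefix for the same namespaceURI replaces the earlier one.
        self.prefix_map[namespace_uri] = prefix

    def generate_prefix(self, new_namespace):
        # Concatenate "ns" with the current numerical value of the index.
        generated = "ns" + str(self.prefix_index)
        # Assumption: bump the index so the next generated prefix differs.
        self.prefix_index += 1
        self.prefix_map[new_namespace] = generated
        return generated
```

Recording two prefixes for the same namespaceURI leaves only the second in the map, which mirrors the "last one wins" note.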
To run the XML serialization algorithm on a node given a context namespace namespace, a namespace prefix map prefix map, a generated namespace prefix index prefix index, and a flag require well-formed, the user agent must run the appropriate steps, depending on node's interface:
Element
Run the following algorithm:
If the require well-formed flag is set (its value is true), and this node's localName attribute contains the character ":" (U+003A COLON) or does not match the XML Name production [XML10], then throw an exception; the serialization of this node would not be a well-formed element.
"<" (U+003C LESS-THAN SIGN).
false.
false.
null.
Note
This above step will update the map with any found namespace prefix definitions, add the found prefix definitions to the element prefixes list, optionally set the duplicate prefix definition value, and return a local default namespace value defined by a default namespace attribute if one exists. Otherwise it returns null.
namespaceURI attribute.
null, then set ignore namespace definition attribute to true.
"xml:" and the value of node's localName.
localName. The node's prefix is always dropped.
prefix attribute.
null.
null (a suitable namespace prefix is defined which maps to ns), then:
":" (U+003A COLON), and node's localName. There exists on this node or the node's ancestry a namespace prefix definition that defines the node's namespace.
null (there exists a locally-defined default namespace declaration attribute), then let inherited ns get the value of ns.
null and local default namespace is null, then:
":" (U+003A COLON), and node's localName.
" " (U+0020 SPACE);
"xmlns:";
"="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
""" (U+0022 QUOTATION MARK).
null, or local default namespace is not null and its value is not equal to ns, then:
true.
localName.
" " (U+0020 SPACE);
"xmlns";
"="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
""" (U+0022 QUOTATION MARK).
localName, let the value of inherited ns be ns, and append the value of qualified name to markup.
localName matches any one of the following void elements: "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame", "hr", "img", "input", "keygen", "link", "menuitem", "meta", "param", "source", "track", "wbr"; then append the following to markup, in the order listed:
" " (U+0020 SPACE);
"/" (U+002F SOLIDUS).
true.
"/" (U+002F SOLIDUS) to markup and set the skip end tag flag to true.
">" (U+003E GREATER-THAN SIGN) to markup.
true, then return the value of markup and skip the remaining steps. The node is a leaf-node.
localName matches the string "template", then this is a template element. Append to markup the result of running the XML serialization algorithm on the template element's template contents (a DocumentFragment), providing the value of inherited ns for the context namespace, map for the namespace prefix map, prefix index for the generated namespace prefix index, and the value of the require well-formed flag. This allows template content to round-trip, given the rules for parsing XHTML documents [HTML5].
"</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS);
">" (U+003E GREATER-THAN SIGN).
Document
If the require well-formed flag is set (its value is true), and this node has no documentElement (the documentElement attribute's value is null), then throw an exception; the serialization of this node would not be a well-formed document.
Otherwise, run the following steps:
doctype attribute provided the require well-formed flag if node's doctype attribute is not null.
Comment
If the require well-formed flag is set (its value is true), and node's data contains characters that are not matched by the XML Char production [XML10] or contains "--" (two adjacent U+002D HYPHEN-MINUS characters) or that ends with a "-" (U+002D HYPHEN-MINUS) character, then throw an exception; the serialization of this node's data would not be well-formed.
Return the concatenation of "<!--", node's data, and "-->".
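The Comment rule above can be sketched in Python as follows. This is a minimal illustration, not a normative implementation: the function name is hypothetical, the regular expression encodes the Char production of XML 1.0, and ValueError stands in for the unspecified exception the text asks for.

```python
import re

# Matches any character that falls outside the XML 1.0 Char production.
_NON_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def serialize_comment(data, require_well_formed):
    """Return "<!--" + data + "-->", validating first when the flag is set."""
    if require_well_formed:
        if _NON_XML_CHAR.search(data) or "--" in data or data.endswith("-"):
            # The text only says "throw an exception"; ValueError is a stand-in.
            raise ValueError("comment data would not serialize as well-formed XML")
    return "<!--" + data + "-->"
```

With the flag cleared, data such as "a--b" is emitted unchanged, so the output is not guaranteed to parse as XML.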
Text
If the require well-formed flag is set (its value is true), and node's data contains characters that are not matched by the XML Char production [XML10], then throw an exception; the serialization of this node's data would not be well-formed.
Let markup be the value of node's data.
Replace any occurrences of "&" in markup by "&amp;".
Replace any occurrences of "<" in markup by "&lt;".
Replace any occurrences of ">" in markup by "&gt;".
DocumentFragment
DocumentType
ProcessingInstruction
If the require well-formed flag is set (its value is true), and node's target contains a ":" (U+003A COLON) character or is an ASCII case-insensitive match for the string "xml", then throw an exception; the serialization of this node's target would not be well-formed.
If the require well-formed flag is set (its value is true), and node's data contains characters that are not matched by the XML Char production [XML10] or contains the string "?>" (U+003F QUESTION MARK, U+003E GREATER-THAN SIGN), then throw an exception; the serialization of this node's data would not be well-formed.
To produce a DocumentType serialization of a Node node, given a require well-formed flag, the user agent must return the result of the following algorithm:
If the require well-formed flag is true and the node's publicId attribute contains characters that are not matched by the XML PubidChar production [XML10], then throw an exception; the serialization of this node would not be a well-formed document type declaration.
If the require well-formed flag is true and the node's systemId attribute contains characters that are not matched by the XML Char production [XML10] or that contains both a """ (U+0022 QUOTATION MARK) and a "'" (U+0027 APOSTROPHE), then throw an exception; the serialization of this node would not be a well-formed document type declaration.
Append the string "<!DOCTYPE" to markup.
Append " " (U+0020 SPACE) to markup.
Append the value of the node's name attribute to markup. For a node belonging to an HTML document, the value will be all lowercase.
If the node's publicId is not the empty string then append the following, in the order listed, to markup:
" " (U+0020 SPACE);
The string "PUBLIC";
" " (U+0020 SPACE);
""" (U+0022 QUOTATION MARK);
The value of the node's publicId attribute;
""" (U+0022 QUOTATION MARK).
If the node's systemId is not the empty string and the node's publicId is set to the empty string, then append the following, in the order listed, to markup:
" " (U+0020 SPACE);
The string "SYSTEM".
If the node's systemId is not the empty string then append the following, in the order listed, to markup:
" " (U+0020 SPACE);
""" (U+0022 QUOTATION MARK);
The value of the node's systemId attribute;
""" (U+0022 QUOTATION MARK).
Append ">" (U+003E GREATER-THAN SIGN) to markup.
To record the namespace information for an Element element, given a namespace prefix map map, an element prefixes list (initially empty), and a duplicate prefix definition reference, the user agent must run the following steps:
null.
attributes, in the order they are specified in the element's attribute list:
Note
The following conditional steps add namespace prefixes into the element prefixes list and add or replace them in the map. Only attributes in the XMLNS namespace are considered (e.g., attributes made to look like namespace declarations via setAttribute("xmlns:pretend-prefix", "pretend-namespace") are not included).
namespaceURI value.
prefix.
null, then attr is a default namespace declaration. Set the default namespace attr value to attr's value and stop running these steps, returning to Main to visit the next attribute.
null and attr is a namespace prefix definition. Run the following steps:
localName.
value.
To generate a prefix given a namespace prefix map map, a string new namespace, and a reference to a generated namespace prefix index prefix index, the user agent must run the following steps:
"ns" and the current numerical value of prefix index.
The XML serialization of the attributes of an Element element together with a namespace prefix map map, a generated prefix index prefix index reference, a flag ignore namespace definition attribute, a duplicate prefix definition value, and a flag require well-formed, is the result of the following algorithm:
namespaceURI and localName pairs, and is populated as each attr is processed. This set is used to [optionally] enforce the well-formed constraint that an element cannot have two attributes with the same namespaceURI and localName. This can occur when two otherwise identical attributes on the same element differ only by their prefix values.
attributes, in the order they are specified in the element's attribute list:
If the require well-formed flag is set (its value is true), and the localname set contains a tuple whose values match those of a new tuple consisting of attr's namespaceURI attribute and localName attribute, then throw an exception; the serialization of this attr would fail to produce a well-formed element serialization.
namespaceURI attribute and localName attribute, and add it to the localname set.
namespaceURI value.
null.
null, then run these sub-steps:
prefix is null and the ignore namespace definition attribute flag is true or the attr's prefix is not null and the attr's localName matches the value of duplicate prefix definition, then stop running these steps and goto Main to visit the next attribute.
" " (U+0020 SPACE);
"xmlns:";
"="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
""" (U+0022 QUOTATION MARK).
" " (U+0020 SPACE) to result.
null, then append to result the concatenation of candidate prefix with ":" (U+003A COLON).
If the require well-formed flag is set (its value is true), and this attr's localName attribute contains the character ":" (U+003A COLON) or does not match the XML Name production [XML10] or equals "xmlns" and attribute namespace is null, then throw an exception; the serialization of this attr would not be a well-formed attribute.
localName;
"="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
value attribute and the require well-formed flag as input;
""" (U+0022 QUOTATION MARK).
To serialize an attribute value given an attribute value and require well-formed flag, the user agent must run the following steps:
If the require well-formed flag is set (its value is true), and attribute value contains characters that are not matched by the XML Char production [XML10], then throw an exception; the serialization of this attribute value would fail to produce a well-formed element serialization.
If attribute value is null, then return the empty string.
""" with "&quot;"
"&" with "&amp;"
"<" with "&lt;"
">" with "&gt;"
Note
This matches behavior present in browsers, and goes above and beyond the grammar requirement in the XML specification's AttValue production [XML10] by also replacing ">" characters.
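The attribute value steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name is hypothetical, None models a null attribute value, ValueError stands in for the unspecified exception, and the four replacements are applied in a single pass so that the ampersands they introduce are not escaped a second time.

```python
import re

# Matches any character that falls outside the XML 1.0 Char production.
_NON_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# Single-pass translation table for the four characters that get replaced.
_ATTRIBUTE_ESCAPES = str.maketrans(
    {'"': "&quot;", "&": "&amp;", "<": "&lt;", ">": "&gt;"}
)


def serialize_attribute_value(value, require_well_formed):
    """Escape an attribute value for placement between double quotes."""
    if value is None:
        return ""
    if require_well_formed and _NON_XML_CHAR.search(value):
        # The text only says "throw an exception"; ValueError is a stand-in.
        raise ValueError("attribute value would not serialize as well-formed XML")
    return value.translate(_ATTRIBUTE_ESCAPES)
```

Using one translation table rather than chained replacements keeps the result independent of the order in which the four substitutions are listed.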
DOMParser interface
The DOMParser() constructor must return a new DOMParser object.
[Constructor]
interface DOMParser {
  [NewObject] Document parseFromString (DOMString str, SupportedType type);
};
5.1 Methods
parseFromString
The parseFromString(str, type) method must run these steps, depending on type:
"text/html"
Parse str with an HTML parser, and return the newly created document.
The scripting flag must be set to "disabled".
Note
meta elements are not taken into account for the encoding used, as a Unicode stream is passed into the parser.
Note
script elements get marked unexecutable and the contents of noscript get parsed as markup.
"text/xml"
"application/xml"
"application/xhtml+xml"
"image/svg+xml"
XML parser.
Document interface rather than the XMLDocument interface.
Let root be a new Element, with its local name set to "parsererror" and its namespace set to "http://www.mozilla.org/newlayout/xml/parsererror.xml".
At this point user agents may append nodes to root, for example to describe the nature of the error.
In any case, the returned document's content type must be the type argument. Additionally, the document must have a URL value equal to the URL of the active document, a location value of null.
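As a rough illustration of the XML branch, the following Python sketch uses xml.dom.minidom as a stand-in for a user agent's XML parser. It assumes the "parsererror" root is what a caller receives when the parser reports a well-formedness error; the function name is hypothetical, and the error text appended to root is the optional, implementation-defined part.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

PARSERERROR_NS = "http://www.mozilla.org/newlayout/xml/parsererror.xml"


def parse_xml_or_parsererror(source):
    """Parse source as XML; on failure return a document rooted at parsererror."""
    try:
        return minidom.parseString(source)
    except ExpatError as error:
        implementation = minidom.getDOMImplementation()
        document = implementation.createDocument(PARSERERROR_NS, "parsererror", None)
        # Optional: describe the nature of the error inside the root element.
        document.documentElement.appendChild(document.createTextNode(str(error)))
        return document
```

A caller can therefore detect a failed parse by checking the namespace and name of the returned document's root element instead of catching an exception.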
Parameter  Type           Nullable  Optional  Description
str        DOMString      ✘         ✘
type       SupportedType  ✘         ✘
Return type: Document
XMLSerializer interface
The XMLSerializer() constructor must return a new XMLSerializer object.
[Constructor]
interface XMLSerializer {
  DOMString serializeToString (Node root);
};
6.1 Methods
serializeToString
The serializeToString(root) method must produce an XML serialization of root passing a value of false for the require well-formed parameter, and return the result.
Parameter  Type  Nullable  Optional  Description
root       Node  ✘         ✘
Return type: DOMString
Element interface
partial interface Element {
  [CEReactions, TreatNullAs=EmptyString] attribute DOMString innerHTML;
  [CEReactions, TreatNullAs=EmptyString] attribute DOMString outerHTML;
  [CEReactions] void insertAdjacentHTML (DOMString position, DOMString text);
};
7.1 Attributes
innerHTML of type DOMString
The innerHTML IDL attribute represents the markup of the Element's contents.
innerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element's contents.
Can be set, to replace the contents of the element with nodes parsed from the given string.
In the case of an XML document, will throw a DOMException with name "InvalidStateError" if the Element cannot be serialized to XML, and a DOMException with name "SyntaxError" if the given string is not well-formed.
On getting, return the result of invoking the fragment serializing algorithm on the context object providing true for the require well-formed flag (this might throw an exception instead of returning a string).
On setting, these steps must be run:
outerHTML of type DOMString
The outerHTML IDL attribute represents the markup of the Element and its contents.
outerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element and its contents.
Can be set, to replace the element with nodes parsed from the given string.
In the case of an XML document, will throw a DOMException with name "InvalidStateError" if the element cannot be serialized to XML, and a DOMException with name "SyntaxError" if the given string is not well-formed.
Throws a DOMException with name "NoModificationAllowedError" if the parent of the element is the Document node.
On getting, return the result of invoking the fragment serializing algorithm on a fictional node whose only child is the context object providing true for the require well-formed flag (this might throw an exception instead of returning a string).
On setting, the following steps must be run:
If parent is a Document, throw a DOMException with name "NoModificationAllowedError".
If parent is a DocumentFragment, let parent be a new Element with
body as its local name,
insertAdjacentHTML
insertAdjacentHTML(position, text)
Parses the given string text as HTML or XML and inserts the resulting nodes into the tree in the position given by the position argument, as follows:
Throws a DOMException with name "SyntaxError" if the arguments have invalid values (e.g., in the case of an XML document, if the given string is not well-formed).
Throws a DOMException with name "NoModificationAllowedError" if the given position isn't possible (e.g. inserting elements after the root element of a Document).
The insertAdjacentHTML(position, text) method must run these steps:
Let context be the context object's parent.
If context is null or a document, throw a DOMException with name "NoModificationAllowedError".
Throw a DOMException with name "SyntaxError".
Element or the following are all true:
"html", and
let context be a new Element with
body as its local name,
Parameter  Type       Nullable  Optional  Description
position   DOMString  ✘         ✘
text       DOMString  ✘         ✘
Return type: void
Range interface
partial interface Range {
  [CEReactions, NewObject] DocumentFragment createContextualFragment (DOMString fragment);
};
8.1 Methods
createContextualFragment
createContextualFragment(markupString)
Returns a DocumentFragment, created from the markup string given.
The createContextualFragment(fragment) method must run these steps:
Let element be as follows, depending on node's interface:
Document
DocumentFragment
Element
Text
Comment
DocumentType
ProcessingInstruction
"html", and
let element be a new element with
"body" as its local name,
Parameter  Type       Nullable  Optional  Description
fragment   DOMString  ✘         ✘
Return type: DocumentFragment
The following is an informative summary of the changes since the last publication of this specification. A complete revision history of the Editor's Drafts of this specification can be found here.
B. Acknowledgements
Thanks to Ms2ger [Mozilla] for maintaining the initial drafts of this specification and for its continued improvement in the Living Specification.
Thanks to Victor Costan, Aryeh Gregor, Anne van Kesteren, Arkadiusz Michalski, Simon Pieters, Henri Sivonen, Josh Soref and Boris Zbarsky, for their useful comments.
Special thanks to Ian Hickson for defining the innerHTML and outerHTML attributes, and the insertAdjacentHTML() method in [HTML5] and his useful comments.