
Showing content from https://developers.cloudflare.com/workers/runtime-apis/html-rewriter/ below:

HTMLRewriter · Cloudflare Workers docs

The HTMLRewriter class allows developers to build comprehensive and expressive HTML parsers inside of a Cloudflare Workers application. It can be thought of as a jQuery-like experience directly inside of your Workers application. Leaning on a powerful JavaScript API to parse and transform HTML, HTMLRewriter allows developers to build deeply functional applications.

The HTMLRewriter class should be instantiated once in your Workers script, with a number of handlers attached using the on and onDocument functions.

new HTMLRewriter()
  .on("*", new ElementHandler())
  .onDocument(new DocumentHandler());

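Throughout the HTMLRewriter API, a few types recur in many properties and methods, notably the content passed to mutation methods and the options object (such as { html: true }) that controls whether that content is treated as HTML or as text.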

There are two handler types that can be used with HTMLRewriter: element handlers and document handlers.

An element handler responds to any incoming element that matches its selector when attached using the .on function of an HTMLRewriter instance. An element handler can define element, comments, and text functions. The following example processes div elements with an ElementHandler class.

class ElementHandler {
  element(element) {
    // An incoming element, such as `div`
    console.log(`Incoming element: ${element.tagName}`);
  }

  comments(comment) {
    // An incoming comment
  }

  text(text) {
    // An incoming piece of text
  }
}

async function handleRequest(req) {
  const res = await fetch(req);

  return new HTMLRewriter().on("div", new ElementHandler()).transform(res);
}

A document handler represents the incoming HTML document. A number of functions can be defined on a document handler to query and manipulate a document’s doctype, comments, text, and end. Unlike an element handler, a document handler’s doctype, comments, text, and end functions are not scoped by a particular selector. A document handler's functions are called for all the content on the page including the content outside of the top-level HTML tag:

class DocumentHandler {
  doctype(doctype) {
    // An incoming doctype, such as <!DOCTYPE html>
  }

  comments(comment) {
    // An incoming comment
  }

  text(text) {
    // An incoming piece of text
  }

  end(end) {
    // The end of the document
  }
}

All functions defined on both element and document handlers can return either void or a Promise<void>. Making your handler function async allows you to access external resources such as an API via fetch, Workers KV, Durable Objects, or the cache.

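class UserElementHandler {
  constructor(origin) {
    // Workers have no base URL, so "/user" must be resolved against an origin.
    // Assumption: "/user" lives on the same origin as the incoming request.
    this.origin = origin;
  }

  async element(element) {
    let response = await fetch(new URL("/user", this.origin).toString());
    // fill in user info using response
  }
}

async function handleRequest(req) {
  const res = await fetch(req);
  const origin = new URL(req.url).origin;

  // run the user element handler via HTMLRewriter on a div with ID `user_info`
  return new HTMLRewriter()
    .on("div#user_info", new UserElementHandler(origin))
    .transform(res);
}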

The element argument, used only in element handlers, is a representation of a DOM element. Methods on an element let you query and manipulate it, for example getAttribute, setAttribute, and remove.
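As a minimal sketch of querying and manipulating an element, the handler below reads an href with getAttribute and writes it back with setAttribute. The class name LinkRewriter, the helper rewriteLinks, and both host names are hypothetical and chosen only for illustration.

```javascript
// Hypothetical handler: rewrites links that point at an old host.
class LinkRewriter {
  constructor(oldHost, newHost) {
    this.oldHost = oldHost;
    this.newHost = newHost;
  }

  element(element) {
    // getAttribute returns null when the attribute is absent.
    const href = element.getAttribute("href");
    if (href && href.includes(this.oldHost)) {
      element.setAttribute("href", href.replace(this.oldHost, this.newHost));
    }
  }
}

// Only meaningful inside a Worker, where HTMLRewriter is a global.
// The host names are placeholders, not real endpoints.
function rewriteLinks(res) {
  return new HTMLRewriter()
    .on("a[href]", new LinkRewriter("old.example.com", "new.example.com"))
    .transform(res);
}
```

Keeping the rewrite logic in a plain class means it can be exercised with a stub element outside the Workers runtime.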

The endTag argument, used only in handlers registered with element.onEndTag, is a limited representation of a DOM element.
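The following sketch shows one way an end tag handler might be registered from inside an element handler. The class name EndNoteHandler and the inserted markup are hypothetical; the sketch assumes the endTag object exposes a name property and a before method that accepts content plus an options object.

```javascript
// Hypothetical handler: inserts a note just before the closing tag
// of every element it is attached to.
class EndNoteHandler {
  element(element) {
    // The callback runs later, when the matching end tag is reached.
    element.onEndTag((endTag) => {
      endTag.before(`<p>End of ${endTag.name}</p>`, { html: true });
    });
  }
}
```

Inside a Worker this could be attached with something like new HTMLRewriter().on("section", new EndNoteHandler()).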

Since Cloudflare performs zero-copy streaming parsing, text chunks are not the same thing as text nodes in the lexical tree. A lexical tree text node can be represented by multiple chunks, as they arrive over the wire from the origin.

Consider the following markup: <div>Hey. How are you?</div>. It is possible that the Workers script will not receive the entire text node from the origin at once; instead, the text element handler will be invoked for each received part of the text node. For example, the handler might be invoked with “Hey. How ”, then “are you?”. When the last chunk arrives, the text’s lastInTextNode property will be set to true. Developers should make sure to concatenate these chunks together.
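The concatenation described above can be sketched as follows. The handler buffers each chunk, removes partial chunks from the output, and emits the transformed text once lastInTextNode is true. The class name UppercaseTextHandler and the uppercase transform are hypothetical stand-ins for whatever processing needs the whole text node.

```javascript
// Hypothetical handler: rewrites a text node only after all of its
// chunks have arrived.
class UppercaseTextHandler {
  constructor() {
    this.buffer = "";
  }

  text(text) {
    this.buffer += text.text;

    if (text.lastInTextNode) {
      // The full text node is available; emit the transformed version.
      text.replace(this.buffer.toUpperCase());
      this.buffer = "";
    } else {
      // Drop the partial chunk so it is not emitted twice.
      text.remove();
    }
  }
}
```

Buffering trades some streaming latency for correctness: nothing from the text node is emitted until its final chunk arrives.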

The comments function on an element handler allows developers to query and manipulate HTML comment tags.

class ElementHandler {
  comments(comment) {
    // An incoming comment element, such as <!-- My comment -->
  }
}
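As an illustration of querying and manipulating comments, the sketch below strips every comment except those whose text starts with a marker. The class name CommentStripper and the "keep:" marker are hypothetical; the sketch assumes the comment object exposes a text property and a remove method.

```javascript
// Hypothetical handler: removes comments unless they start with a marker.
class CommentStripper {
  constructor(keepMarker = "keep:") {
    this.keepMarker = keepMarker;
  }

  comments(comment) {
    // comment.text is the content between <!-- and -->.
    if (!comment.text.trim().startsWith(this.keepMarker)) {
      comment.remove();
    }
  }
}
```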

The doctype function on a document handler allows developers to query a document’s doctype.

class DocumentHandler {
  doctype(doctype) {
    // An incoming doctype element, such as
    // <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
  }
}
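A minimal sketch of querying the doctype: the handler below records the doctype's fields so that later code can inspect them. The class name DoctypeRecorder is hypothetical; the sketch assumes the doctype object exposes read-only name, publicId, and systemId properties, the latter two possibly null.

```javascript
// Hypothetical handler: records the doctype for later inspection.
class DoctypeRecorder {
  constructor() {
    this.info = null; // stays null if the document has no doctype
  }

  doctype(doctype) {
    this.info = {
      name: doctype.name,
      publicId: doctype.publicId,
      systemId: doctype.systemId,
    };
  }
}
```

Because it is a document handler, it would be attached with onDocument rather than on.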

The end function on a document handler allows developers to append content to the end of a document.

class DocumentHandler {
  end(end) {
    // The end of the document
  }
}
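Appending to the end of a document might look like the sketch below. The class name TrailerHandler and the appended comment are hypothetical; the sketch assumes the end object exposes an append method that accepts content plus an options object.

```javascript
// Hypothetical handler: appends an HTML comment after all other content.
class TrailerHandler {
  constructor(label) {
    this.label = label;
  }

  end(end) {
    // { html: true } asks for the content to be inserted as markup
    // rather than escaped as text.
    end.append(`<!-- ${this.label} -->`, { html: true });
  }
}
```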

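Selectors determine which elements an element handler is invoked for. A selector is passed as the first argument to the .on function, as in "*", "div", and "div#user_info" in the examples above.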
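To illustrate how selectors scope handlers, the sketch below attaches several handlers, each keyed by a CSS-style selector. The helper attachHandlers, the handlersBySelector map, and the specific selectors are hypothetical; the sketch assumes attribute, :not(), child, and class selectors are among those HTMLRewriter accepts.

```javascript
// Hypothetical helper: attaches one handler per selector to a rewriter.
// The rewriter is passed in so the helper is not tied to a global.
function attachHandlers(rewriter, handlersBySelector) {
  for (const [selector, handler] of Object.entries(handlersBySelector)) {
    rewriter = rewriter.on(selector, handler);
  }
  return rewriter;
}

const handlersBySelector = {
  // Links whose href starts with "http://"
  'a[href^="http://"]': {
    element(el) {
      el.setAttribute("href", el.getAttribute("href").replace("http://", "https://"));
    },
  },
  // Images that lack an alt attribute
  "img:not([alt])": {
    element(el) {
      el.setAttribute("alt", "");
    },
  },
  // span.note elements that are direct children of a p
  "p > span.note": {
    element(el) {
      el.remove();
    },
  },
};
```

Inside a Worker, the call would be attachHandlers(new HTMLRewriter(), handlersBySelector).transform(res).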

If a handler throws an exception, parsing is immediately halted, the transformed response body is errored with the thrown exception, and the untransformed response body is canceled (closed). If the transformed response body was already partially streamed back to the client, the client will see a truncated response.

async function handle(request) {
  let oldResponse = await fetch(request);
  let newResponse = new HTMLRewriter()
    .on("*", {
      element(element) {
        throw new Error("A really bad error.");
      },
    })
    .transform(oldResponse);

  // At this point, an expression like `await newResponse.text()`
  // will throw `new Error("A really bad error.")`.
  // Thereafter, any use of `newResponse.body` will throw the same error,
  // and `oldResponse.body` will be closed.

  // Alternatively, this will produce a truncated response to the client:
  return newResponse;
}
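When a truncated response is worse than an untransformed element, one option is to catch exceptions inside the handlers so they never reach the parser. The wrapper below is a sketch of that idea, not part of the HTMLRewriter API; the name tolerant and the report callback are hypothetical.

```javascript
// Hypothetical wrapper: returns an element handler whose functions
// report exceptions instead of letting them halt parsing.
function tolerant(handler, report = console.error) {
  const wrapped = {};
  for (const name of ["element", "comments", "text"]) {
    if (typeof handler[name] !== "function") continue;
    wrapped[name] = (arg) => {
      try {
        const result = handler[name](arg);
        // Async handlers return a promise; catch its rejection too.
        return result instanceof Promise ? result.catch(report) : result;
      } catch (err) {
        report(err);
      }
    };
  }
  return wrapped;
}
```

A handler wrapped this way leaves the matched content as it was when it fails, so the choice between this approach and failing loudly depends on whether a partially transformed page is acceptable.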

