Showing content from https://github.com/ameshkov/diffupdates below:

ameshkov/diffupdates: Filter lists diff updates proposal

Filter Lists Differential Updates

A "filter list" is a fundamental component utilized by ad blockers. Essentially, a filter list is a set of rules and patterns designed to identify and subsequently block unwanted content like advertisements, trackers, and malicious websites. When an individual surfs the internet using a browser with an ad blocker enabled, this filter list is referenced to determine which elements on a webpage should be blocked or allowed. These lists are maintained by communities and organizations, and they are regularly updated to remain effective against the ever-evolving landscape of online advertisements and trackers.

The core issue is the mechanism by which these filter lists are updated. As it stands, when a filter list is modified, even by a small change, the end user has to download the entire list again. This approach is inefficient: it wastes bandwidth, adds latency, and increases server load.

The proposed solution is differential updates. Instead of fetching the entire filter list every time an update is available, users would download only the changes made since their last update (the 'diff' or difference). This would be achieved by running a diff algorithm on the server side to identify the changes between the current and previous versions of the filter list.

This approach significantly reduces bandwidth consumption, minimizes latency, and decreases server load, resulting in a more efficient and user-friendly experience.

Changes To Filter Lists Metadata

To use the differential update mechanism, we propose adding one new field to the filter list metadata: Diff-Path.

This field will provide the relative path where the differential file (diff) for the filter list can be found. This differential file will take the user from their current version of the filter list to the next version. Crucially, within this differential update, the Diff-Path field will be updated to point to the subsequent version's diff. This ensures that the ad blocker knows where to find the next differential update.
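As a rough illustration of how an ad blocker might locate the diff file, the sketch below reads Diff-Path from a list's header and resolves it into a download URL. It assumes two things that this section does not state: that metadata uses the conventional `! Key: value` comment syntax of Adblock-style lists, and that the relative path is resolved against the URL of the filter list itself. The function names and the example.org URL are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag, urljoin


def find_diff_path(list_text: str) -> Optional[str]:
    """Return the Diff-Path metadata value, or None if the list has none.

    Assumption: metadata lines look like "! Diff-Path: <value>".
    """
    for line in list_text.splitlines():
        if not line.startswith("!"):
            continue
        key, sep, value = line[1:].partition(":")
        if sep and key.strip() == "Diff-Path":
            return value.strip()
    return None


def resolve_patch_url(list_url: str, diff_path: str) -> Tuple[str, Optional[str]]:
    """Resolve Diff-Path against the list URL (an assumption of this sketch).

    Returns the URL to download and the "#resourceName" part, if present.
    """
    url, fragment = urldefrag(urljoin(list_url, diff_path))
    return url, fragment or None


header = "! Title: Example list\n! Diff-Path: ../patches/list1-h-474930-1.patch#list1\n"
diff_path = find_diff_path(header)
print(resolve_patch_url("https://example.org/filters/list1.txt", diff_path))
```

The fragment is split off before the request is made because it names the resource to patch and is not part of the file's location.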

Diff-Path also encodes additional information in the file name:

${patchName}[-${resolution}]-${epochTimestamp}-${expirationPeriod}.patch[#${resourceName}]

The following limitations are imposed on the Diff-Path:

If a list supports batch updates, the Diff-Path MUST also have a "hash" part, i.e. /path.patch#resourceName. This "hash" is the name of the resource to be patched. In this case, the ad blocker will only download the diff file once and then apply it to all lists that are specified in the diff file. See the Batch Updates section for more details.

Later in the document it will be referred to as "resource name".
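The file name template above can be split into its parts with a regular expression. The following is a minimal sketch, not a reference parser: it assumes that the optional resolution is a short alphabetic token and that the timestamp and expiration period are decimal integers. It deliberately does not interpret the units of those numbers. The `DiffPath` type and `parse_diff_path` function are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class DiffPath(NamedTuple):
    patch_name: str
    resolution: Optional[str]      # left uninterpreted in this sketch
    epoch_timestamp: int
    expiration_period: int
    resource_name: Optional[str]   # the "#resourceName" part, if any


# ${patchName}[-${resolution}]-${epochTimestamp}-${expirationPeriod}.patch
_FILE_NAME = re.compile(
    r"(?P<name>.+?)"
    r"(?:-(?P<resolution>[A-Za-z]+))?"   # assumption: alphabetic token
    r"-(?P<timestamp>\d+)"
    r"-(?P<expiration>\d+)"
    r"\.patch"
)


def parse_diff_path(diff_path: str) -> DiffPath:
    # Split off the resource name first, then keep only the file name.
    path, _, resource = diff_path.partition("#")
    file_name = path.rpartition("/")[2]
    match = _FILE_NAME.fullmatch(file_name)
    if match is None:
        raise ValueError(f"not a valid Diff-Path file name: {file_name!r}")
    return DiffPath(
        patch_name=match["name"],
        resolution=match["resolution"],
        epoch_timestamp=int(match["timestamp"]),
        expiration_period=int(match["expiration"]),
        resource_name=resource or None,
    )


print(parse_diff_path("../patches/list1-h-474930-1.patch#list1"))
```

A path that does not match the template raises `ValueError`, which a caller could treat like any other unexpected update error.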

Expires continues to work as before: once in a while the ad blocker performs a so-called "full sync". When differential updates are available, it is recommended to set Expires to a large value, e.g. 10 days, so that the ad blocker does not perform the full sync too often.

We propose using the RCS format for the diff files. This format is widely used in the software development industry and is well documented. It is also supported by the patch utility which is available on most operating systems.
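For readers unfamiliar with the RCS format, here is a minimal sketch of applying such a patch in Python. It relies on the format as produced by `diff -n`: `d<start> <count>` deletes `count` lines beginning at line `start`, and `a<line> <count>` appends the next `count` patch lines after line `line`, with all line numbers referring to the original file. The function name is hypothetical, and the sketch handles plain RCS commands only.

```python
from typing import List


def apply_rcs_patch(original: List[str], patch: List[str]) -> List[str]:
    """Apply an RCS-format patch to a list of lines and return the new list."""
    result = list(original)
    offset = 0  # shift between original line numbers and indices in result
    i = 0
    while i < len(patch):
        command = patch[i]
        i += 1
        if not command:
            continue  # tolerate a stray blank line between commands
        op = command[0]
        start_text, count_text = command[1:].split()
        start, count = int(start_text), int(count_text)
        if op == "d":
            index = start - 1 + offset
            del result[index:index + count]
            offset -= count
        elif op == "a":
            index = start + offset
            result[index:index] = patch[i:i + count]
            i += count  # skip the content lines that were just inserted
            offset += count
        else:
            raise ValueError(f"unknown RCS command: {command!r}")
    return result


old = ["||ads.example^", "||old.example^", "||tracker.example^"]
patch = ["d2 1", "a2 1", "||new.example^"]
print(apply_rcs_patch(old, patch))
```

The `offset` variable is what lets commands keep using original line numbers even after earlier commands have changed the length of the list.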

In order to support batch updates and be able to validate patch result, the standard format is extended with the diff directive:

diff name:[name] checksum:[checksum] lines:[lines]

The diff directive is optional. If it is not specified, the patch is applied without validation.

Note that the diff directive can be extended with additional fields not specified in the spec. An implementation should ignore unknown fields.

It is recommended to use the diff checksum: directive to validate the patching result. This will ensure that the patch is applied correctly and the resulting file is not corrupted.
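The directive can be parsed as a list of `key:value` tokens, which also makes it easy to ignore unknown fields. The sketch below is illustrative only. In particular, this section does not say which hash algorithm the checksum uses or how the text is encoded, so the SHA-1 hex digest of the UTF-8 text used here is an assumption, as are both function names.

```python
import hashlib
from typing import Dict


def parse_diff_directive(line: str) -> Dict[str, str]:
    """Parse "diff name:... checksum:... lines:..." into a dict of fields."""
    tokens = line.split()
    if not tokens or tokens[0] != "diff":
        raise ValueError(f"not a diff directive: {line!r}")
    fields = {}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value  # unknown fields are kept but never required
    return fields


def checksum_matches(patched_text: str, fields: Dict[str, str]) -> bool:
    """Validate the patch result against the directive's checksum field."""
    expected = fields.get("checksum")
    if expected is None:
        return True  # no checksum given: applied without validation
    # Assumption: SHA-1 hex digest of the UTF-8 encoded result.
    actual = hashlib.sha1(patched_text.encode("utf-8")).hexdigest()
    return actual == expected.lower()


fields = parse_diff_directive("diff name:list1 checksum:abc123 lines:3 future:x")
print(fields["name"], fields["lines"])
```

A failed check could be handled like any other unexpected update error, leaving the stored list untouched until the next full sync.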

  1. Refer to the Diff-Path to see if a differential update is available.

  2. If the differential update is available, download and apply it to the current filter list.

  3. If the differential update is not available, the server may signal this by returning one of the following responses:

    In this case the ad blocker SHOULD wait for a while and then try again, see 2. Set Update Timer.

The update timer depends on the previous update check result.

  1. If the differential update was not empty and applied successfully, the ad blocker SHOULD check the new Diff-Path file expiration time.
  2. If the differential update was empty and the list's Diff-Path stayed the same, the ad blocker SHOULD delay the next update for at least 30 minutes to avoid overloading the server.

Any unexpected error during the update process SHOULD be treated as a fatal error, and the ad blocker SHOULD then wait until it is time for the full sync. Note that the full sync still respects the Expires value set by the filter list.
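The timer rules above can be summarized as a small decision function. This is a sketch under stated assumptions rather than the normative behavior: it reads "check the new Diff-Path file expiration time" as "wait until the new patch expires, or check again immediately if it already has", and it leaves the computation of both remaining times to the caller. The `Outcome` enum and all parameter names are hypothetical.

```python
from enum import Enum

MIN_RECHECK_SECONDS = 30 * 60  # "at least 30 minutes"


class Outcome(Enum):
    APPLIED = "applied"                  # non-empty diff applied successfully
    EMPTY_UNCHANGED = "empty_unchanged"  # empty diff, Diff-Path unchanged
    ERROR = "error"                      # any unexpected error


def next_check_delay(
    outcome: Outcome,
    seconds_until_patch_expiry: float,
    seconds_until_full_sync: float,
) -> float:
    """Return how long to wait before the next update attempt, in seconds."""
    if outcome is Outcome.APPLIED:
        # Assumption: wait for the new patch to expire; never a negative delay.
        return max(0.0, seconds_until_patch_expiry)
    if outcome is Outcome.EMPTY_UNCHANGED:
        return MIN_RECHECK_SECONDS
    # Fatal error: stop diff updates and wait for the Expires-driven full sync.
    return seconds_until_full_sync


print(next_check_delay(Outcome.EMPTY_UNCHANGED, 600, 86400))
```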

The mechanism allows having a single diff file for multiple filter lists. In order to achieve this, the resource name MUST be specified for each filter list that supports batch differential updates. The resource name is then used to match a filter list with its corresponding patch in the diff file. This is achieved by using the diff name: directive in the diff file which links a patch to a filter list.

Let's take an example:

Please find examples in the examples directory.
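Independently of those examples, the following sketch shows one way a client might split a batch diff file into per-resource patches by using the diff name: directive. It skips over the content lines of each RCS `a` command so that an added rule that happens to start with "diff " is not mistaken for a directive. Lines that appear before the first directive are ignored, which is an assumption of this sketch. The function name and the sample patch are hypothetical.

```python
import re
from typing import Dict, List, Optional

_ADD_COMMAND = re.compile(r"a(\d+) (\d+)$")


def split_batch_patch(patch_text: str) -> Dict[str, List[str]]:
    """Map each resource name to the RCS patch lines that belong to it."""
    sections: Dict[str, List[str]] = {}
    current: Optional[List[str]] = None
    pending = 0  # content lines still owed to the last "a" command
    for line in patch_text.splitlines():
        if pending and current is not None:
            current.append(line)
            pending -= 1
            continue
        if line.startswith("diff "):
            fields = dict(
                token.split(":", 1) for token in line.split()[1:] if ":" in token
            )
            if "name" not in fields:
                raise ValueError(f"batch directive without a name: {line!r}")
            current = sections.setdefault(fields["name"], [])
            continue
        if current is None:
            continue  # assumption: ignore anything before the first directive
        current.append(line)
        match = _ADD_COMMAND.match(line)
        if match:
            pending = int(match.group(2))
    return sections


batch = (
    "diff name:list1 checksum:abc lines:3\n"
    "d2 1\n"
    "a2 1\n"
    "diff is a word\n"
    "diff name:list2 checksum:def lines:2\n"
    "a0 1\n"
    "||ads.example^\n"
)
print(split_batch_patch(batch))
```

An ad blocker would look up each filter list's resource name in the returned mapping and apply only the matching patch lines to that list.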

