Showing content from https://rechunker.readthedocs.io/en/latest/algorithm.html below:

Algorithm — Rechunker 0.5.4.dev6+g2c197f4 documentation

Algorithm

The algorithm used by rechunker tries to satisfy several constraints simultaneously:

The algorithm we chose emerged via a lively discussion on the Pangeo Discourse Forum. We call it Push / Pull Consolidated.

Visualization of the Push / Pull Consolidated algorithm for a hypothetical 2D array. Each rectangle represents a single chunk. The dashed boxes indicate consolidated reads / writes.

A rough sketch of the algorithm is as follows:

  1. The user provides a source array with a specific shape, chunk structure, and data type. The user also specifies target_chunks, the desired chunk structure of the output array, and max_mem, the maximum amount of memory each worker is allowed to use.

  2. Determine the largest batch of data that one worker can write without exceeding max_mem. These are the write_chunks.

  3. Determine the largest batch of data that one worker can read without exceeding max_mem, with the additional constraint of trying to fit within write_chunks if possible. These are the read_chunks.

  4. If write_chunks == read_chunks, we can avoid creating an intermediate dataset and copy the data directly from source to target.

  5. Otherwise, the intermediate chunks are defined as the minimum of write_chunks and read_chunks along each axis. The source is copied first to the intermediate array and then from the intermediate array to the target.
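
The steps above can be illustrated with a small pure-Python sketch. This is not rechunker's actual implementation: the function names `consolidate_chunks_sketch` and `rechunking_plan_sketch`, the `limits` parameter, and the greedy "grow the last axis first" strategy are all assumptions made for illustration. The sketch only shows how write_chunks, read_chunks, and the intermediate chunks relate to one another for a given max_mem (in bytes) and itemsize (bytes per element).

```python
from math import prod


def consolidate_chunks_sketch(shape, chunks, itemsize, max_mem, limits=None):
    """Grow `chunks` by whole multiples while one batch fits in `max_mem` bytes.

    Hypothetical helper: axes are grown greedily, last axis first.
    `limits` optionally caps the batch size along each axis.
    """
    if limits is None:
        limits = shape
    if prod(chunks) * itemsize > max_mem:
        raise ValueError("a single chunk already exceeds max_mem")
    new = list(chunks)
    for axis in reversed(range(len(shape))):
        # Never shrink below the original chunk, never exceed the array.
        cap = min(shape[axis], max(limits[axis], chunks[axis]))
        # Elements in one unit-length slab along this axis.
        other = prod(new) // new[axis]
        max_len = max_mem // (itemsize * other)
        # Largest whole number of original chunks that fits memory and cap.
        n = max(min(max_len, cap) // chunks[axis], 1)
        new[axis] = n * chunks[axis]
    return tuple(new)


def rechunking_plan_sketch(shape, source_chunks, target_chunks, itemsize, max_mem):
    """Return (read_chunks, int_chunks, write_chunks); int_chunks is None
    when a direct copy is possible (step 4)."""
    # Step 2: largest batch one worker can write.
    write_chunks = consolidate_chunks_sketch(shape, target_chunks, itemsize, max_mem)
    # Step 3: largest batch one worker can read, trying to stay within write_chunks.
    read_chunks = consolidate_chunks_sketch(
        shape, source_chunks, itemsize, max_mem, limits=write_chunks
    )
    # Step 4: identical batches mean no intermediate array is needed.
    if read_chunks == write_chunks:
        return read_chunks, None, write_chunks
    # Step 5: intermediate chunks are the per-axis minimum.
    int_chunks = tuple(min(r, w) for r, w in zip(read_chunks, write_chunks))
    return read_chunks, int_chunks, write_chunks


# Column-chunked 4000 x 4000 float64 array, rechunked to rows,
# with max_mem large enough for 100 full rows or columns.
plan = rechunking_plan_sketch((4000, 4000), (4000, 1), (1, 4000), 8, 3_200_000)
print(plan)  # ((4000, 100), (100, 100), (100, 4000))
```

In this example each worker reads 100 source columns at a time, writes them as 100 x 100 intermediate chunks, and a second pass assembles 100 target rows at a time from those intermediate chunks. Every batch stays within max_mem.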

