Source: https://speakerdeck.com/tomnicholas/virtualizarr-create-virtual-zarr-stores-using-xarray-syntax

VirtualiZarr: Create virtual Zarr stores using xarray syntax

  • Problem of accessing legacy file formats
    - All these old files (netCDF, HDF5, GRIB, …)
    - Want to put them on the cloud
    - Zarr is a much better format:
      - Separation between data and metadata
      - Scalable
      - Cloud-optimized access patterns
    - But data providers don't want to change formats
    - Ideally avoid data duplication

  • Kerchunk + fsspec approach
    - Makes a Zarr-like layer
    - Kerchunk currently:
      - Finds byte ranges using e.g. SingleHdf5ToZarr
      - Represents them as a nested dict in memory
      - Combines dicts using MultiZarrToZarr
      - Writes them out in kerchunk reference format (JSON/Parquet)
    - Result is sidecar files which behave like a Zarr store…
    - … when read through fsspec
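For readers who have not seen the kerchunk reference format, a minimal sketch of a JSON sidecar may help. It assumes version 1 of the kerchunk reference spec with a Zarr v2 key layout; the file URL, the variable name `temp`, its dimension names, and all offsets and lengths are invented for illustration, not output from SingleHdf5ToZarr, and `resolve` is a hypothetical helper.

```python
import json

# Illustrative only: a 4x6 float32 variable "temp" stored uncompressed in
# 2x3 chunks (2 * 3 * 4 bytes = 24 bytes each) inside one netCDF4/HDF5 file.
SOURCE = "s3://example-bucket/data.nc"  # hypothetical URL

refs = {
    "version": 1,
    "refs": {
        # Zarr v2 metadata is stored inline as JSON strings
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temp/.zarray": json.dumps({
            "zarr_format": 2,
            "shape": [4, 6],
            "chunks": [2, 3],
            "dtype": "<f4",
            "compressor": None,
            "fill_value": None,
            "filters": None,
            "order": "C",
        }),
        "temp/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["y", "x"]}),
        # Each chunk key points at [url, byte offset, byte length] in the original file
        "temp/0.0": [SOURCE, 4096, 24],
        "temp/0.1": [SOURCE, 4120, 24],
        "temp/1.0": [SOURCE, 4144, 24],
        "temp/1.1": [SOURCE, 4168, 24],
    },
}


def resolve(refs, key):
    """Hypothetical helper: look up where a chunk's bytes live."""
    url, offset, length = refs["refs"][key]
    return url, offset, length


# This string is what would be written to a sidecar such as refs.json
sidecar = json.dumps(refs, indent=2)
print(resolve(refs, "temp/1.0"))
```

No data is copied: the sidecar only records where each chunk already lives, which is why a reader such as fsspec is needed to turn it back into bytes.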

  • Issues with Kerchunk + fsspec approach
    - Concatenation is complicated
    - Store-level abstractions make many operations hard to express
    - MultiZarrToZarr is bespoke and overloaded
    - In-memory dict representation is complicated, bespoke, and inefficient
    - Output files are not true Zarr stores: they can only be understood by fsspec (i.e. currently only in Python…?)
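To see why concatenating at the store level is fiddly, here is a toy sketch that joins one variable from two kerchunk-style reference sets (version 1, Zarr v2 keys such as `t/0.0` mapping to `[url, offset, length]`) along the first axis. `concat_refs_axis0` is a hypothetical helper written for illustration; it is not kerchunk's MultiZarrToZarr and ignores coordinates, attributes and most edge cases.

```python
import json


def concat_refs_axis0(a, b, var):
    """Hypothetical helper: concatenate variable `var` from two kerchunk-style
    reference sets along axis 0, working purely on store keys."""
    za = json.loads(a["refs"][f"{var}/.zarray"])
    zb = json.loads(b["refs"][f"{var}/.zarray"])
    for field in ("chunks", "dtype", "compressor", "filters", "fill_value"):
        if za.get(field) != zb.get(field):
            raise ValueError(f"arrays differ in {field!r}")
    if za["shape"][1:] != zb["shape"][1:]:
        raise ValueError("shapes must match off the concatenation axis")
    if za["shape"][0] % za["chunks"][0]:
        # Zarr v2 uses a regular chunk grid, so `a` must end on a chunk boundary
        raise ValueError("first array must end on a chunk boundary")

    # Number of chunks `a` occupies along axis 0 (computed before editing shape)
    shift = za["shape"][0] // za["chunks"][0]
    out = dict(a["refs"])

    # 1. Rewrite the array metadata by hand
    za["shape"][0] += zb["shape"][0]
    out[f"{var}/.zarray"] = json.dumps(za)

    # 2. Rename every chunk key coming from `b`, shifting its first index
    prefix = f"{var}/"
    for key, ref in b["refs"].items():
        if not key.startswith(prefix) or key[len(prefix):].startswith("."):
            continue  # skip other variables and metadata keys (.zarray, .zattrs)
        idx = [int(i) for i in key[len(prefix):].split(".")]
        idx[0] += shift
        out[prefix + ".".join(str(i) for i in idx)] = ref
    return {"version": 1, "refs": out}
```

Even this toy version must know the Zarr v2 key scheme, the regular-chunk-grid rule and the metadata JSON; a general tool also has to deal with coordinate values and many more options, which is the complexity the slide points at.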

  • Future of Zarr: Chunk manifests ZEP
    - Formalize via a ZEP (Zarr Enhancement Proposal), then implement reading arbitrary byte ranges in Zarr readers
    - Means virtual Zarr stores that can be read in any language
    - Opens the door to e.g. JavaScript visualization frameworks pointing at netCDF files…
    - New type of Zarr store containing chunk manifest.json files
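The core idea of a chunk manifest is a mapping from each chunk key to a byte range in some existing file. A minimal sketch, assuming entries shaped `{"path", "offset", "length"}` (modelled on the layout sketched in the VirtualiZarr documentation; the ZEP was still a proposal, so the final format may differ) and using a local temporary file in place of cloud storage. `read_chunk` is a hypothetical helper, not a zarr-python API.

```python
import json
import os
import tempfile


def read_chunk(manifest, key):
    """Hypothetical helper: fetch one chunk's raw (still-encoded) bytes.

    Local files only; a cloud reader would send an HTTP range request,
    Range: bytes=<offset>-<offset + length - 1>, to the object store instead.
    """
    entry = manifest[key]
    with open(entry["path"], "rb") as f:
        f.seek(entry["offset"])
        data = f.read(entry["length"])
    if len(data) != entry["length"]:
        raise IOError(f"short read for chunk {key!r}")
    return data


# Stand-in for a legacy file: a 16-byte header followed by two 8-byte chunks
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"HEADER" + b"_" * 10 + b"chunk-00" + b"chunk-01")
    path = tmp.name

manifest = {
    "0": {"path": path, "offset": 16, "length": 8},
    "1": {"path": path, "offset": 24, "length": 8},
}
manifest_json = json.dumps(manifest, indent=2)  # what a manifest.json might contain
chunk0 = read_chunk(manifest, "0")
chunk1 = read_chunk(manifest, "1")
os.remove(path)
print(chunk1)  # b'chunk-01'
```

Because the manifest holds only paths and byte ranges, any language with a file or HTTP client could resolve it, which is what makes virtual stores readable outside Python.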

  • Conclusion
    - Virtual Zarr stores over legacy data are a cool idea!
    - VirtualiZarr package exists as an alternative to kerchunk
    - Some rough edges, but progressing quickly
    - Can be used today to write kerchunk-format references
    - Uses the xarray API, so should be intuitive
    - Plan is to upstream sidecar formats as Zarr enhancements
    Go try it! https://github.com/TomNicholas/VirtualiZarr
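Writing kerchunk-format references from chunk-manifest-style data can be thought of as a change of serialization: each manifest entry becomes a kerchunk `[url, offset, length]` triple under a Zarr v2 key. A minimal sketch, assuming manifest entries with `path`, `offset` and `length` fields; `manifest_to_kerchunk` and all values below are invented for illustration and are not VirtualiZarr's own API.

```python
import json


def manifest_to_kerchunk(var, dims, zarray, manifest):
    """Hypothetical helper: serialize one variable's chunk manifest as
    kerchunk version-1 references using Zarr v2 keys."""
    refs = {
        ".zgroup": json.dumps({"zarr_format": 2}),
        f"{var}/.zarray": json.dumps(zarray),
        # xarray reads Zarr v2 dimension names from this attribute
        f"{var}/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": dims}),
    }
    for key, entry in manifest.items():
        # {"path", "offset", "length"} -> [url, offset, length]
        refs[f"{var}/{key}"] = [entry["path"], entry["offset"], entry["length"]]
    return {"version": 1, "refs": refs}


# Illustrative values: 4 int32 values of x(t), split into two 8-byte chunks
zarray = {"zarr_format": 2, "shape": [4], "chunks": [2], "dtype": "<i4",
          "compressor": None, "fill_value": None, "filters": None, "order": "C"}
manifest = {
    "0": {"path": "s3://example-bucket/a.nc", "offset": 100, "length": 8},
    "1": {"path": "s3://example-bucket/a.nc", "offset": 108, "length": 8},
}
refs = manifest_to_kerchunk("x", ["t"], zarray, manifest)
print(json.dumps(refs, indent=2))
```

Keeping byte ranges in a manifest and only emitting the kerchunk layout at write time means the same references could later be written out in a different sidecar format.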

