MultiZarrToZarr is extremely powerful but rather hard to use.
This is important - kerchunk has been transformative, so we increasingly recommend it as the best way to ingest large amounts of data into the pangeo ecosystem's tools. However, that means we should make sure the kerchunk user experience is smooth, so that new users don't get stuck early on.
Part of the problem is that this one MultiZarrToZarr function can do many different things. Contrast with xarray - when combining multiple datasets into one, xarray takes some care to distinguish between a few common cases/concepts (we even have a glossary):
- xr.concat where dim is a str
- xr.concat where dim is a set of values
- xr.merge
- xr.combine_nested
- xr.combine_by_coords
In kerchunk it seems that the recommended way to handle operations resembling all 5 of these cases is through MultiZarrToZarr. It also cannot currently easily handle certain types of multi-dimensional concatenation.
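For readers less familiar with the xarray side, here is a minimal sketch of the five cases listed above, each as a separate call. The toy datasets, the `make_ds` helper, and the variable and coordinate names are made up for illustration; only the `xr.*` functions themselves come from xarray.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_ds(name, times, lons=(0, 1)):
    """Build a tiny dataset with one variable on (time, lon). Illustrative helper."""
    data = np.zeros((len(times), len(lons)))
    return xr.Dataset(
        {name: (("time", "lon"), data)},
        coords={"time": list(times), "lon": list(lons)},
    )


a = make_ds("temp", [0, 1])
b = make_ds("temp", [2, 3])

# 1. xr.concat where dim is a str: join along an existing dimension.
along_time = xr.concat([a, b], dim="time")

# 2. xr.concat where dim is a set of values: create a new dimension
#    and label it with the values supplied.
runs = pd.Index(["control", "perturbed"], name="run")
along_run = xr.concat([a, a], dim=runs)

# 3. xr.merge: combine datasets that hold different variables.
merged = xr.merge([a, make_ds("precip", [0, 1])])

# 4. xr.combine_nested: the nesting of the list gives the order.
#    The outer list runs along "time", the inner lists along "lon".
c = make_ds("temp", [0, 1], lons=(2, 3))
d = make_ds("temp", [2, 3], lons=(2, 3))
nested = xr.combine_nested([[a, c], [b, d]], concat_dim=["time", "lon"])

# 5. xr.combine_by_coords: the order is inferred from coordinate values,
#    so the input list can be shuffled.
by_coords = xr.combine_by_coords([d, a, c, b])
```

Each call makes the user's intent explicit in the function name, which is the distinction this issue is asking kerchunk to mirror.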
Break up MultiZarrToZarr by defining a set of functions similar to xarray's merge/concat/combine/unify_chunks that consume and produce VirtualZarrStore objects (EDIT: see #375).
[...] coo_map kwarg (it has 10 possible input types!). Perhaps simply giving an ordered list of coordinate values would be sufficient, and just make it easier for the user to extract the values they want from the VirtualZarrStore objects they want to concatenate.
[...] kerchunk.combine.auto_dask does).
[...] merge/concat/combine? And what can we learn from the design decisions in pangeo-forge-recipes FilePattern? (@cisaacstern @rabernat)
[...] combine.merge_vars and combine.concatenate_arrays functions to providing this functionality? If the answer is "pretty close", then how much of this issue could be solved via documentation?
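To make the "break up MultiZarrToZarr" suggestion a bit more concrete, here is a rough sketch of what separate concat and merge functions over virtual stores could look like. Everything in it is hypothetical: `VirtualArray`, `concat`, `merge`, and the file names are invented for illustration and are not kerchunk API. The only thing borrowed from kerchunk is the idea that a chunk is referenced by a (url, offset, length) triple.

```python
from dataclasses import dataclass, field

# A byte range inside some file: (url, offset, length), mirroring the
# [url, offset, length] entries found in kerchunk reference sets.
ChunkRef = tuple[str, int, int]


@dataclass
class VirtualArray:
    """Hypothetical stand-in for one array inside a VirtualZarrStore."""
    dims: tuple[str, ...]
    shape: tuple[int, ...]
    chunks: tuple[int, ...]
    refs: dict[tuple[int, ...], ChunkRef] = field(default_factory=dict)


def concat(arrays, dim):
    """Concatenate virtual arrays along an existing dimension (hypothetical)."""
    first = arrays[0]
    axis = first.dims.index(dim)
    refs = {}
    chunk_offset = 0  # how far to shift chunk indices along the concat axis
    length = 0        # running size of the concat axis
    for arr in arrays:
        if arr.dims != first.dims or arr.chunks != first.chunks:
            raise ValueError("dims and chunk shapes must match")
        # This sketch only handles inputs made of whole chunks along the axis.
        if arr.shape[axis] % arr.chunks[axis]:
            raise ValueError("partial chunks along the concat axis are not supported")
        for key, ref in arr.refs.items():
            new_key = key[:axis] + (key[axis] + chunk_offset,) + key[axis + 1:]
            refs[new_key] = ref  # no bytes are copied, only the index changes
        chunk_offset += arr.shape[axis] // arr.chunks[axis]
        length += arr.shape[axis]
    shape = first.shape[:axis] + (length,) + first.shape[axis + 1:]
    return VirtualArray(first.dims, shape, first.chunks, refs)


def merge(stores):
    """Merge stores (dicts of name -> VirtualArray) holding different variables."""
    merged = {}
    for store in stores:
        for name, arr in store.items():
            if name in merged:
                raise ValueError(f"variable {name!r} appears in more than one store")
            merged[name] = arr
    return merged


# Two files with two time steps each, one chunk per time step (made-up names).
jan = VirtualArray(("time", "lat"), (2, 4), (1, 4),
                   {(0, 0): ("jan.nc", 0, 32), (1, 0): ("jan.nc", 32, 32)})
feb = VirtualArray(("time", "lat"), (2, 4), (1, 4),
                   {(0, 0): ("feb.nc", 0, 32), (1, 0): ("feb.nc", 32, 32)})

temp = concat([jan, feb], dim="time")
store = merge([{"temp": temp}, {"precip": jan}])
```

The point of the sketch is the shape of the API rather than the implementation: each function does one thing, takes virtual stores in, and returns a virtual store, so the calls can be chained the same way xarray's are.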