This issue describes a concept for a zarr v3 protocol extension that enables content-addressable storage to be layered on top of any underlying store. It is a thought experiment only, not a concrete proposal. It elaborates on suggestions made in #76.
Goals:
The protocol extension introduces a layer of indirection to the storage protocol. This can be thought of as a transformation layer which sits above the store and modifies the key/value store operations.
When attempting to store a given (key, value) pair, the storage transformer hashes the value, and the hash is used to obtain a content-addressable key.
E.g., when storing an encoded chunk value under key 'data/foo/bar/0.0', if the hash for the value is 'abcdefghijklmnopqrstuvwxyz', then the content-addressable key would be something like 'content/a/b/c/d/efghijklmnopqrstuvwxyz' (here the first four characters of the hash each become one path segment, i.e., a depth of 4 and a width of 1). The way the transformer generates content-addressable keys could be configured with a depth, a width and a hash algorithm.
The transformer would then issue a request to the underlying store to set (content-addressable-key, value).
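The key derivation described above could be sketched as follows. This is only an illustration: the function names `key_from_digest` and `content_key`, the `content` prefix argument and the use of a hex digest are assumptions, not part of any zarr specification.

```python
import hashlib


def key_from_digest(digest: str, depth: int = 4, width: int = 1,
                    prefix: str = "content") -> str:
    """Split the first depth * width characters of a digest into path segments.

    Hypothetical helper: `depth` is the number of leading segments and
    `width` is the number of characters in each of those segments.
    """
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    rest = digest[depth * width:]
    return "/".join([prefix, *parts, rest])


def content_key(value: bytes, algorithm: str = "sha256",
                depth: int = 4, width: int = 1) -> str:
    """Hash the value and turn the hex digest into a content-addressable key."""
    digest = hashlib.new(algorithm, value).hexdigest()
    return key_from_digest(digest, depth=depth, width=width)


# Reproduces the example from the text (depth 4, width 1).
print(key_from_digest("abcdefghijklmnopqrstuvwxyz"))
```

With these defaults, identical values always map to the same key, so storing the same chunk twice would write to a single location in the underlying store.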
To keep track of the location of the content, a content metadata document would be created, which records the content-addressable key, together with any other metadata such as timestamp of creation. This could be a JSON document, and could record multiple versions of the content. E.g.:
[
    {
        "address": "content/a/b/c/d/efghijklmnopqrstuvwxyz",
        "timestamp": 1593171939
    },
    {
        "address": "content/z/yxwvutsrqponmlkjihgfedcba",
        "timestamp": 32503680000
    },
    ...
]
This JSON document could be stored under the original key, in this case 'data/foo/bar/0.0'.
In other words, when the transformer receives a request set(key, value), it does the following:
When the transformation layer receives a request to get(key), it does the following:
The transformation layer could also expose an API to set the time state for reading. E.g., if the time state is set to time T, then when the transformation layer receives a request to get(key), it does the following:
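One possible reading of the set, get and time-state behaviour described above, as a minimal sketch. Everything named here is hypothetical: the class `ContentAddressableTransformer`, the `time_state` attribute, the injectable `clock`, and the use of a plain dict as the underlying store. It also assumes that versions are appended in chronological order and that a read at time T returns the most recent version with a timestamp no later than T.

```python
import hashlib
import json
import time


class ContentAddressableTransformer:
    """Hypothetical transformer over a dict-like store mapping str -> bytes."""

    def __init__(self, store, algorithm="sha256", depth=4, width=1,
                 clock=time.time):
        self.store = store
        self.algorithm = algorithm
        self.depth = depth
        self.width = width
        self.clock = clock        # injectable so timestamps can be controlled
        self.time_state = None    # None means "read the latest version"

    def _content_key(self, value):
        digest = hashlib.new(self.algorithm, value).hexdigest()
        n = self.depth * self.width
        parts = [digest[i:i + self.width] for i in range(0, n, self.width)]
        return "/".join(["content", *parts, digest[n:]])

    def _versions(self, key):
        # The content metadata document lives under the original key.
        raw = self.store.get(key)
        return json.loads(raw) if raw is not None else []

    def set(self, key, value):
        address = self._content_key(value)
        # Write the content first; the metadata document is only updated
        # after the content write has succeeded.
        self.store[address] = value
        versions = self._versions(key)
        versions.append({"address": address, "timestamp": int(self.clock())})
        self.store[key] = json.dumps(versions).encode()

    def get(self, key):
        versions = self._versions(key)
        if self.time_state is not None:
            # Keep only versions created at or before the time state.
            versions = [v for v in versions
                        if v["timestamp"] <= self.time_state]
        if not versions:
            raise KeyError(key)
        # Assumption: versions were appended in chronological order.
        return self.store[versions[-1]["address"]]
```

A short usage example under the same assumptions:

```python
store = {}
transformer = ContentAddressableTransformer(store)
transformer.set("data/foo/bar/0.0", b"chunk bytes")
print(transformer.get("data/foo/bar/0.0"))
```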
To make it discoverable that the content-addressable storage transformer extension is in use, the extension could be declared in the zarr entry point metadata document, e.g., a zarr.json like:
{
    "zarr_format": "https://purl.org/zarr/spec/protocol/core/3.0",
    "metadata_encoding": "application/json",
    "extensions": [
        {
            "extension": "http://example.org/zarr/extension/content-addressable-storage-transformer",
            "must_understand": true,
            "configuration": {
                "algorithm": "sha256",
                "depth": 4,
                "width": 1
            }
        }
    ]
}
When a zarr v3 implementation opens a hierarchy using this extension, it could recognise this when parsing the entry point metadata document and insert the appropriate store transformer, if supported. I.e., a user opening such a hierarchy would not need to know that content-addressable storage was used; the implementation would discover that for itself.
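The discovery step could look roughly like the sketch below. The function name `find_cas_configuration` is hypothetical, the extension URI is the placeholder from the example document, and treating a missing `must_understand` field as true is an assumption made for this sketch rather than something stated here.

```python
import json

# Placeholder URI taken from the example entry point document.
CAS_EXTENSION = (
    "http://example.org/zarr/extension/"
    "content-addressable-storage-transformer"
)


def find_cas_configuration(entry_point_doc: bytes):
    """Return the transformer configuration if the extension is declared.

    Returns None when the extension is absent. Raises NotImplementedError
    for any other extension that must be understood, since this sketch
    supports nothing else.
    """
    meta = json.loads(entry_point_doc)
    config = None
    for ext in meta.get("extensions", []):
        uri = ext["extension"]
        if uri == CAS_EXTENSION:
            config = ext.get("configuration", {})
        elif ext.get("must_understand", True):  # assumption: defaults to true
            raise NotImplementedError(f"unsupported extension: {uri}")
    return config
```

An implementation could call this once when opening a hierarchy and, if a configuration is returned, wrap the underlying store in the transformer before handing it to the rest of the library.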
There are several potential advantages of this scheme:
The group and array metadata keys would exist in the store as normal, and so any functionality inspecting the keys to infer which groups and arrays are present in the hierarchy would still work unchanged. I.e., the transformation layer could just pass through the list, list_pre and list_dir operations to the underlying store and everything should work as normal.
The metadata that tracks the content locations would be decentralised, allowing all of the same degrees of parallelism that the normal store provides. I.e., chunks could be written in parallel, and arrays could be created in parallel, without requiring any locking or synchronisation.
Because the extension is a transformation layer, any type of underlying store could be used. It would also be fine to migrate data between different types of storage, e.g., create data on a local filesystem, then copy up to object storage.
This extension does provide a slightly stronger guarantee against the underlying store getting corrupted by partially-successful writes, because the content metadata documents would only ever get updated after a successful content write operation.
Notes: