I'm working on a project that recursively reads, parses, and validates STAC files on AWS S3, builds a list of asset and metadata files, and copies all of them across to some reference storage. In other words, if a file is not mentioned in the metadata, it will not be copied.
The STAC documentation specifies that assets in collections have unique keys. Unfortunately, the STAC JSON schema cannot encode this constraint, and the Python JSON parser does not treat duplicate keys as an error.
Combining the above means the following scenario is likely to happen (and has already happened during a pre-production test): an `assets` property with two or more identical keys, for example, `"assets": {"foo": {"href": "s3://bucket/first"}, "foo": {"href": "s3://bucket/second"}}`. At this point, there is no indication that anything went wrong, but in fact the way the JSON parser works means it has silently dropped one of the assets. Detecting and reporting such issues is highly desirable, and can be done by setting the `object_pairs_hook` parameter of `json.loads` (implementation, test).
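
To illustrate, here is a minimal sketch of that detection using only the standard library; the hook function name is my own:

```python
import json

def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails loudly instead of silently dropping keys."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"Duplicate key in JSON object: {key!r}")
        obj[key] = value
    return obj

doc = '{"assets": {"foo": {"href": "s3://bucket/first"}, "foo": {"href": "s3://bucket/second"}}}'
json.loads(doc)  # silently keeps only the second "foo"
json.loads(doc, object_pairs_hook=reject_duplicate_keys)  # raises ValueError
```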
We're evaluating whether to use pystac for this project, and this is one of the features of our current code base we would like to keep. Would you be interested in a PR to enable this? It would presumably be very similar to the `read_text_method` mechanism you already have in `STAC_IO`.
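
To make the suggestion more concrete, here is a rough sketch of what such a hook could look like; `json_loads_method` is a hypothetical name (it does not exist in pystac today), chosen by analogy with `read_text_method`, and `reject_duplicate_keys` is the hook from the sketch above:

```python
import json

from pystac import STAC_IO

# Hypothetical attribute: pystac has no json_loads_method today. The proposed
# PR would have STAC_IO call this instead of calling json.loads directly.
STAC_IO.json_loads_method = lambda txt: json.loads(
    txt, object_pairs_hook=reject_duplicate_keys
)
```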