feat: add planetary-computer-multipart source, tests, and docs #610
I understand you need something more to auto-discover the different parts of the data, so the current planetary-computer source is not enough. Thank you for sharing this code; it is likely to be helpful for others and should be merged. It would be nice to keep only one source, though. Could you please refactor to include the multipart code in the same source and do the branching with an option? Feel free to use the appropriate vocabulary, perhaps [...]
Hi Florian, thanks for taking a look! My thinking behind not adding it to the original source was that the multipart doesn't (can't) use the [...] If I've got that wrong or you're happy with the above then I can definitely look into it. Thinking we could avoid a "mode" or similar config parameter by auto-detecting. That is, if the [...]
Sorry for insisting on this. I understand that the code may become more complex, but having a simpler interface for the user is more important, imho.
Not a problem, makes sense. Pending CI checks, I've consolidated the sources and updated docs and tests:
EDIT: in any case, the test I mentioned looks to make CI significantly slower. Happy to remove it, as it may be surplus anyway.
Approved. This should be merged as soon as the branch is updated. @duncanmartyn, I will leave it to you to update or rebase. As a follow-up, perhaps in another PR, I wonder if this could be extended to share code with a [...]
@floriankrb thanks, rebased. Looks like it needs an ATS label if you're able to add that, please. On the shared / extended source, agreed. It's mainly a question of the differences between static and dynamic catalogues, of which Planetary Computer is the latter, but I'll look into it!
🤖 Automated Release PR

This PR was created by `release-please` to prepare the next release. Once merged:

1. A new version tag will be created
2. A GitHub release will be published
3. The changelog will be updated

Changes to be included in the next release:

---

## [0.5.36](0.5.35...0.5.36) (2026-04-22)

### Features

* Add CycleIntervalProvider and set_start_step_to_zero patch ([#564](#564)) ([2c8824c](2c8824c))
* Add planetary-computer-multipart source, tests, and docs ([#610](#610)) ([42117db](42117db))
* **create:** Add workaround for missing data at step zero ([#565](#565)) ([9fd4733](9fd4733))
* Fetch files from ecfs if path starts with ec: or ectmp: ([#585](#585)) ([9fb443a](9fb443a))
* Fix issue 569 ([#574](#574)) ([7f4e40a](7f4e40a))
* Fix typo with duplicates ([#580](#580)) ([f33333e](f33333e))
* Make anemoi-datasets agnostic to Zarr version (Optional support Zarr3) ([#220](#220)) ([ab8cd71](ab8cd71))
* Observations feature branch ([#480](#480)) ([92d5ac9](92d5ac9))
* Open datasets analytics ([#576](#576)) ([561dbd2](561dbd2))
* Remove https test ([#608](#608)) ([048e419](048e419))

### Bug Fixes

* **create:** Repeated-dates ([#572](#572)) ([b73d533](b73d533))
* Example accumulations section to user current accumulate API ([#601](#601)) ([9434007](9434007))
* Fix corner cases ([#594](#594)) ([bdd31ff](bdd31ff))
* Fix race condition during build ([#593](#593)) ([66e2070](66e2070))
* Fix read ahead while building ([#611](#611)) ([6d18e5e](6d18e5e))
* Fix weatherbench test ([#609](#609)) ([f434a15](f434a15))
* **grib-index:** Support querying float values ([#520](#520)) ([b089cd2](b089cd2))
* Improve MARS request handling for forecast datasets ([#562](#562)) ([f9efe39](f9efe39))
* Make dataset naming function public ([#579](#579)) ([b089bb0](b089bb0))
* Netcdf date/time metadata type should be int ([#555](#555)) ([9937fbe](9937fbe))
* Propagate resolution metadata when using anemoi_dataset source ([#614](#614)) ([784695c](784695c))
* Remove duplicate code ([#590](#590)) ([8e54420](8e54420))
* Remove empty accumulators from accumulation computation ([#561](#561)) ([3bc087d](3bc087d))
* Replace pydantic class Config with ConfigDict ([#592](#592)) ([ce6b2ff](ce6b2ff))
* Rolling average regression ([#587](#587)) ([04f5b0b](04f5b0b))

### Documentation

* Docs minor fixes update concat yaml ([#539](#539)) ([dd73fda](dd73fda))

---

> [!IMPORTANT]
> Please do not change the PR title, manifest file, or any other automatically generated content in this PR unless you understand the implications. Changes here can break the release process.
>
> ⚠️ Merging this PR will:
> - Create a new release
> - Trigger deployment pipelines
> - Update package versions

**Before merging:**

- Ensure all tests pass
- Review the changelog carefully
- Get required approvals

[Release-please documentation](https://github.com/googleapis/release-please)
Description
Adds a new source for multipart (multiple items and item assets) STAC collections on the open Microsoft Planetary Computer.
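As a rough illustration of what "multipart" means here, the sketch below matches requested dates to asset URIs spread over several STAC items. It is not the code added by this PR: the item dictionaries, the `parse_utc` and `match_dates_to_assets` helpers, the asset keys, and the `example.invalid` URLs are all hypothetical stand-ins for real STAC search results.

```python
from datetime import datetime, timezone

# Hypothetical, hand-written stand-ins for STAC items. Real items would come
# from a STAC API search and carry many more fields.
ITEMS = [
    {
        "properties": {"datetime": "2024-01-01T00:00:00Z"},
        "assets": {
            "temperature": {"href": "https://example.invalid/t_20240101T00.nc"},
            "pressure": {"href": "https://example.invalid/p_20240101T00.nc"},
        },
    },
    {
        "properties": {"datetime": "2024-01-01T06:00:00Z"},
        "assets": {
            "temperature": {"href": "https://example.invalid/t_20240101T06.nc"},
            "pressure": {"href": "https://example.invalid/p_20240101T06.nc"},
        },
    },
]


def parse_utc(text):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def match_dates_to_assets(items, dates, asset_keys):
    """Return {date: [asset hrefs]} for the requested dates that have an item.

    Dates without a matching item are skipped, so a caller never tries to
    open a file for a timestamp that the collection does not contain.
    """
    by_datetime = {parse_utc(item["properties"]["datetime"]): item for item in items}
    matched = {}
    for date in dates:
        item = by_datetime.get(date)
        if item is None:
            continue  # no item for this timestamp
        matched[date] = [
            item["assets"][key]["href"] for key in asset_keys if key in item["assets"]
        ]
    return matched


requested = [
    datetime(2024, 1, 1, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, tzinfo=timezone.utc),  # not present in ITEMS
]
uris = match_dates_to_assets(ITEMS, requested, ["temperature", "pressure"])
```

Here `uris` holds one entry, for the 00:00 timestamp, with two hrefs; the 12:00 request is dropped because no item carries that datetime.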
Design:
- `execute`: matching date(s) passed to the method to asset URIs is necessary to mitigate datetime not found warnings.
- `Source` parent class rather than `XarraySourceBase`: requires multiple URIs and, possibly but unlikely, different storage options per asset. Also requires date matching to avoid warnings, which the current `XarraySourceBase.execute` does not support.
- `query.datetime` config key as a repetition of the `dates` section: facilitates access to the dataset's datetime range in `__init__`, resulting in fewer STAC API queries. Querying in `execute` may result in as many API requests as there are timestamps in the `dates.start` to `dates.end` range, given iterative invocation of the method.
- Placed with the `planetary-computer` source due to conceptual similarity and shared dependencies - happy to move to its own file if preferred.

Changes:
- `coordinates.Coordinate.reduced` and `variable.Variable.sel` - skip `isel` on scalar coordinates (e.g. met-office-global-deterministic-near-surface data `time` coordinate).

Two new dependencies are required by this change to handle remote NetCDF files with Xarray: `h5netcdf` and `h5py`.

What problem does this change solve?
The existing `planetary-computer` source pertains only to STAC collections for which there is a collection-level dataset asset under the `zarr-abfs` key corresponding to a single Zarr store containing all data. This source enables the use of collections in which data are in separate files or stores and referenced in distinct items and assets thereof.

What issue or task does this change relate to?
Additional notes
As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/
By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.
📚 Documentation preview 📚: https://anemoi-datasets--610.org.readthedocs.build/en/610/