feat: Add Zarr v2 metadata access and basic auth support for OIDC-protected assets#2112
feat: Add Zarr v2 metadata access and basic auth support for OIDC-protected assets#2112
Conversation
Code Coverage (Ubuntu)DetailsDiff against developResults for commit: ca62273 Minimum allowed coverage is ♻️ This comment has been updated with latest results |
Code Coverage (Windows)DetailsDiff against developResults for commit: ca62273 Minimum allowed coverage is ♻️ This comment has been updated with latest results |
|
@cauriol, instead of introducing a new method @sbrunato can we get your opinion here ? |
@alambare I agree. |
| def request_asset( | ||
| self, | ||
| url: str, | ||
| ) -> requests.Response: | ||
| """Perform a GET request to the given URL using product's authentication headers.""" | ||
| headers = self.get_storage_options().get("headers", {}) | ||
| return requests.get(url, headers=headers, stream=True) |
There was a problem hiding this comment.
we should use the stream download method from EODAG download plugins instead of creating a new method.
| ) | ||
|
|
||
| if ".zmetadata" in mapper: | ||
| meta = json.loads(mapper[".zmetadata"]) |
There was a problem hiding this comment.
Use orjson instead of json.
| # TODO: Support Zarr v3 when test data becomes available. | ||
| # Zarr v2 uses `.zmetadata`, while Zarr v3 exposes `zarr.json`. | ||
| # The implementation should be straightforward once we can validate it | ||
| # against real examples. | ||
| raise ValueError(f"No Zarr metadata file found at {base_url}") |
There was a problem hiding this comment.
With Destination Earth, we do have a zarr v3 store: DanubeHis data. I don't understand what is blocking here.
| headers = self.get_storage_options().get("headers", {}) | ||
| return requests.get(url, headers=headers, stream=True) | ||
|
|
||
| def list_zarr_files_from_metadata( |
There was a problem hiding this comment.
In zarr, we have keys and not files. I suppose you can rename the method to something like list_zarr_keys ?
| try: | ||
| url = self.assets[asset_key]["href"] if asset_key else self.location | ||
| except KeyError as e: | ||
| raise DatasetCreationError(f"{asset_key} not found in {self} assets") from e |
There was a problem hiding this comment.
You are not creating a xarray here. The exception class is not accurate.
| "zipstream-ng", | ||
| "fsspec", | ||
| "aiohttp", | ||
| "requests" |
There was a problem hiding this comment.
requests is already a dependency in EODAG.
| mapper = fsspec.get_mapper( | ||
| base_url, | ||
| client_kwargs={"headers": headers, "trust_env": False}, | ||
| ) |
There was a problem hiding this comment.
I am not sure about introducing fsspec in EODAG library.
The library is a wrapper above multiple backends. To use it, you need to bring backends to support all the filesystems: s3fs for S3, gcsfs for google cloud storage and adlfs for azure blob storage.
Most of the times, users will store zarr files on a S3 compatible store but some may store them on google or azure stores.
Maybe we don't want to support google / azure stores ? or in an optional dependencies branch ?
My opinion:
- we do not support google and azure yet. Maybe once if we replace
boto3byobstore. - we do not bring
fsspecin EODAG but instead callboto3to get the list of zarr keys. If a dedicated function is needed to get the list of keys with boto3, it should live ineodag/utils/s3.py.
@sbrunato what do you think?
This PR adds support for accessing Zarr v2 assets from EOProduct and extends OIDC authentication handling to support Basic authorization for protected asset access.
Changes:
-add helper methods in EOProduct to:
- extract auth headers from auth objects
- request protected assets with forwarded authentication headers
- list files from Zarr v2 metadata using .zmetadata
Notes
this PR currently targets Zarr v2 only through .zmetadata
Zarr v3 support can be added later if needed