feat: fetch multiple chunks in parallel when reading from s3 #617
Conversation
overrides Zarr's default sequential getitems() for significantly higher throughput when reading from an S3 bucket
HCookie left a comment
Thanks for your interest in the project.
Looks good to me, just one comment on the changelog, as we don't manually set that.
No idea why I cannot merge this; I don't see any review from HCookie. Let's dismiss the review to try merging.
@ronandarcy I updated your branch, approved, and merged this. Some contributors may not like it when somebody else updates their branch, because it's a commit in their own repo. If this is a problem for you, don't hesitate to say so.
No problem at all, thanks for merging
Description
A companion change to ecmwf/anemoi-utils#289 to make use of the get_objects_parallel function.
What problem does this change solve?
Significantly increases training speed when reading from an S3 bucket.
When training using the O48 dataset hosted at EWC, a speedup from 0.31 it/s to 0.79 it/s was observed, an increase of over 150%.
What issue or task does this change relate to?
Additional notes
ecmwf/anemoi-utils#289 is required for this change to work.
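The general idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the PR's actual code: it assumes a zarr-style store whose default getitems() fetches keys one at a time, and replaces that loop with a thread pool so S3 network latency overlaps across chunks. The real change delegates to get_objects_parallel from anemoi-utils (not shown here); fetch_one below is a stand-in for a single sequential S3 GET.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_one(key: str) -> bytes:
    # Stand-in for one sequential S3 GET, e.g.
    # s3.get_object(Bucket=..., Key=key)["Body"].read()
    return f"data-for-{key}".encode()


def getitems_parallel(keys, max_workers=8):
    """Fetch many chunk keys concurrently instead of one at a time.

    Mirrors the idea of overriding a store's sequential getitems():
    each key maps to one S3 object, and the thread pool lets the
    per-object network round-trips overlap.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so keys and results line up.
        results = pool.map(fetch_one, keys)
    return dict(zip(keys, results))
```

Because chunk reads are I/O-bound, threads (rather than processes) are enough to hide latency, which is consistent with the roughly 2.5x throughput gain reported above.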
As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/
By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.