
test(create): refactor and parametrize class-based dataset tests#597

Merged
anaprietonem merged 5 commits into ecmwf:main from jfdev001:jfrazier/parametrize-test-classes
Apr 15, 2026

jfdev001 (Contributor) commented Mar 27, 2026

Description

Simplifies and parametrizes redundant test code in tests/create/test_classes.py.

Follows parametrization conventions of existing tests:

$ grep -lR "parametrize" tests/ --exclude-dir=__pycache__
tests/test_dates.py
tests/test_window_view.py
tests/xarray/test_flavour.py
tests/create/test_create.py
tests/create/test_covering_intervals.py
tests/create/test_statistics.py
tests/create/test_sources.py

What problem does this change solve?

Improves the extensibility of the class-based dataset tests: a new case becomes a new parameter entry rather than a new test function.
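
A minimal sketch of the pattern, assuming pytest is installed. The function names, dataset names, and expected values are hypothetical placeholders, not code from tests/create/test_classes.py. Several near-identical numbered tests collapse into one function whose cases live in a table:

```python
import pytest

# Hypothetical cases: (dataset name, thinning factor, expected number of points).
THINNING_CASES = [
    ("dataset-a", 1, 100),
    ("dataset-a", 4, 25),
    ("dataset-b", 2, 50),
]


def thin(n_points: int, factor: int) -> int:
    """Toy stand-in for the code under test: keep every `factor`-th point."""
    return n_points // factor


@pytest.mark.parametrize("name, factor, expected", THINNING_CASES)
def test_thinning(name: str, factor: int, expected: int) -> None:
    # One test body serves every row of THINNING_CASES.
    assert thin(100, factor) == expected
```

pytest generates one test per row and names it from the argument values, for example test_thinning[dataset-a-4-25], so extending coverage only requires appending a row to the table.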

Additional notes

To verify this doesn't remove any existing tests, please see #597 (comment)

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.

jfdev001 (Contributor, Author) commented Mar 27, 2026

To verify that no tests are lost, do the following:

$ git clone https://github.com/ecmwf/anemoi-datasets.git
$ cd anemoi-datasets
$ git checkout -b jfrazier/parametrize-test-classes 9ba8096041690a2543f40f434674bfb1096cad95
$ git worktree add worktree_main 8e54420c20af9554121ae03fc556e7022864b8f7
$ python -m venv .venv
$ . .venv/bin/activate
$ uv pip install ".[all,tests]"

Then run the script below. If it prints assert eq and nothing else, some tests probably failed because they rely on an internet connection; delete /tmp/pytest_* and re-run the script:

#!/usr/bin/env bash
# @brief Extract passed/skipped tests from pytest of main and pytest of PR to verify no tests lost
set -eu

# Run pytest on main 
pytest_main_log=/tmp/pytest_main.log
[[ ! -f ${pytest_main_log} ]] && pytest -v worktree_main/tests/create/test_classes.py | tee ${pytest_main_log} 
pytest_main_PASSED=($(grep "PASSED" ${pytest_main_log} | grep -o "gridd.*" | sort -u))
pytest_main_SKIPPED=($(grep "SKIPPED" ${pytest_main_log} | grep -o "gridd.*" | sort -u))

# Run pytest on PR 
pytest_pr_log=/tmp/pytest_pr.log
[[ ! -f ${pytest_pr_log} ]] && pytest -v tests/create/test_classes.py | tee ${pytest_pr_log}
pytest_pr_PASSED=($(grep "PASSED" ${pytest_pr_log} | grep -o "gridd.*" | sort -u))
pytest_pr_SKIPPED=($(grep "SKIPPED" ${pytest_pr_log} | grep -o "gridd.*" | sort -u))

# Print PASSED side by side
n_pr_PASSED="${#pytest_pr_PASSED[@]}"
n_main_PASSED="${#pytest_main_PASSED[@]}"
[[ ! ${n_pr_PASSED} -eq ${n_main_PASSED} ]] && echo "assert eq" && exit 1 
echo "----------"
echo "  PASSED"
echo "main ----> PR"
echo "----------"
for (( i = 0; i < n_pr_PASSED; i++))
do
    echo "${pytest_main_PASSED[$i]} ---> ${pytest_pr_PASSED[$i]}" 
done
echo 

# Print SKIPPED side by side
n_pr_SKIPPED="${#pytest_pr_SKIPPED[@]}"
n_main_SKIPPED="${#pytest_main_SKIPPED[@]}"
[[ ! ${n_pr_SKIPPED} -eq ${n_main_SKIPPED} ]] && echo "assert eq" && exit 1 
echo "-----------"
echo "  SKIPPED"
echo "main -----> PR"
echo "-----------"
for (( i = 0; i < n_pr_SKIPPED; i++))
do
    echo "${pytest_main_SKIPPED[$i]} ---> ${pytest_pr_SKIPPED[$i]}" 
done

The script outputs the name of each original test on main mapped to the name of the test auto-generated by parametrize. Because the two lists are sorted independently, the select_.* tests are paired in the wrong order, but the tests themselves are unchanged: gridded_select_drop on main corresponds to gridded_select_drop in the PR, and gridded_select_select_1 and gridded_select_select_2 correspond to the two gridded_select[...] cases. Ignore the pairing for those three lines:

----------
  PASSED
main ----> PR
----------
gridded_complement_nearest_1 ---> gridded_complement[aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1-cerra-rr-an-oper-0001-mars-5p0-2017-2017-6h-v1-nearest-variables1]
gridded_complement_nearest_2 ---> gridded_complement[cerra-rr-an-oper-0001-mars-5p0-2017-2017-6h-v1-aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1-nearest-variables0]
gridded_concat ---> gridded_concat
gridded_cropping ---> gridded_cropping
gridded_ensemble ---> gridded_ensemble
gridded_grids ---> gridded_grids
gridded_join_1 ---> gridded_join[aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1-pl-aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1-sfc-variables1]
gridded_join_2 ---> gridded_join[aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1-sfc-aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1-pl-variables0]
gridded_number ---> gridded_number
gridded_rename ---> gridded_rename
gridded_rescale_1 ---> gridded_rescale[aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1-rescale0-None]
gridded_rescale_2 ---> gridded_rescale[aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1-rescale1-None]
gridded_rescale_3 ---> gridded_rescale[aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1-rescale2-cfunits]
gridded_select_drop ---> gridded_select[aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1-select0]
gridded_select_select_1 ---> gridded_select[aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1-select1]
gridded_select_select_2 ---> gridded_select_drop
gridded_subset ---> gridded_subset
gridded_thinning_1 ---> gridded_thinning[cerra-rr-an-oper-0001-mars-5p0-2017-2017-6h-v1-None-4]
gridded_zarr ---> gridded_zarr

-----------
  SKIPPED
main -----> PR
-----------
gridded_chain ---> gridded_chain
gridded_complement_none ---> gridded_complement_none
gridded_cutout ---> gridded_cutout
gridded_interpolate_frequency ---> gridded_interpolate_frequency
gridded_interpolate_nearest ---> gridded_interpolate_nearest
gridded_merge ---> gridded_merge
gridded_missing_dataset ---> gridded_missing_dataset
gridded_missing_date_error ---> gridded_missing_date_error
gridded_missing_dates ---> gridded_missing_dates
gridded_missing_dates_closest ---> gridded_missing_dates_closest
gridded_missing_dates_fill ---> gridded_missing_dates_fill
gridded_missing_dates_interpolate ---> gridded_missing_dates_interpolate
gridded_padded ---> gridded_padded
gridded_rename_with_overlap ---> gridded_rename_with_overlap
gridded_skip_missing_dates ---> gridded_skip_missing_dates
gridded_statistics ---> gridded_statistics
gridded_thinning_2 ---> gridded_thinning[cerra-rr-an-oper-0001-mars-5p0-2017-2017-6h-v1-distance-based-100]
gridded_thinning_3 ---> gridded_thinning[cerra-rr-an-oper-0001-mars-5p0-2017-2017-6h-v1-grid-100]
gridded_thinning_4 ---> gridded_thinning[cerra-rr-an-oper-0001-mars-5p0-2017-2017-6h-v1-random-0.5]
gridded_trim_edge ---> gridded_trim_edge
gridded_xy ---> gridded_xy
gridded_zarr_with_missing_dates ---> gridded_zarr_with_missing_dates
gridded_zip ---> gridded_zip
gridded_zipbase ---> gridded_zipbase
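
The bracketed suffixes in the PR column follow pytest's default ID scheme: strings, numbers, booleans, and None are rendered as text, any other object becomes its argument name plus the case index, and the parts are joined with hyphens. The sketch below is a simplified, stdlib-only emulation of that rule, not pytest's actual implementation. The argument names and the rescale values are hypothetical, chosen only to mirror the gridded_rescale IDs:

```python
from typing import Any, Sequence


def default_id(argnames: Sequence[str], values: Sequence[Any], index: int) -> str:
    """Approximate the default pytest ID for one parametrized case.

    Simplified: real pytest also escapes non-ASCII text, handles more
    types, and de-duplicates clashing IDs.
    """
    parts = []
    for name, value in zip(argnames, values):
        if value is None or isinstance(value, (str, int, float, bool)):
            parts.append(str(value))        # simple values are shown as text
        else:
            parts.append(f"{name}{index}")  # other objects: argument name + case index
    return "-".join(parts)


# Hypothetical argument names and values (assumptions, not taken from the PR).
ARGNAMES = ("dataset", "rescale", "requires")
DATASET = "aifs-ea-an-oper-0001-mars-20p0-2017-2017-6h-v1"
CASES = [
    (DATASET, {"2t": (1.0, -273.15)}, None),
    (DATASET, {"2t": {"scale": 1.0, "offset": -273.15}}, None),
    (DATASET, {"2t": ("K", "degC")}, "cfunits"),
]

for index, case in enumerate(CASES):
    print(f"gridded_rescale[{default_id(ARGNAMES, case, index)}]")
```

Under these assumptions the loop reproduces the three gridded_rescale IDs in the PASSED list, which is why dictionary-valued parameters appear as rescale0, rescale1, and rescale2 rather than as their contents.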

The script runs pytest twice. To skip that step, place the logs below in the /tmp/ directory:

pytest_main.log

pytest_pr.log

@jfdev001 jfdev001 force-pushed the jfrazier/parametrize-test-classes branch from 64ebf0b to 8b90e43 Compare March 27, 2026 22:11
@NatalieZelenka NatalieZelenka moved this from To be triaged to Reviewers needed in Anemoi-dev Apr 8, 2026
anaprietonem previously approved these changes Apr 14, 2026
@github-project-automation github-project-automation Bot moved this from Reviewers needed to For merging in Anemoi-dev Apr 14, 2026
jfdev001 (Contributor, Author) commented Apr 14, 2026

I think I see why this is failing in the CI. Tests pass locally for me as well. Maybe related to this?

platform linux -- Python 3.12.4, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/jf01/dev/anemoi-datasets
configfile: pyproject.toml
plugins: skip-slow-0.0.5, xdist-3.8.0
16 workers [5 items]      
.....                                                                                                                                                                                                        [100%]
================================================================================================= warnings summary =================================================================================================
tests/xarray/test_zarr.py::test_noaa_replay
  /home/jf01/dev/anemoi-datasets/tests/xarray/test_zarr.py:115: FutureWarning: In a future version, xarray will not decode the variable 'ftime' into a timedelta64 dtype based on the presence of a timedelta-like 'units' attribute by default. Instead it will rely on the presence of a timedelta64 'dtype' attribute, which is now xarray's default way of encoding timedelta64 values.
  To continue decoding into a timedelta64 dtype, either set `decode_timedelta=True` when opening this dataset, or add the attribute `dtype='timedelta64[ns]'` to this variable on disk.
  To opt-in to future behavior, set `decode_timedelta=False`.
    ds = xr.open_zarr(

tests/xarray/test_zarr.py::test_weatherbench
  /home/jf01/dev/anemoi-datasets/tests/xarray/test_zarr.py:63: FutureWarning: In a future version, xarray will not decode the variable 'prediction_timedelta' into a timedelta64 dtype based on the presence of a timedelta-like 'units' attribute by default. Instead it will rely on the presence of a timedelta64 'dtype' attribute, which is now xarray's default way of encoding timedelta64 values.
  To continue decoding into a timedelta64 dtype, either set `decode_timedelta=True` when opening this dataset, or add the attribute `dtype='timedelta64[ns]'` to this variable on disk.
  To opt-in to future behavior, set `decode_timedelta=False`.
    ds = xr.open_zarr(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================================== 5 passed, 2 warnings in 10.11s ==========================================================================================
[jf01@lapfrazier] xarray$ uv pip show xarray
Using Python 3.12.4 environment at: /home/jf01/dev/anemoi-datasets/.venv
Name: xarray
Version: 2026.2.0
Location: /home/jf01/dev/anemoi-datasets/.venv/lib/python3.12/site-packages
Requires: numpy, packaging, pandas
Required-by: earthkit-data, zarrdump

i.e.,

To continue decoding into a timedelta64 dtype, either set decode_timedelta=True

jfdev001 (Contributor, Author) commented

I guess it was an unrelated problem; see #609.

@anaprietonem anaprietonem self-requested a review April 15, 2026 05:30
@anaprietonem anaprietonem merged commit 9ba8096 into ecmwf:main Apr 15, 2026
12 checks passed
@github-project-automation github-project-automation Bot moved this from For merging to Done in Anemoi-dev Apr 15, 2026
@jfdev001 jfdev001 deleted the jfrazier/parametrize-test-classes branch April 15, 2026 06:22
Projects

Status: Done

3 participants