⭐ added various functions to the binning module#567
⭐ added various functions to the binning module#567KevinShuman merged 96 commits intoCloud-Drift:mainfrom
Conversation
Co-authored-by: Copilot <[email protected]>
|
What is the expected output for the test above? Running it, I am getting the following output/error: The following tests are also failing because of this: |
|
Sorry, I didn't realize this. Will try fixing right now. |
There was a problem hiding this comment.
Pull Request Overview
This PR extends the binning module functionality to support various statistical computations beyond just counting and mean calculations. The changes enable automatic handling of datetime coordinates/variables and add comprehensive testing for the new features.
- Extended statistics support to include count, sum, mean, median, std, min, max, and custom functions
- Added automatic datetime handling in coordinates and data variables
- Enhanced naming conventions and collision avoidance for output variables
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/binning_test.py | Adds comprehensive tests for new statistics functions, datetime handling, and error cases |
| clouddrift/binning.py | Implements new statistics functions, datetime conversion utilities, and enhanced parameter validation |
| .github/workflows/ci.yml | Updates CI workflow conditions for coverage reporting |
Co-authored-by: Copilot <[email protected]>
clouddrift/binning.py
Outdated
| - A list of ints or arrays: one per dimension, specifying either bin count or bin edges, | ||
| - None: defaults to 10 bins per dimension. | ||
| bins_range : list of tuples, optional | ||
| Outer bin limits for each dimension. |
There was a problem hiding this comment.
Is it possible to provide a range for only a subset of the dimensions or do we have to provide the ranges for all the dimensions when this optional argument is given?
There was a problem hiding this comment.
You can provide range for a subset of the variables. I need to add tests to this actually. If you do this: [[-90, 90], None], it should apply the range only on the first variable and take min/max for the second one.
clouddrift/binning.py
Outdated
| - a tuple of (output_name, callable) for multivariate statistics. 'output_name' is used to identify the resulting variable. | ||
| In this case, the callable will receive the list of arrays provided in `data`. For example, to calculate kinetic energy, | ||
| you can pass `data = [u, v]` and `statistics=("ke", lambda data: np.sqrt(np.mean(data[0] ** 2 + data[1] ** 2)))`. | ||
| - a list containing any combination of the above, e.g., ['mean', np.nanmax, ('ke', lambda data: np.sqrt(np.mean(data[0] ** 2 + data[1] ** 2)))]. |
There was a problem hiding this comment.
Here you suggest np.nanmax because the standard 'max' statistic will return np.nan if the data contain np.nan values?
There was a problem hiding this comment.
To be fair, I didn't think about this. I just included a random numpy function as an example.
clouddrift/binning.py
Outdated
| + statistics_func | ||
| ) | ||
|
|
||
| if statistics and not data.size: |
There was a problem hiding this comment.
Mypy seems to complain about the .size attribute of data?
There was a problem hiding this comment.
does it? I don't see this in the tests below?
|
Hi @selipot, In the end I don't think anything is wrong, just maybe confusing. If you use the but here if you want to apply the same function to all variables, you just pass a because it is an anonymous function, the variable names will be Let me know if you have suggestions on how to make this clearer. |
|
Ok, in the second option you propose, I do not think the Callable should be I don't want to ask tooooo much but I think the output I originally expected would be nicer :) or makes more sense, i.e. get |
|
Yes, it's the first element of the The thing is the syntax with a In your case, if you want to apply a function to all variables, you can pass variables here are set to |
|
this is good to go @KevinShuman |

Created this since I have worked on extending the binning module (closes #565).
think about how we can abstract the "histogram" part so we could use other methods like polynomial fitting(another PR)