Commit 096d210

Adds stub of NarwhalsAdapter (#998)
The purpose of this adapter is to showcase how you can write transforms that are agnostic of the dataframe type.

Assumptions for this plugin:

* You can only have one "backend"; you can't mix and match. That means you can't load some data in pandas and some in polars, as far as I can tell; this is a narwhals limitation.
* This change uses the narwhals decorator. This assumes that non-pandas/polars objects would be left alone by it. If not, we could just skip adding it when we don't detect a supported type.
* This makes the user choose which result builder to use and then requires them to nest it in the narwhals result builder, which just converts the outputs to the backend that is being used.
* I think this is a good enough integration to get out; we'll likely tweak or add functionality as feedback comes in.

Squashed commits:

* Adds stub of NarwhalsAdapter

  Assumptions narwhals has (I believe):

  1. You can only have one "backend"; you can't mix and match. That means you can't load some data in pandas and some in polars, as far as I can tell.
  2. This change uses the narwhals decorator. This assumes that non-pandas/polars objects would be left alone by it. If not, we could just skip adding it when we don't detect a supported type. Otherwise we probably need a better example from narwhals.

* Adds one attempt at a result builder

  This makes the user choose what the return type is and then requires them to nest it in the narwhals result builder, which just converts the outputs to the backend that is being used.

* Adds narwhals plugin v1

  First version of narwhals support.

* Completes Narwhals example

  Adds README and notebook so that people can run this example easily. Also adds circleci tests.

* Adds missing dependency
* Fixes polars test for polars 1.0+
* Adds narwhals to integration docs
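The result-builder nesting described in the commit message (an outer builder that converts outputs back to the native backend, then hands them to a user-chosen inner builder) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the `h_narwhals` implementation: `Wrapped`, `to_native`, `DictResultBuilder`, and `UnwrappingResultBuilder` are hypothetical names, and only the standard library is used.

```python
from typing import Any, Dict


class Wrapped:
    """Hypothetical stand-in for a backend-agnostic wrapper around a native object."""

    def __init__(self, native: Any) -> None:
        self.native = native


def to_native(value: Any) -> Any:
    """Unwrap Wrapped values; pass anything else (ints, strings, ...) through."""
    return value.native if isinstance(value, Wrapped) else value


class DictResultBuilder:
    """Hypothetical inner builder chosen by the user: returns outputs as a dict."""

    def build_result(self, **outputs: Any) -> Dict[str, Any]:
        return dict(outputs)


class UnwrappingResultBuilder:
    """Hypothetical outer builder: unwrap every output, then delegate."""

    def __init__(self, inner: Any) -> None:
        self.inner = inner

    def build_result(self, **outputs: Any) -> Any:
        native_outputs = {name: to_native(value) for name, value in outputs.items()}
        return self.inner.build_result(**native_outputs)


# The outer builder wraps the inner one, mirroring the nesting in the commit.
builder = UnwrappingResultBuilder(DictResultBuilder())
result = builder.build_result(group_by_mean=Wrapped({"a": [1, 2]}), example1=3)
```

The design choice this illustrates is that the outer builder knows nothing about the final result shape; it only guarantees that the inner builder sees native objects.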
1 parent d12f4dc commit 096d210

14 files changed

Lines changed: 619 additions & 0 deletions

.ci/test.sh

Lines changed: 7 additions & 0 deletions
@@ -51,6 +51,13 @@ if [[ ${TASK} == "vaex" ]]; then
     exit 0
 fi
 
+if [[ ${TASK} == "narwhals" ]]; then
+    pip install -e .
+    pip install polars pandas narwhals
+    pytest plugin_tests/h_narwhals
+    exit 0
+fi
+
 if [[ ${TASK} == "tests" ]]; then
     pip install .
     pytest \

.circleci/config.yml

Lines changed: 18 additions & 0 deletions
@@ -155,3 +155,21 @@ workflows:
           name: integrations-py312
           python-version: '3.12'
           task: integrations
+      - test:
+          requires:
+            - check_for_changes
+          name: narwhals-py39
+          python-version: '3.9'
+          task: narwhals
+      - test:
+          requires:
+            - check_for_changes
+          name: narwhals-py310
+          python-version: '3.10'
+          task: narwhals
+      - test:
+          requires:
+            - check_for_changes
+          name: narwhals-py311
+          python-version: '3.11'
+          task: narwhals

docs/integrations/index.rst

Lines changed: 1 addition & 0 deletions
@@ -26,3 +26,4 @@ This section showcases how Hamilton integrates with popular frameworks.
    Slack <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/slack>
    Spark <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/spark>
    Vaex <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/vaex>
+   Narwhals <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/narwhals>

examples/narwhals/README.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# Narwhals
+
+[Narwhals](https://narwhals-dev.github.io/narwhals/) is a library that aims
+to unify expression across dataframe libraries. It is meant to be lightweight
+and focuses on Python-first dataframe libraries.
+
+This example shows how you can write dataframe-agnostic code
+and then load pandas or polars data to use with it.
+
+## Running the example
+
+You can run the example with:
+
+```bash
+# cd examples/narwhals/
+python example.py
+```
+This will run both variants one after the other.
+
+Or you can run the notebook:
+
+```bash
+# cd examples/narwhals
+jupyter notebook # pip install jupyter if you don't have it
+```
+Or you can open up the notebook in Colab:
+
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dagworks-inc/hamilton/blob/main/examples/narwhals/notebook.ipynb)
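To illustrate what "dataframe-agnostic" means in the README without depending on any dataframe library, the sketch below uses two toy "backends" (a list of row dicts and a dict of column lists) and a decorator that wraps supported inputs, leaves everything else alone, and converts a wrapped result back to the caller's format. All names (`Frame`, `from_native`, `to_native`, `agnostic`) are hypothetical stand-ins; this is not the narwhals API, only the general wrap-and-unwrap pattern.

```python
import functools
from typing import Any, Callable, Dict, List


class Frame:
    """Hypothetical backend-agnostic wrapper that stores data as row dicts."""

    def __init__(self, rows: List[Dict[str, Any]], columns: List[str], backend: str) -> None:
        self.rows = rows
        self.columns = columns
        self.backend = backend  # remembers which native format to convert back to

    def filter_in(self, column: str, allowed: set) -> "Frame":
        kept = [row for row in self.rows if row[column] in allowed]
        return Frame(kept, self.columns, self.backend)


def from_native(obj: Any) -> Any:
    """Wrap the two toy native formats; return anything else unchanged."""
    if isinstance(obj, list) and obj and all(isinstance(row, dict) for row in obj):
        return Frame(obj, list(obj[0]), "rows")
    if isinstance(obj, dict) and obj and all(isinstance(col, list) for col in obj.values()):
        names = list(obj)
        length = len(obj[names[0]])
        rows = [{name: obj[name][i] for name in names} for i in range(length)]
        return Frame(rows, names, "columns")
    return obj


def to_native(obj: Any) -> Any:
    """Convert a Frame back to the format it was created from."""
    if not isinstance(obj, Frame):
        return obj
    if obj.backend == "rows":
        return obj.rows
    return {name: [row[name] for row in obj.rows] for name in obj.columns}


def agnostic(func: Callable) -> Callable:
    """Wrap inputs before the call and unwrap the result afterwards."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        wrapped_args = [from_native(arg) for arg in args]
        wrapped_kwargs = {key: from_native(value) for key, value in kwargs.items()}
        return to_native(func(*wrapped_args, **wrapped_kwargs))

    return wrapper


@agnostic
def count_matching(df: Frame, column: str, allowed: set) -> int:
    return len(df.filter_in(column, allowed).rows)


@agnostic
def keep_matching(df: Frame, column: str, allowed: set) -> Frame:
    return df.filter_in(column, allowed)


rows_data = [{"a": 1, "b": 4}, {"a": 1, "b": 5}, {"a": 2, "b": 6}, {"a": 2, "b": 7}, {"a": 3, "b": 8}]
cols_data = {"a": [1, 1, 2, 2, 3], "b": [4, 5, 6, 7, 8]}
```

The same `count_matching` and `keep_matching` bodies run against either toy format, and each call returns data in the format it was given.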

examples/narwhals/example.png

34 KB

examples/narwhals/example.py

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+import narwhals as nw
+import pandas as pd
+import polars as pl
+
+from hamilton.function_modifiers import config, tag
+
+
+@config.when(load="pandas")
+def df__pandas() -> nw.DataFrame:
+    return pd.DataFrame({"a": [1, 1, 2, 2, 3], "b": [4, 5, 6, 7, 8]})
+
+
+@config.when(load="pandas")
+def series__pandas() -> nw.Series:
+    return pd.Series([1, 3])
+
+
+@config.when(load="polars")
+def df__polars() -> nw.DataFrame:
+    return pl.DataFrame({"a": [1, 1, 2, 2, 3], "b": [4, 5, 6, 7, 8]})
+
+
+@config.when(load="polars")
+def series__polars() -> nw.Series:
+    return pl.Series([1, 3])
+
+
+@tag(nw_kwargs=["eager_only"])
+def example1(df: nw.DataFrame, series: nw.Series, col_name: str) -> int:
+    return df.filter(nw.col(col_name).is_in(series.to_numpy())).shape[0]
+
+
+def group_by_mean(df: nw.DataFrame) -> nw.DataFrame:
+    return df.group_by("a").agg(nw.col("b").mean()).sort("a")
+
+
+if __name__ == "__main__":
+    import __main__ as example
+
+    from hamilton import base, driver
+    from hamilton.plugins import h_narwhals, h_polars
+
+    # pandas
+    dr = (
+        driver.Builder()
+        .with_config({"load": "pandas"})
+        .with_modules(example)
+        .with_adapters(
+            h_narwhals.NarwhalsAdapter(),
+            h_narwhals.NarwhalsDataFrameResultBuilder(base.PandasDataFrameResult()),
+        )
+        .build()
+    )
+    r = dr.execute([example.group_by_mean, example.example1], inputs={"col_name": "a"})
+    print(r)
+
+    # polars
+    dr = (
+        driver.Builder()
+        .with_config({"load": "polars"})
+        .with_modules(example)
+        .with_adapters(
+            h_narwhals.NarwhalsAdapter(),
+            h_narwhals.NarwhalsDataFrameResultBuilder(h_polars.PolarsDataFrameResult()),
+        )
+        .build()
+    )
+    r = dr.execute([example.group_by_mean, example.example1], inputs={"col_name": "a"})
+    print(r)
+    dr.display_all_functions("example.png")
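The `@config.when(...)` functions in `example.py` share a node name (`df`, `series`) and differ only by a `__pandas` or `__polars` suffix, with the driver config deciding which one is used. A rough, standard-library-only sketch of that kind of config-driven selection follows. It assumes the suffix after the double underscore is dropped to form the node name; `when` and `resolve` are hypothetical helpers written for this illustration, not Hamilton's implementation.

```python
from typing import Any, Callable, Dict, Iterable


def when(**conditions: Any) -> Callable[[Callable], Callable]:
    """Hypothetical decorator: record the config values a function requires."""

    def decorator(func: Callable) -> Callable:
        func._when = conditions  # stash the conditions on the function object
        return func

    return decorator


def resolve(functions: Iterable[Callable], config: Dict[str, Any]) -> Dict[str, Callable]:
    """Map node names to the functions whose conditions match the config."""
    nodes: Dict[str, Callable] = {}
    for func in functions:
        conditions = getattr(func, "_when", None)
        if conditions is None:
            nodes[func.__name__] = func  # unconditional function keeps its name
        elif all(config.get(key) == value for key, value in conditions.items()):
            # Assumption: "df__pandas" becomes the node "df".
            nodes[func.__name__.split("__", 1)[0]] = func
    return nodes


@when(load="pandas")
def df__pandas() -> str:
    return "pandas-backed frame"


@when(load="polars")
def df__polars() -> str:
    return "polars-backed frame"


def describe(df: str) -> str:
    return f"using {df}"


functions = [df__pandas, df__polars, describe]
pandas_nodes = resolve(functions, {"load": "pandas"})
polars_nodes = resolve(functions, {"load": "polars"})
```

Downstream functions such as `describe` only ever refer to `df`, so switching the config value swaps the loader without touching the transforms.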
