Thank you for your interest in contributing to Narwhals! Any kind of improvement is welcome!
If you've got experience with open source contributions, the following instructions might suffice:
- Clone the repo:

  ```
  git clone git@github.com:narwhals-dev/narwhals.git narwhals-dev
  cd narwhals-dev
  git remote rename origin upstream
  git remote add origin <your fork goes here>
  uv venv -p 3.12
  . .venv/bin/activate
  uv pip install -U -e . --group local-dev -e test-plugin
  ```

- To run tests: `pytest`
- To run all linting checks: `pre-commit run --all-files`
- To run static typing checks: `make typing`
For more detailed and beginner-friendly instructions, see below!
You can contribute to Narwhals in your local development environment, using python3, git and your editor of choice. You can also contribute to Narwhals using GitHub Codespaces - a development environment that's hosted in the cloud. This way you can easily start working from your browser without installing git and cloning the repo. Scroll down for instructions on how to use Codespaces.
Open your terminal and run the following command:
```
git --version
```
If the output looks like `git version 2.34.1` and you have a personal account on GitHub, you're good to go to the next step.
If the terminal output says `command not found`, you need to install git.
If you're new to GitHub, you'll need to create an account on GitHub.com and verify your email address.
You should also check for existing SSH keys and generate and add a new SSH key if you don't have one already.
Go to the main project page. Fork the repository by clicking on the fork button. You can find it in the right corner on the top of the page.
Go to the forked repository on your GitHub account - you'll find it on your account in the tab Repositories.
Click on the green Code button and then click the Copy url to clipboard icon.
Open a terminal, choose the directory where you would like to have Narwhals repository and run the following git command:
```
git clone <url you just copied>
```
for example:
```
git clone git@github.com:YOUR-GITHUB-USERNAME/narwhals.git narwhals-dev
```
You should then navigate to the folder you just created, and add the upstream remote:
```
cd narwhals-dev
git remote add upstream git@github.com:narwhals-dev/narwhals.git
git fetch upstream
```
Check that the remote has been added with `git remote -v`; you should see something like this:
```
git remote -v
origin    git@github.com:YOUR-GITHUB-USERNAME/narwhals.git (fetch)
origin    git@github.com:YOUR-GITHUB-USERNAME/narwhals.git (push)
upstream  git@github.com:narwhals-dev/narwhals.git (fetch)
upstream  git@github.com:narwhals-dev/narwhals.git (push)
```
where YOUR-GITHUB-USERNAME is your GitHub user name.
Here's how you can set up your local development environment to contribute.
If you want to run PySpark-related tests, you'll need to have Java installed. Refer to the Spark documentation for more information.
- Make sure you have Python 3.12 installed, create a virtual environment, and activate it. If you're new to this, here's one way that we recommend:

  1. Install uv (see uv getting started) or make sure it is up-to-date with:

     ```
     uv self update
     ```

  2. Install Python 3.12:

     ```
     uv python install 3.12
     ```

  3. Create a virtual environment:

     ```
     uv venv -p 3.12 --seed
     ```

  4. Activate it. On Linux, this is `. .venv/bin/activate`; on Windows, `.\.venv\Scripts\activate`.

- Install Narwhals:

  ```
  uv pip install -e . --group local-dev -e test-plugin
  ```

  This will include fast-ish core libraries and dev dependencies. If you also want to test other libraries like Dask, PySpark, and Modin, you can install them too with `uv pip install -e ".[dask, pyspark, modin]" --group local-dev`.
The pre-commit tool is installed as part of the local-dev dependency group. This will automatically format and lint your code before each commit, and it will block the commit if any issues are found.
Static typing is run separately from pre-commit, as it's quite slow. Assuming you followed all the instructions above, you can run it with `make typing`.
- Make sure you have Python 3.8+ installed. If you don't, you can check install Python to learn how. Then, create and activate a virtual environment.
- Then, follow steps 2-4 from above, but using `pip install` instead of `uv pip install`.
Create a new git branch from the main branch in your local repository.
Note that your work cannot be merged if the tests below fail.
If you add code that should be tested, please add tests.
- To run tests, run `pytest`. To check coverage: `pytest --cov=narwhals`
- To run doctests: `pytest narwhals --doctest-modules`
- To run unit tests and doctests at the same time: `pytest tests narwhals --cov=narwhals --doctest-modules`
- To run tests multiprocessed, you may also want to use pytest-xdist (optional)
- To choose which backends to run tests with, you can use the `--constructors` flag:
  - To only run tests for pandas, Polars, and PyArrow, use `pytest --constructors=pandas,pyarrow,polars`
  - To run tests for all CPU constructors, use `pytest --all-cpu-constructors`
  - By default, tests run for pandas, pandas (PyArrow dtypes), PyArrow, and Polars.
- To run tests using `cudf.pandas`, run `NARWHALS_DEFAULT_CONSTRUCTORS=pandas python -m cudf.pandas -m pytest`
- To run tests using `polars[gpu]`, run `NARWHALS_POLARS_GPU=1 pytest --constructors=polars[lazy]`
In general we assume that dataframes are used to store and process columnar data. Therefore:
- Iterating over rows in Python is never allowed. Assume that there's an infinite number of rows.
- Iterating over columns is acceptable (though native APIs that do the iteration in a low-level language are preferred if possible!).
- pandas:
  - Don't use `apply` or `map`. The only place we currently use `apply` is in `group_by` for operations which the pandas API just doesn't support, and even then, it's accompanied by a big warning.
  - Don't use inplace methods, unless you're creating a new object and are sure it's safe to modify it. In particular, you should never ever modify the user's input data.
  - Please remember that `assign`, `drop`, `reset_index`, and `rename`, though seemingly harmless, make full copies of the input data.
    - Instead of `assign`, prefer using `with_columns` at the compliant level.
    - For `drop` and `reset_index`, use `inplace=True`, so long as you're only modifying a new object which you created. Again, you should never modify user input. This may need updating if/when it's deprecated/removed, but please keep it for older pandas versions (https://github.com/pandas-dev/pandas/pull/51466/files).
    - Instead of `rename`, prefer `alias` at the compliant level.
  - pandas supports any hashable object as a column name, whereas other libraries tend to only support strings. We tend to just type `: str` in places which accept column names, with the understanding that for pandas, other data types will silently work.
- Polars:
  - Never use `map_elements`.
- DuckDB / PySpark / anything lazy-only:
  - Never assume that your data is ordered in any pre-defined way.
  - Never materialise your data (only exception: `collect`).
  - Avoid calling the schema / column names unnecessarily.
- DuckDB, in addition to the above:
  - Use the Python API as much as possible, only falling back to SQL as the last resort for operations not yet supported in their Python API (e.g. `over`).
  - Use standard SQL constructs where possible instead of non-standard ones such as `GROUP BY ALL` or `EXCLUDE`.
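As a minimal sketch of the pandas guidelines above, here's a hypothetical helper (the function and column names are made up for this example) that adds a column without `apply`, without row iteration, and without touching the user's input:

```python
import pandas as pd

def with_doubled(native_df: pd.DataFrame) -> pd.DataFrame:
    # Work on a new object: never modify the user's input data.
    result = native_df.copy(deep=False)
    # Vectorised, column-wise arithmetic: no `apply`/`map`, no row loop.
    result["doubled"] = result["a"] * 2
    return result

df = pd.DataFrame({"a": [1, 2, 3]})
out = with_doubled(df)
```

The shallow copy keeps this cheap: adding a new column to the copy leaves the caller's `df` with its original columns.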
We aim to use three standard patterns for handling test failures:
Note: While we're not currently totally consistent with these patterns, any efforts towards our aim are appreciated and welcome.
- `request.applymarker(pytest.mark.xfail)`: used for features that are planned but not yet supported.

  ```python
  def test_future_feature(request):
      request.applymarker(pytest.mark.xfail)
      # Test implementation for planned feature
  ```

- `pytest.mark.skipif`: used when there's a condition under which the test cannot run (e.g., unsupported pandas versions).

  ```python
  @pytest.mark.skipif(PANDAS_VERSION < (2, 0), reason="requires pandas 2.0+")
  def test_version_dependent():
      ...  # Test implementation
  ```

- `pytest.raises`: used for testing that code raises expected exceptions.

  ```python
  def test_invalid_input():
      with pytest.raises(ValueError, match="expected error message"):
          ...  # Code that should raise the error
  ```
Document clear reasons in test comments for any skip/xfail patterns to help maintain test clarity.
If you want fewer surprises when opening a PR, you can take advantage of nox to run the entire CI/CD test suite locally on your operating system.
To do so, first install nox, then run the `nox` command in the root of the repository:

```
python -m pip install nox  # or: python -m pip install "nox[uv]"
nox
```

Note that nox also requires all the Python versions defined in `noxfile.py` to be installed on your system.
We use Hypothesis to generate some random tests, to check for robustness.
To keep local test suite times down, not all of these run by default - you can
run them by passing the `--runslow` flag to pytest.
To keep local development test times down, Dask and Modin are excluded from dev dependencies, and their tests only run in CI. If you install them with

```
uv pip install -U dask[dataframe] modin
```

then their tests will run too.
We can't currently test in CI against cuDF, but you can test it manually in Kaggle using GPUs. Please follow this Kaggle notebook to run the tests.
We run both mypy and pyright in CI. Both of these tools are included when installing Narwhals with the local-dev dependency group.
Run them with `make typing` to verify type completeness / correctness.
Note that:
- In `_pandas_like`, we type all native objects as if they are pandas ones, though in reality this package is shared between pandas, Modin, and cuDF.
- In `_spark_like`, we type all native objects as if they are SQLFrame ones, though in reality this package is shared between SQLFrame and PySpark.
If you are adding a new feature or changing an existing one, you should also update the documentation and the docstrings to reflect the changes.
Writing docstrings in Narwhals is not an exact science, but we have some high-level guidelines (if in doubt, just ask us in the PR):
- The examples should be clear and to the point.
- The examples should import one dataframe library, create a dataframe, and exemplify the Narwhals functionality.
- We strive to balance the use of different backends across all our docstring examples.
- There are exceptions to the above rules!
Here's an example of a docstring:
```python
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.estimated_size()
32
```

Full discussion at narwhals#1943.
To build the docs, run mkdocs serve, and then open the link provided in a browser.
The docs should refresh when you make changes. If they don't, press ctrl+C, and then
do mkdocs build and then mkdocs serve.
When you have resolved your issue, open a pull request in the Narwhals repository.
Please adhere to the following guidelines:
- Start your pull request title with a conventional commit tag. This helps us add your contribution to the right section of the changelog. We use "Type" from the Angular convention.
  TLDR: the PR title should start with any of these abbreviations: `build`, `chore`, `ci`, `depr`, `docs`, `feat`, `fix`, `perf`, `refactor`, `release`, `test`. Add a `!` at the end if it is a breaking change, for example `refactor!`. This text will end up in the changelog.
- Please follow the instructions in the pull request form and submit.
Codespaces is a great way to work on Narwhals without needing to configure your local development environment. Every GitHub.com user has a monthly quota of free use of GitHub Codespaces, and you can start working in a codespace without providing any payment details. You'll be informed by email if you're close to using 100% of the included services. To learn more, visit the GitHub Docs.
If you're new to GitHub, you'll need to create an account on GitHub.com and verify your email address.
Go to the main project page. Fork the repository by clicking on the fork button. You can find it in the right corner on the top of the page.
Go to the forked repository on your GitHub account - you'll find it on your account in the tab Repositories.
Click on the green Code button and navigate to the Codespaces tab.
Click on the green button Create codespace on main - it will open a browser version of VSCode,
with the complete repository and git installed.
You can now proceed with steps 5 (Setting up your environment) through 10 (Pull request) listed above in Working with local development environment.
If Narwhals looks like underwater unicorn magic to you, then please read how it works.
In Narwhals, we are very particular about imports. When it comes to importing heavy third-party libraries (pandas, NumPy, Polars, etc...) please follow these rules:
- Never import anything to do `isinstance` checks. Instead, just use the functions in `narwhals.dependencies` (such as `is_pandas_dataframe`);
- If you need to import anything, do it in a place where you know that the import is definitely available. For example, NumPy is a required dependency of PyArrow, so it's OK to import NumPy to implement a PyArrow function; however, NumPy should never be imported to implement a Polars function. The only exception is when there's simply no way around it by definition - for example, `Series.to_numpy` always requires NumPy to be installed.
- Don't place a third-party import at the top of a file. Instead, place it in the function where it's used, so that we minimise the chances of it being imported unnecessarily.
We're trying to be really lightweight and minimal-overhead, and unnecessary imports can slow things down.
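To illustrate the idea behind the first rule, here's a hedged sketch of how a type check can avoid importing a heavy library at all. This is an illustration of the pattern only, not Narwhals' actual implementation; in real Narwhals code, use the helpers in `narwhals.dependencies` instead (the function name below is hypothetical):

```python
import sys

def looks_like_pandas_dataframe(obj) -> bool:
    # Hypothetical helper, for illustration only.
    # If pandas was never imported, `obj` cannot be a pandas DataFrame,
    # so we can answer False without paying the cost of `import pandas`.
    pd = sys.modules.get("pandas")
    return pd is not None and isinstance(obj, pd.DataFrame)

assert looks_like_pandas_dataframe([1, 2, 3]) is False
```

The key point is that the check consults `sys.modules` rather than importing pandas itself, so modules using it stay cheap to import.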
If you start working on an issue, it's usually a good idea to let others know about this in order to avoid duplicating work.
Do:
- When you're about to start working on an issue, and have understood the requirements and have some idea of what a solution would involve, comment "I've started working on this".
- Push partial work (even if unfinished) to a branch, which you can open in "draft" state.
- If someone else has commented that they're working on an issue but hasn't made any public progress for 1-2 weeks, it's usually OK to assume that they're no longer working on it.
Don't:
- Don't claim issues that you intend to work on at a later date. For example, if it's Monday and you see an issue that interests you and you would like to work on it on Sunday, then the correct time to write "I'm working on this" is on Sunday when you start working on it.
- Don't ask for permission to work on issues, or to be assigned them. You have permission, we accept (and welcome!) contributions from everyone!
Above all else, please assume good intentions and go the extra mile to be super-extra-nice.
Some general guidelines:
- If in doubt, err on the side of being warm rather than being cold.
- If in doubt, err on the side of one extra positive emoji rather than one fewer.
- Never delete or dismiss other maintainers' comments.
- Non-maintainers' comments should only be deleted if they are unambiguously spam (e.g. crypto adverts). In cases of rude or abusive behaviour, please contact the project author (@MarcoGorelli).
- Avoid escalating conflicts. People type harder than they speak, and online discourse is especially difficult. Again, please assume good intentions.
Please remember to abide by the code of conduct, else you'll be conducted away from this project.
We have a community call every 2 weeks, all welcome to attend.