
Commit 271e728

Update docs (#108)
* Add top-level docstrings
* Add adapters to the API docs
* Rewrite Usage
* Update installation
1 parent cb148c7 commit 271e728

File tree

5 files changed (+113 −117 lines)


clouddrift/adapters/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -1 +1,9 @@
+"""
+This module provides adapters to custom datasets.
+Each adapter module provides convenience functions and metadata to convert a
+custom dataset to a `clouddrift.RaggedArray` instance.
+Currently, clouddrift only provides an adapter module for the hourly Global
+Drifter Program (GDP) data, and more adapters will be added in the future.
+"""
+
 import clouddrift.adapters.gdp

clouddrift/adapters/gdp.py

Lines changed: 20 additions & 11 deletions
@@ -1,3 +1,8 @@
+"""
+This module provides functions and metadata that can be used to convert the
+hourly Global Drifter Program (GDP) data to a ``clouddrift.RaggedArray`` instance.
+"""
+
 from ..dataformat import RaggedArray
 import numpy as np
 import pandas as pd

@@ -254,24 +259,28 @@ def str_to_float(value: str, default=np.nan) -> float:
         return default


-def cut_str(value, max_length):
-    """
-    Cut a string to a specific length.
-    :param value: string
-    max_length: length of the output
-    :return: string with max_length chars
+def cut_str(value: str, max_length: int) -> np.chararray:
+    """Cut a string to a specific length and return it as a numpy chararray.
+
+    Args:
+        value (str): String to cut
+        max_length (int): Length of the output
+    Returns:
+        out (np.chararray): String with max_length characters
     """
     charar = np.chararray(1, max_length)
     charar[:max_length] = value
     return charar


 def drogue_presence(lost_time, time):
-    """
-    Create drogue status from the drogue lost time and the trajectory time
-    :params lost_time: timestamp of the drogue loss (or NaT)
-    time[obs]: observation time
-    :return: bool[obs]: 1 drogued, 0 undrogued
+    """Create drogue status from the drogue lost time and the trajectory time.
+
+    Args:
+        lost_time: Timestamp of the drogue loss (or NaT)
+        time: Observation time
+    Returns:
+        out (bool): True if drogued and False otherwise
     """
     if pd.isnull(lost_time) or lost_time >= time[-1]:
         return np.ones_like(time, dtype="bool")
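The two rewritten helpers can be exercised standalone. Below is a minimal sketch assuming only NumPy and pandas; note that the diff hunk ends before the undrogued branch of `drogue_presence`, so the final `return` here is an illustrative guess, not necessarily the repository's code:

```python
import numpy as np
import pandas as pd

def cut_str(value: str, max_length: int) -> np.chararray:
    # As in the diff: a length-1 chararray with itemsize max_length;
    # NumPy truncates the assigned string to the itemsize.
    charar = np.chararray(1, max_length)
    charar[:max_length] = value
    return charar

def drogue_presence(lost_time, time):
    if pd.isnull(lost_time) or lost_time >= time[-1]:
        # Drogue never lost within this trajectory: drogued throughout.
        return np.ones_like(time, dtype="bool")
    # Assumed continuation (the hunk is truncated here): observations at or
    # after the loss timestamp are flagged as undrogued.
    return time < lost_time

time = np.arange("2020-01-01", "2020-01-06", dtype="datetime64[D]")
print(cut_str("drifter", 4))          # [b'drif']
print(drogue_presence(pd.NaT, time))  # [ True  True  True  True  True]
print(drogue_presence(time[2], time)) # [ True  True False False False]
```

The truncation in `cut_str` comes for free from NumPy's fixed-itemsize byte strings, which is why the function needs no explicit slicing of `value`.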

docs/api.rst

Lines changed: 14 additions & 0 deletions
@@ -5,6 +5,20 @@ API

 Auto-generated summary of CloudDrift's API. For more details and examples, refer to the different Jupyter Notebooks.

+Adapters
+--------
+
+.. automodule:: clouddrift.adapters
+    :members:
+    :undoc-members:
+
+GDP
+^^^
+
+.. automodule:: clouddrift.adapters.gdp
+    :members:
+    :undoc-members:
+
 Analysis
 --------
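For the ``automodule`` directives above to work, Sphinx must be able to import the package and must have the autodoc extension enabled. A minimal ``docs/conf.py`` sketch (the repository's actual configuration may differ; the path manipulation and the napoleon extension are assumptions, the latter to parse the Google-style ``Args:``/``Returns:`` docstrings added in this commit):

```python
# docs/conf.py -- minimal sketch, not the repository's actual file
import os
import sys

# Make the clouddrift package importable from the docs build directory.
sys.path.insert(0, os.path.abspath(".."))

extensions = [
    "sphinx.ext.autodoc",    # required by the automodule directives
    "sphinx.ext.napoleon",   # parses Google-style docstrings
]
```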

docs/install.rst

Lines changed: 46 additions & 6 deletions
@@ -1,24 +1,64 @@
 .. _install:

 Installation
-=============
+============
+
+You can install the latest release of CloudDrift using pip or Conda.
+You can also install the latest development (unreleased) version from GitHub.
+
+pip
+---

-For most *users*, the latest official package can be obtained from the `PyPi <pypi.org/project/clouddrift/>`_ repository:
+In your virtual environment, type:

 .. code-block:: text

    pip install clouddrift

-or (soon!) from the conda-forge repository:
+Conda
+-----
+
+First add ``conda-forge`` to your channels in your Conda environment:
+
+.. code-block:: text
+
+   conda config --add channels conda-forge
+   conda config --set channel_priority strict
+
+then install CloudDrift:
+
+.. code-block:: text
+
+   conda install clouddrift
+
+Developers
+----------
+
+If you need the latest development version, get it from GitHub using pip:
+
+.. code-block:: text
+
+   pip install git+https://github.com/Cloud-Drift/clouddrift
+
+Running tests
+=============
+
+To run the tests, you need to first download the CloudDrift source code from
+GitHub and install it in your virtual environment:

 .. code-block:: text

-   conda install -c conda-forge clouddrift
+   git clone https://github.com/cloud-drift/clouddrift
+   cd clouddrift
+   python3 -m venv venv
+   source venv/bin/activate
+   pip install .

-For *developpers* who want to install the latest development version, you can install directly from the clouddrift's GitHub repository:
+Then, run the tests like this:

 .. code-block:: text

-   pip install git+https://github.com/Cloud-Drift/clouddrift.git
+   python -m unittest tests/*.py

 A quick how-to guide is provided on the `Usage <https://cloud-drift.github.io/clouddrift/usage.html>`_ page.

docs/usage.rst

Lines changed: 25 additions & 100 deletions
@@ -3,112 +3,37 @@
 Usage
 =====

-Data format
------------
-
-The first release of CloudDrift provide a relatively *easy* way to convert any Lagrangian datasets into an archive of `contiguous ragged arrays <https://cfconventions.org/cf-conventions/cf-conventions.html#_contiguous_ragged_array_representation>`_. We provide a step-by-step guides to convert the individual trajectories from the Global Drifter Program (GDP) hourly and 6-hourly datasets, the drifters from the `CARTHE <http://carthe.org/>`_ experiment, and a typical output from a numerical Lagrangian experiment.
-
-Below is a quick overview on how to transform an observational Lagrangian dataset stored into multiple files, or a numerical output from a Lagrangian simulation framework. Detailed examples are provided as Jupyter Notebooks which can be tested directly in a `Binder <https://mybinder.org/v2/gh/Cloud-Drift/clouddrift/main?labpath=examples>`_ executable environment.
-
-Collection of files
-~~~~~~~~~~~~~~~~~~~
-
-First, to create a ragged arrays archive for a dataset for which each trajectory is stored into a individual archive, e.g. the FTP distribution of the `GDP hourly dataset <https://www.aoml.noaa.gov/phod/gdp/hourly_data.php>`_, it is required to define a `preprocessing` function that returns an `xarray.Dataset <https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html>`_ for a trajectory from its identification number.
-
-.. code-block:: python
-
-   def preprocess(index: int) -> xr.Dataset:
-       """
-       :param index: drifter's identification number
-       :return: xr.Dataset containing the data and attributes
-       """
-       ds = xr.load_dataset(f'data/file_{index}.nc')
-
-       # perform required preprocessing steps
-       # e.g. change units, remove variables, fix attributes, etc.
-
-       return ds
-
-This function will be called for each indices of the dataset (`ids`) to construct the ragged arrays archive, as follow. The ragged arrays contains the required coordinates variables, as well as the specified metadata and data variables. Note that metadata variables contain one value per trajectory while the data variables contain `n` observations per trajectory.
-
-.. code-block:: python
-
-   ids = [1,2,3]  # trajectories to combine
-
-   # mandatory coordinates variables
-   coords = {'ids': 'ids', 'time': 'time', 'lon': 'longitude', 'lat': 'latitude'}
-
-   # list of metadata and data from files to include in archive
-   metadata = ['ID', 'rowsize']
-   data = ['ve', 'vn']
-
-   ra = RaggedArray.from_files(ids, preprocess, coords, metadata, data)
-
-which can be easily export to either a parquet archive file,
-
-.. code-block:: python
-
-   ra.to_parquet('data/archive.parquet')
-
-or a NetCDF archive file.
-
-.. code-block:: python
-
-   ra.to_parquet('data/archive.nc')
-
-Lagrangian numerical output
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-For a two-dimensional output (`lon`, `lat`, `time`) from a Lagrangian simulation framework (such as `OceanParcels <https://oceanparcels.org/>`_ or `OpenDrift <https://opendrift.github.io/>`_), the ragged arrays archive can be obtained by reshaping the variables to ragged arrays and populating the following dictionaries containing the coordinates, metadata, data, and attributes.
+CloudDrift provides an easy way to convert Lagrangian datasets into
+`contiguous ragged arrays <https://cfconventions.org/cf-conventions/cf-conventions.html#_contiguous_ragged_array_representation>`_.

 .. code-block:: python

-   # initialize dictionaries
-   coords = {}
-   metadata = {}
-
-   # note that this example dataset does not contain other data than time, lon, lat, and ids
-   # an empty dictionary "data" is initialize anyway
-   data = {}
+   # Import a GDP-hourly adapter function
+   from clouddrift.adapters.gdp import to_raggedarray

-Numerical outputs are usually stored as a 2D matrix (`trajectory`, `time`) filled with `nan` where there is no data. The first step is to identify the finite values and reshape the dataset.
-
-.. code-block:: python
+   # Download 100 random GDP-hourly trajectories as a ragged array
+   ra = to_raggedarray(n_random_id=100)

-   ds = xr.open_dataset(join(folder, file), decode_times=False)
-   finite_values = np.isfinite(ds['lon'])
-   idx_finite = np.where(finite_values)
+   # Store to NetCDF and Parquet files
+   ra.to_netcdf("gdp.nc")
+   ra.to_parquet("gdp.parquet")

-   # dimension and id of each trajectory
-   rowsize = np.bincount(idx_finite[0])
-   unique_id = np.unique(idx_finite[0])
+   # Convert to Xarray Dataset for analysis
+   ds = ra.to_xarray()

-   # coordinate variables
-   coords["time"] = np.tile(ds.time.data, (ds.dims['traj'],1))[idx_finite]
-   coords["lon"] = ds.lon.data[idx_finite]
-   coords["lat"] = ds.lat.data[idx_finite]
-   coords["ids"] = np.repeat(unique_id, rowsize)
-
-Once this is done, we can include extra metadata, such as the size of each trajectory (`rowsize`), and create the ragged arrays archive.
-
-.. code-block:: python
-
-   # metadata
-   metadata["rowsize"] = rowsize
-   metadata["ID"] = unique_id
-
-   # create the ragged arrays
-   ra = RaggedArray(coords, metadata, data)
-   ra.to_parquet('data/archive.parquet')
-
-Analysis
---------
-
-Once an archive of ragged arrays is created, CloudDrift provides way to read in and convert the data to an `Awkward Array <https://awkward-array.org/quickstart.html>`_.
-
-.. code-block:: python
+   # Alternatively, convert to Awkward Array for analysis
+   ds = ra.to_awkward()

-   ra = RaggedArray.from_parquet('data/archive.parquet')
-   ds = ra.to_awkward()
+This snippet is specific to the hourly GDP dataset; however, you can use the
+``RaggedArray`` class directly to convert other custom datasets into a ragged
+array structure that is analysis-ready via the Xarray or Awkward Array packages.
+We provide step-by-step guides to convert the individual trajectories from the
+Global Drifter Program (GDP) hourly and 6-hourly datasets, the drifters from the
+`CARTHE <http://carthe.org/>`_ experiment, and a typical output from a numerical
+Lagrangian experiment in our
+`repository of example Jupyter Notebooks <https://github.com/cloud-drift/clouddrift-examples>`_.
+You can use these examples as a reference to ingest your own or other custom
+Lagrangian datasets into ``RaggedArray``.

-Over the next year, the CloudDrift project will be developing a cloud-ready analysis library to perform typical Lagrangian workflows.
+In the future, ``clouddrift`` will include functions to perform typical
+oceanographic Lagrangian analyses.
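The contiguous ragged-array layout that the rewritten Usage page links to is simple to sketch with plain NumPy, independent of clouddrift itself: variable-length trajectories are concatenated into one flat array, and a per-trajectory ``rowsize`` count (as in the metadata used throughout this commit) preserves the boundaries. The variable names below are illustrative only:

```python
import numpy as np

# Three trajectories of unequal length (e.g. sea-surface temperature per drifter).
trajectories = [
    np.array([1.0, 2.0, 3.0]),
    np.array([4.0, 5.0]),
    np.array([6.0, 7.0, 8.0, 9.0]),
]

# Contiguous ragged array: one flat data array plus a per-trajectory count.
data = np.concatenate(trajectories)
rowsize = np.array([len(t) for t in trajectories])

# Recover trajectory i by slicing with the cumulative row sizes.
starts = np.insert(np.cumsum(rowsize), 0, 0)
second = data[starts[1]:starts[2]]
print(second)  # [4. 5.]
```

This is exactly the CF "contiguous ragged array representation": no padding with NaN as in the 2D (trajectory, time) matrices the old Usage page reshaped, so storage is proportional to the number of observations.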
