33Usage
44=====
55
6- CloudDrift provides an easy way to convert Lagrangian datasets into
6+ The CloudDrift library provides functions for:
7+
8+ * Easy access to cloud-ready Lagrangian ragged-array datasets;
9+ * Common Lagrangian analysis tasks on ragged arrays;
10+ * Adapting custom Lagrangian datasets into ragged arrays.
11+
12+ Let's start by importing the library and accessing a ready-to-use ragged-array
13+ dataset.
14+
15+ Accessing ragged-array Lagrangian datasets
16+ ------------------------------------------
17+
18+ We recommend importing ``clouddrift`` using the ``cd`` shorthand, for convenience:
19+
20+ >>> import clouddrift as cd
21+
22+ CloudDrift provides a set of Lagrangian datasets that are ready to use.
23+ They can be accessed via the ``datasets`` submodule.
24+ In this example, we will load NOAA's Global Drifter Program (GDP) hourly
25+ dataset, which is hosted in a public AWS bucket as a cloud-optimized Zarr
26+ dataset:
27+
28+ >>> ds = cd.datasets.gdp1h()
29+ >>> ds
30+ <xarray.Dataset>
31+ Dimensions: (traj: 17324, obs: 165754333)
32+ Coordinates:
33+ ids (obs) int64 ...
34+ lat (obs) float32 ...
35+ lon (obs) float32 ...
36+ time (obs) datetime64[ns] ...
37+ Dimensions without coordinates: traj, obs
38+ Data variables: (12/55)
39+ BuoyTypeManufacturer (traj) |S20 ...
40+ BuoyTypeSensorArray (traj) |S20 ...
41+ CurrentProgram (traj) float64 ...
42+ DeployingCountry (traj) |S20 ...
43+ DeployingShip (traj) |S20 ...
44+ DeploymentComments (traj) |S20 ...
45+ ... ...
46+ sst1 (obs) float64 ...
47+ sst2 (obs) float64 ...
48+ typebuoy (traj) |S10 ...
49+ typedeath (traj) int8 ...
50+ ve (obs) float32 ...
51+ vn (obs) float32 ...
52+ Attributes: (12/16)
53+ Conventions: CF-1.6
54+ acknowledgement: Elipot, Shane; Sykulski, Adam; Lumpkin, Rick; Centurio...
55+ contributor_name: NOAA Global Drifter Program
56+ contributor_role: Data Acquisition Center
57+ date_created: 2022-12-09T06:02:29.684949
58+ doi: 10.25921/x46c-3620
59+ ... ...
60+ processing_level: Level 2 QC by GDP drifter DAC
61+ publisher_email: [email protected]
62+ publisher_name: GDP Drifter DAC
63+ publisher_url: https://www.aoml.noaa.gov/phod/gdp
64+ summary: Global Drifter Program hourly data
65+ title: Global Drifter Program hourly drifting buoy collection
66+
67+ The ``gdp1h`` function returns an Xarray ``Dataset`` instance of the ragged-array dataset.
68+ While the dataset is quite large, around a dozen GB, it is not downloaded to your
69+ local machine. Instead, the dataset is accessed directly from the cloud, and only
70+ the data that is needed for the analysis is downloaded. This is possible thanks to
71+ the cloud-optimized Zarr format, which allows for efficient access to the data
72+ stored in the cloud.
73+
74+ Let's look at some variables in this dataset:
75+
76+ >>> ds.lon
77+ <xarray.DataArray 'lon' (obs: 165754333)>
78+ [165754333 values with dtype=float32]
79+ Coordinates:
80+ ids (obs) int64 ...
81+ lat (obs) float32 ...
82+ lon (obs) float32 ...
83+ time (obs) datetime64[ns] ...
84+ Dimensions without coordinates: obs
85+ Attributes:
86+ long_name: Longitude
87+ units: degrees_east
88+
89+ You see that this array is very long: it has 165754333 elements.
90+ This is because in a ragged array, many varying-length arrays are laid out as a
91+ contiguous 1-dimensional array in memory.
92+
93+ Let's look at the dataset dimensions:
94+
95+ >>> ds.dims
96+ Frozen({'traj': 17324, 'obs': 165754333})
97+
98+ The ``traj`` dimension has 17324 elements, which is the number of individual
99+ trajectories in the dataset.
100+ The sum of their lengths equals the length of the ``obs`` dimension.
101+ Internally, these dimensions, their lengths, and the ``count`` (or ``rowsize``)
102+ variables are used to make CloudDrift's analysis functions aware of
103+ the bounds of each contiguous array within the ragged-array data structure.
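
As a rough illustration of this layout (a sketch using only NumPy and made-up values, not CloudDrift's internal code), three trajectories of different lengths can be stored in one flat array, and a ``rowsize`` array is then enough to recover the bounds of each trajectory:

```python
import numpy as np

# Hypothetical toy data: three trajectories with 2, 3, and 1 observations.
trajectories = [
    np.array([10.0, 10.5]),
    np.array([-30.0, -29.5, -29.0]),
    np.array([120.0]),
]

# Ragged-array layout: one contiguous 1-D array plus the length of each row.
lon = np.concatenate(trajectories)
rowsize = np.array([len(t) for t in trajectories])

# Start and end index of each trajectory within the flat array.
ends = np.cumsum(rowsize)
starts = ends - rowsize

# Recover the second trajectory from the flat array.
second = lon[starts[1]:ends[1]]
```

Here ``rowsize.sum()`` equals ``lon.size``, mirroring the relationship between the ``traj`` and ``obs`` dimensions described above.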
104+
105+ Doing common analysis tasks on ragged arrays
106+ --------------------------------------------
107+
108+ Now that we have a ragged-array dataset loaded as an Xarray ``Dataset`` instance,
109+ let's do some common analysis tasks on it.
110+ Our dataset is on a remote server and fairly large (a dozen GB or so), so let's
111+ first subset it to several trajectories so that we can more easily work with it.
112+ The variable ``ID`` is the unique identifier for each trajectory:
113+
114+ >>> ds.ID[:10].values
115+ array([2578, 2582, 2583, 2592, 2612, 2613, 2622, 2623, 2931, 2932])
116+
117+ >>> from clouddrift.analysis import subset
118+
119+ ``subset`` allows you to subset a ragged array by some criterion.
120+ In this case, we will subset it by the ``ID`` variable:
121+
122+ >>> ds_sub = subset(ds, {"ID": list(ds.ID[:5])})
123+ >>> ds_sub
124+ <xarray.Dataset>
125+ Dimensions: (traj: 5, obs: 13612)
126+ Coordinates:
127+ ids (obs) int64 2578 2578 2578 2578 ... 2612 2612 2612
128+ lat (obs) float32 ...
129+ lon (obs) float32 ...
130+ time (obs) datetime64[ns] ...
131+ Dimensions without coordinates: traj, obs
132+ Data variables: (12/55)
133+ BuoyTypeManufacturer (traj) |S20 ...
134+ BuoyTypeSensorArray (traj) |S20 ...
135+ CurrentProgram (traj) float64 ...
136+ DeployingCountry (traj) |S20 ...
137+ DeployingShip (traj) |S20 ...
138+ DeploymentComments (traj) |S20 ...
139+ ... ...
140+ sst1 (obs) float64 ...
141+ sst2 (obs) float64 ...
142+ typebuoy (traj) |S10 ...
143+ typedeath (traj) int8 ...
144+ ve (obs) float32 ...
145+ vn (obs) float32 ...
146+ Attributes: (12/16)
147+ Conventions: CF-1.6
148+ acknowledgement: Elipot, Shane; Sykulski, Adam; Lumpkin, Rick; Centurio...
149+ contributor_name: NOAA Global Drifter Program
150+ contributor_role: Data Acquisition Center
151+ date_created: 2022-12-09T06:02:29.684949
152+ doi: 10.25921/x46c-3620
153+ ... ...
154+ processing_level: Level 2 QC by GDP drifter DAC
155+ publisher_email: [email protected]
156+ publisher_name: GDP Drifter DAC
157+ publisher_url: https://www.aoml.noaa.gov/phod/gdp
158+ summary: Global Drifter Program hourly data
159+ title: Global Drifter Program hourly drifting buoy collection
160+
161+ You see that we now have a subset of the original dataset, with 5 trajectories
162+ and a total of 13612 observations.
163+ This subset is small enough to quickly and easily work with for demonstration
164+ purposes.
165+ Let's see how we can compute the mean and maximum velocities of each trajectory.
166+ To start, we'll need to obtain the velocities over all trajectory times.
167+ Although the GDP dataset already comes with velocity variables, we won't use
168+ them here so that we can learn how to compute them ourselves from positions.
169+ ``clouddrift`` provides the ``velocity_from_position`` function that allows you
170+ to do just that.
171+
172+ >>> from clouddrift.analysis import velocity_from_position
173+
174+ At a minimum, ``velocity_from_position`` requires three input parameters:
175+ consecutive x- and y-coordinates and time, so we could do:
176+
177+ >>> u, v = velocity_from_position(ds_sub.lon, ds_sub.lat, ds_sub.time)
178+
179+ ``velocity_from_position`` returns two arrays, ``u`` and ``v``, which are the
180+ zonal and meridional velocities, respectively.
181+ By default, it assumes that the coordinates are in degrees, and it handles the
182+ great circle path calculation and longitude wraparound under the hood.
183+ However, recall that ``ds_sub.lon``, ``ds_sub.lat``, and ``ds_sub.time`` are
184+ ragged arrays, so we need a different approach to calculate velocities while
185+ respecting the trajectory boundaries.
186+ For this, we can use the ``apply_ragged`` function, which applies a function
187+ to each trajectory in a ragged array, and returns the concatenated result.
188+
189+ >>> from clouddrift.analysis import apply_ragged
190+ >>> u, v = apply_ragged(velocity_from_position, [ds_sub.lon, ds_sub.lat, ds_sub.time], ds_sub.rowsize)
191+
192+ ``u`` and ``v`` here are still ragged arrays, which means that the five
193+ contiguous trajectories are concatenated into 1-dimensional arrays.
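
To see why trajectory boundaries matter, here is a minimal NumPy-only sketch of the idea of applying a function row by row. The helper ``apply_per_row`` and the toy data are hypothetical and only illustrate the concept; they are not the ``clouddrift`` implementation:

```python
import numpy as np

def apply_per_row(func, array, rowsize):
    """Apply func to each contiguous row of a ragged array and concatenate.

    Hypothetical helper for illustration; not part of the clouddrift API.
    """
    # Indices at which to split the flat array into individual rows.
    split_at = np.cumsum(rowsize)[:-1]
    rows = np.split(array, split_at)
    return np.concatenate([np.atleast_1d(func(row)) for row in rows])

# Toy positions for two trajectories of lengths 3 and 2.
x = np.array([0.0, 1.0, 2.0, 100.0, 101.0])
rowsize = np.array([3, 2])

naive = np.diff(x)                             # differences across the boundary too
per_row = apply_per_row(np.diff, x, rowsize)   # differences within each row only
```

The naive difference contains a spurious jump of 98 between the last point of the first trajectory and the first point of the second, while the per-row version does not.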
194+
195+ Now, let's compute the velocity magnitude in meters per second.
196+ The time in this dataset is loaded in nanoseconds by default:
197+
198+ >>> ds_sub.time.values
199+ array(['2005-04-15T20:00:00.000000000', '2005-04-15T21:00:00.000000000',
200+ '2005-04-15T22:00:00.000000000', ...,
201+ '2005-10-02T03:00:00.000000000', '2005-10-02T04:00:00.000000000',
202+ '2005-10-02T05:00:00.000000000'], dtype='datetime64[ns]')
203+
204+ Since the time differences are in nanoseconds, to obtain the velocity magnitude in meters per second, we'll need to
205+ multiply our velocities by ``1e9`` (with NumPy imported as ``np``):
206+
207+ >>> velocity_magnitude = np.sqrt(u**2 + v**2) * 1e9
208+ >>> velocity_magnitude
209+ array([0.28053388, 0.6164632 , 0.89032112, ..., 0.2790803 , 0.20095603,
210+ 0.20095603])
211+
212+ >>> velocity_magnitude.mean(), velocity_magnitude.max()
213+ (0.22115242718877506, 1.6958275672626286)
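
The factor of ``1e9`` can be checked with a small worked example. This is a sketch with assumed values (a 1000 m displacement over one hour), independent of ``clouddrift``:

```python
import numpy as np

# Two hypothetical timestamps one hour apart, stored with nanosecond precision.
t = np.array(["2005-04-15T20:00:00", "2005-04-15T21:00:00"], dtype="datetime64[ns]")
distance_m = 1000.0  # assumed displacement between the two fixes, in meters

# Elapsed time as a plain floating-point number of nanoseconds.
dt_ns = (np.diff(t) / np.timedelta64(1, "ns"))[0]

speed_m_per_ns = distance_m / dt_ns   # meters per nanosecond
speed_m_per_s = speed_m_per_ns * 1e9  # meters per second
```

One hour is 3.6e12 nanoseconds, so the result matches 1000 m divided by 3600 s.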
214+
215+ However, these aren't the results we are looking for! Recall that we have the
216+ velocity magnitude of five different trajectories concatenated into one array.
217+ This means that we need to use ``apply_ragged`` again to compute the mean and
218+ maximum values:
219+
220+ >>> apply_ragged(np.mean, [velocity_magnitude], ds_sub.rowsize)
221+ array([0.32865148, 0.17752435, 0.1220523 , 0.13281067, 0.14041268])
222+ >>> apply_ragged(np.max, [velocity_magnitude], ds_sub.rowsize)
223+ array([1.69582757, 1.36804354, 0.97343434, 0.60353528, 1.05044213])
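
The overall mean computed earlier and these per-trajectory means are related: the overall mean is the average of the per-trajectory means weighted by each trajectory's length. A minimal NumPy sketch with made-up numbers (not the GDP data) shows this:

```python
import numpy as np

# Toy speeds for two trajectories of lengths 3 and 1.
speed = np.array([1.0, 2.0, 3.0, 10.0])
rowsize = np.array([3, 1])

# Per-trajectory means via segment sums; reduceat takes each row's start index.
starts = np.cumsum(rowsize) - rowsize
row_means = np.add.reduceat(speed, starts) / rowsize

# The mean over all observations equals the rowsize-weighted mean of row means.
overall = speed.mean()
weighted = np.average(row_means, weights=rowsize)
```

Long trajectories therefore dominate the overall mean, which is one reason to report statistics per trajectory.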
224+
225+ And there you go! We used ``clouddrift`` to:
226+
227+ #. Load a real-world Lagrangian dataset from the cloud;
228+ #. Subset the dataset by trajectory IDs;
229+ #. Compute the velocity vectors and their magnitudes for each trajectory;
230+ #. Compute the mean and maximum velocity magnitudes for each trajectory.
231+
232+ ``clouddrift`` offers many more functions for common Lagrangian analysis tasks.
233+ Please explore the `API <https://cloud-drift.github.io/clouddrift/api.html>`_
234+ to learn about other functions and how to use them.
235+
236+ Adapting custom Lagrangian datasets into ragged arrays
237+ ------------------------------------------------------
238+
239+ CloudDrift provides an easy way to convert custom Lagrangian datasets into
7240`contiguous ragged arrays <https://cfconventions.org/cf-conventions/cf-conventions.html#_contiguous_ragged_array_representation>`_.
8241
9242.. code-block:: python
@@ -26,14 +259,7 @@ CloudDrift provides an easy way to convert Lagrangian datasets into
26259
27260 This snippet is specific to the hourly GDP dataset; however, you can use the
28261``RaggedArray`` class directly to convert other custom datasets into a ragged
29- array structure that is analysis ready via Xarray or Awkward Array packages.
30- We provide step-by-step guides to convert the individual trajectories from the
31- Global Drifter Program (GDP) hourly and 6-hourly datasets, the drifters from the
32- `CARTHE <http://carthe.org/>`_ experiment, and a typical output from a numerical
33- Lagrangian experiment in our
34- `repository of example Jupyter Notebooks <https://github.com/cloud-drift/clouddrift-examples>`_.
262+ array structure that is analysis ready via Xarray or Awkward Array packages.
263+ The functions to do that are defined in the ``clouddrift.adapters`` submodule.
35264You can use these examples as a reference to ingest your own or other custom
36- Lagrangian datasets into ``RaggedArray``.
37-
38- In the future, ``clouddrift`` will be including functions to perform typical
39- oceanographic Lagrangian analyses.
265+ Lagrangian datasets into ``RaggedArray``.