Commit dff9772

MOSAiC dataset (#220)
* Add MOSAiC to easy-access datasets
* Add Datasets docs
* MOSAiC link and reference; consistent example in docstring
* Add name and units attributes for longitude and latitude
* Add MOSAiC reference to the adapter docstring as well
1 parent a864509 commit dff9772

4 files changed, +157 -2 lines changed

clouddrift/adapters/mosaic.py

Lines changed: 22 additions & 0 deletions
@@ -4,6 +4,15 @@
 
 The dataset is hosted at https://doi.org/10.18739/A2KP7TS83.
 
+Reference: Angela Bliss, Jennifer Hutchings, Philip Anderson, Philipp Anhaus,
+Hans Jakob Belter, Jørgen Berge, Vladimir Bessonov, Bin Cheng, Sylvia Cole,
+Dave Costa, Finlo Cottier, Christopher J Cox, Pedro R De La Torre, Dmitry V Divine,
+Gilbert Emzivat, Ying-Chih Fang, Steven Fons, Michael Gallagher, Maxime Geoffrey,
+Mats A Granskog, ... Guangyu Zuo. (2022). Sea ice drift tracks from the Distributed
+Network of autonomous buoys deployed during the Multidisciplinary drifting Observatory
+for the Study of Arctic Climate (MOSAiC) expedition 2019 - 2021. Arctic Data Center.
+doi:10.18739/A2KP7TS83.
+
 Example
 -------
 >>> from clouddrift.adapters import mosaic
@@ -115,4 +124,17 @@ def to_xarray():
         {"datetime": "time", "Sensor ID": "id"}
     )
 
+    # Set variable attributes
+    ds["longitude"].attrs = {
+        "long_name": "longitude",
+        "standard_name": "longitude",
+        "units": "degrees_east",
+    }
+
+    ds["latitude"].attrs = {
+        "long_name": "latitude",
+        "standard_name": "latitude",
+        "units": "degrees_north",
+    }
+
     return ds
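The attributes added above follow the CF conventions, where `standard_name` and `units` values such as `degrees_east` and `degrees_north` let CF-aware software recognize coordinate variables. As a minimal sketch, assuming only those two attributes are consulted, the hypothetical helper `cf_axis` below (not part of CloudDrift or xarray) classifies an attribute dictionary. The unit spellings listed are the alternatives the CF conventions accept for longitude and latitude.

```python
from typing import Mapping, Optional

# Unit spellings the CF conventions accept for longitude and latitude.
CF_LON_UNITS = {"degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE"}
CF_LAT_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN"}


def cf_axis(attrs: Mapping[str, str]) -> Optional[str]:
    """Classify a variable as "longitude", "latitude", or None from its attributes.

    Hypothetical helper: it only checks `standard_name` and `units`,
    and ignores the free-text `long_name`.
    """
    standard_name = attrs.get("standard_name")
    units = attrs.get("units")
    if standard_name == "longitude" or units in CF_LON_UNITS:
        return "longitude"
    if standard_name == "latitude" or units in CF_LAT_UNITS:
        return "latitude"
    return None


# The attribute dictionaries set in the diff above.
lon_attrs = {"long_name": "longitude", "standard_name": "longitude", "units": "degrees_east"}
lat_attrs = {"long_name": "latitude", "standard_name": "latitude", "units": "degrees_north"}
```

Because `long_name` is meant for humans, a variable carrying only `long_name` is not recognized here, which is why setting `standard_name` and `units` matters.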

clouddrift/datasets.py

Lines changed: 71 additions & 2 deletions
@@ -2,6 +2,8 @@
 This module provides functions to easily access ragged-array datasets.
 """
 
+from clouddrift import adapters
+import os
 import xarray as xr
 
 
@@ -23,7 +25,8 @@ def gdp1h() -> xr.Dataset:
     Examples
     --------
     >>> from clouddrift.datasets import gdp1h
-    >>> gdp1h()
+    >>> ds = gdp1h()
+    >>> ds
     <xarray.Dataset>
     Dimensions: (traj: 17324, obs: 165754333)
     Coordinates:
@@ -85,7 +88,8 @@ def gdp6h() -> xr.Dataset:
     Examples
     --------
     >>> from clouddrift.datasets import gdp6h
-    >>> gdp6h()
+    >>> ds = gdp6h()
+    >>> ds
     <xarray.Dataset>
     Dimensions: (traj: 26843, obs: 44544647)
     Coordinates:
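The docstring changes above replace `>>> gdp1h()` and `>>> gdp6h()` with an assignment followed by a bare `>>> ds`. Under doctest semantics an assignment echoes nothing and only a bare expression echoes its repr, so the expected `<xarray.Dataset>` output belongs after `>>> ds`. The sketch below checks this with the standard-library `doctest` module on toy examples, assuming a small dict can stand in for the dataset so nothing is downloaded. The `BROKEN` string mirrors the layout of the example in the new `docs/datasets.rst`, where `>>> ds = gdp1h()` is followed directly by the repr.

```python
import doctest

# An assignment echoes nothing; the bare expression echoes the repr.
FIXED = """
>>> ds = {"traj": 3}
>>> ds
{'traj': 3}
"""

# Expected output placed directly after an assignment, with no bare `ds` line.
BROKEN = """
>>> ds = {"traj": 3}
{'traj': 3}
"""


def run_examples(text: str) -> doctest.TestResults:
    """Parse `text` as doctest examples, run them quietly, and return the tally."""
    test = doctest.DocTestParser().get_doctest(text, {}, "examples", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; callers inspect the returned counts instead.
    return runner.run(test, out=lambda _report: None)


fixed = run_examples(FIXED)    # failed=0, attempted=2
broken = run_examples(BROKEN)  # failed=1, attempted=1
```

The same check applies to the real docstrings only if they are run under doctest; with a live dataset the repr would also depend on network access and the xarray version.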
@@ -129,3 +133,68 @@ def gdp6h() -> xr.Dataset:
     """
     url = "https://www.aoml.noaa.gov/ftp/pub/phod/buoydata/gdp_jul22_ragged_6h.nc#mode=bytes"
     return xr.open_dataset(url)
+
+
+def mosaic() -> xr.Dataset:
+    """Returns the MOSAiC sea-ice drift dataset as an Xarray dataset.
+
+    The function will first look for the ragged-array dataset on the local
+    filesystem. If it is not found, the dataset will be downloaded using the
+    corresponding adapter function and stored for later access.
+
+    The upstream data is available at https://arcticdata.io/catalog/view/doi:10.18739/A2KP7TS83.
+
+    Reference: Angela Bliss, Jennifer Hutchings, Philip Anderson, Philipp Anhaus,
+    Hans Jakob Belter, Jørgen Berge, Vladimir Bessonov, Bin Cheng, Sylvia Cole,
+    Dave Costa, Finlo Cottier, Christopher J Cox, Pedro R De La Torre, Dmitry V Divine,
+    Gilbert Emzivat, Ying-Chih Fang, Steven Fons, Michael Gallagher, Maxime Geoffrey,
+    Mats A Granskog, ... Guangyu Zuo. (2022). Sea ice drift tracks from the Distributed
+    Network of autonomous buoys deployed during the Multidisciplinary drifting Observatory
+    for the Study of Arctic Climate (MOSAiC) expedition 2019 - 2021. Arctic Data Center.
+    doi:10.18739/A2KP7TS83.
+
+    Returns
+    -------
+    xarray.Dataset
+        MOSAiC sea-ice drift dataset as a ragged array
+
+    Examples
+    --------
+    >>> from clouddrift.datasets import mosaic
+    >>> ds = mosaic()
+    >>> ds
+    <xarray.Dataset>
+    Dimensions: (obs: 1926226, traj: 216)
+    Coordinates:
+        time (obs) datetime64[ns] ...
+        id (traj) object ...
+    Dimensions without coordinates: obs, traj
+    Data variables: (12/19)
+        latitude (obs) float64 ...
+        longitude (obs) float64 ...
+        Deployment Leg (traj) int64 ...
+        DN Station ID (traj) object ...
+        IMEI (traj) object ...
+        Deployment Date (traj) datetime64[ns] ...
+        ... ...
+        Buoy Type (traj) object ...
+        Manufacturer (traj) object ...
+        Model (traj) object ...
+        PI (traj) object ...
+        Data Authors (traj) object ...
+        count (traj) int64 ...
+    """
+    clouddrift_path = (
+        os.path.expanduser("~/.clouddrift")
+        if not os.getenv("CLOUDDRIFT_PATH")
+        else os.getenv("CLOUDDRIFT_PATH")
+    )
+    mosaic_path = f"{clouddrift_path}/data/mosaic.nc"
+    if not os.path.exists(mosaic_path):
+        print(f"{mosaic_path} not found; download from upstream repository.")
+        ds = adapters.mosaic.to_xarray()
+        os.makedirs(os.path.dirname(mosaic_path), exist_ok=True)
+        ds.to_netcdf(mosaic_path)
+    else:
+        ds = xr.open_dataset(mosaic_path)
+    return ds
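The caching logic in `mosaic()` (resolve a data directory, then either open the local copy or download and save it) can be factored into a generic helper. Below is a minimal sketch in plain Python, assuming hypothetical names `clouddrift_data_dir` and `load_or_fetch` that are not part of CloudDrift. In `mosaic()` the roles of `fetch`, `save`, and `load` are played by `adapters.mosaic.to_xarray()`, `ds.to_netcdf`, and `xr.open_dataset`; here JSON files stand in for NetCDF so the example runs without network access. Writing `env.get("CLOUDDRIFT_PATH") or ...` is equivalent to the conditional expression in the diff, because an unset or empty variable is falsy either way.

```python
import json
import os
import tempfile
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


def clouddrift_data_dir(env: Mapping[str, str] = os.environ) -> str:
    # Same precedence as mosaic(): a non-empty CLOUDDRIFT_PATH wins,
    # otherwise fall back to ~/.clouddrift; files go in its "data" subfolder.
    root = env.get("CLOUDDRIFT_PATH") or os.path.expanduser("~/.clouddrift")
    return os.path.join(root, "data")


def load_or_fetch(
    path: str,
    fetch: Callable[[], T],
    save: Callable[[T, str], None],
    load: Callable[[str], T],
) -> T:
    """Return the object cached at `path`, fetching and saving it on first use."""
    if os.path.exists(path):
        return load(path)  # later calls: read the local copy
    obj = fetch()  # first call: the expensive download
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    save(obj, path)
    return obj


# Usage: JSON files stand in for NetCDF, and a counter stands in for the download.
fetch_calls = []


def fake_fetch() -> dict:
    fetch_calls.append(1)
    return {"traj": 216}


def save_json(obj: dict, path: str) -> None:
    with open(path, "w") as f:
        json.dump(obj, f)


def load_json(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(clouddrift_data_dir({"CLOUDDRIFT_PATH": tmp}), "mosaic.json")
    first = load_or_fetch(cache, fake_fetch, save_json, load_json)   # downloads
    second = load_or_fetch(cache, fake_fetch, save_json, load_json)  # reads cache
```

Passing the environment as a mapping keeps the path logic testable without mutating `os.environ`.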

docs/datasets.rst

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+.. _datasets:
+
+Datasets
+========
+
+CloudDrift provides convenience functions to access real-world ragged-array
+datasets.
+
+>>> from clouddrift.datasets import gdp1h
+>>> ds = gdp1h()
+<xarray.Dataset>
+Dimensions: (traj: 17324, obs: 165754333)
+Coordinates:
+    ids (obs) int64 ...
+    lat (obs) float32 ...
+    lon (obs) float32 ...
+    time (obs) datetime64[ns] ...
+Dimensions without coordinates: traj, obs
+Data variables: (12/55)
+    BuoyTypeManufacturer (traj) |S20 ...
+    BuoyTypeSensorArray (traj) |S20 ...
+    CurrentProgram (traj) float64 ...
+    DeployingCountry (traj) |S20 ...
+    DeployingShip (traj) |S20 ...
+    DeploymentComments (traj) |S20 ...
+    ... ...
+    sst1 (obs) float64 ...
+    sst2 (obs) float64 ...
+    typebuoy (traj) |S10 ...
+    typedeath (traj) int8 ...
+    ve (obs) float32 ...
+    vn (obs) float32 ...
+Attributes: (12/16)
+    Conventions: CF-1.6
+    acknowledgement: Elipot, Shane; Sykulski, Adam; Lumpkin, Rick; Centurio...
+    contributor_name: NOAA Global Drifter Program
+    contributor_role: Data Acquisition Center
+    date_created: 2022-12-09T06:02:29.684949
+    doi: 10.25921/x46c-3620
+    ... ...
+    processing_level: Level 2 QC by GDP drifter DAC
+    publisher_email: [email protected]
+    publisher_name: GDP Drifter DAC
+    publisher_url: https://www.aoml.noaa.gov/phod/gdp
+    summary: Global Drifter Program hourly data
+    title: Global Drifter Program hourly drifting buoy collection
+
+Currently available datasets are:
+
+- :func:`clouddrift.datasets.gdp1h`: 1-hourly Global Drifter Program (GDP) data
+  from a `cloud-optimized Zarr dataset on AWS <https://registry.opendata.aws/noaa-oar-hourly-gdp/.>`_.
+- :func:`clouddrift.datasets.gdp6h`: 6-hourly GDP data from a ragged-array
+  NetCDF file hosted by the public HTTPS server at
+  `NOAA's Atlantic Oceanographic and Meteorological Laboratory (AOML) <https://www.aoml.noaa.gov/phod/gdp/index.php>`_.
+- :func:`clouddrift.datasets.mosaic`: MOSAiC sea-ice drift dataset as a ragged
+  array processed from the upstream dataset hosted at the
+  `NSF's Arctic Data Center <https://doi.org/10.18739/A2KP7TS83>`_.
+
+The GDP datasets are accessed lazily, so the data is only downloaded when
+specific array values are referenced. The MOSAiC dataset is downloaded in its
+entirety when the function is called for the first time and stored locally for
+later use.
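All three datasets are ragged arrays: observations from every trajectory are concatenated along the `obs` dimension, while per-trajectory metadata lives along `traj`. Assuming, as in the CF contiguous ragged-array layout, that a per-trajectory length variable such as MOSAiC's `count (traj)` gives how many consecutive `obs` entries belong to each trajectory, a single trajectory can be sliced out using a cumulative sum of those lengths. The sketch below uses plain lists and made-up values; the helper names `row_offsets` and `trajectory` are hypothetical, not CloudDrift functions.

```python
from itertools import accumulate
from typing import List, Sequence


def row_offsets(count: Sequence[int]) -> List[int]:
    """Start index along `obs` of each trajectory, given per-trajectory counts."""
    # Prepend 0 to the running totals and drop the final total (the obs length).
    return [0, *accumulate(count)][:-1]


def trajectory(values: Sequence[float], count: Sequence[int], i: int) -> Sequence[float]:
    """Slice the observations of trajectory `i` out of a flat obs-length array."""
    start = row_offsets(count)[i]
    return values[start : start + count[i]]


# Toy data (not MOSAiC values): 3 trajectories with 2, 3, and 1 observations.
count = [2, 3, 1]
latitude = [84.1, 84.2, 85.0, 85.1, 85.3, 86.7]

second_track = trajectory(latitude, count, 1)
```

Since `trajectory` recomputes the offsets on every call, code that visits many trajectories would compute `row_offsets(count)` once and reuse it.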

docs/index.rst

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,7 @@ Getting started
 
 * :doc:`install`
 * :doc:`usage`
+* :doc:`datasets`
 
 .. toctree::
    :hidden:
@@ -27,6 +28,7 @@ Getting started
 
    install
    usage
+   datasets
 
 Reference
 ---------
