Add gdp_sensor module for processing Global Drifter Program sensor by selipot · Pull Request #564 · Cloud-Drift/clouddrift

selipot · 2025-06-04T23:12:54Z

This PR proposes an adapter for the GDP sensor files (the "s" files). It still has a number of issues. After downloading the s files locally from https://www.aoml.noaa.gov/ftp/pub/phod/pub/pazos/data/shane/sst, I am testing the code with:

from clouddrift.adapters.gdp import gdp_sensor
ra = gdp_sensor.to_raggedarray(tmp_path='/Users/selipot/Data/drifters/raw/', skip_download=True)

However, I first had to make the following manual edits to buoydata_1_5000_edited_sfiles.data.

Deleted line 3791269:

7720663   10 14.098 1996   1000.00   1000.00   1000.00    233.41    270.72**********

On lines 3808058 and 3808059, deleted the malformed string 2.00-111706.63.

Deleted line 3851218:

7720673   10 19.099 1996   1000.00   1000.00      2.00**********   9145.48    184.70
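Rather than deleting rows by hand, the malformed lines could be located automatically. The sketch below is a minimal example, not part of the adapter. It assumes each row is a whitespace-separated list of plain numbers and flags any row containing a token that is not one, which catches both the runs of asterisks (they look like fixed-width overflow markers) and fused values such as 2.00-111706.63. The function name find_bad_lines and its optional expected_fields check are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Assumption: a valid field is a plain number, optionally signed and
# optionally with an exponent (e.g. 7720663, 14.098, -0.5, 1.2e-03).
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def find_bad_lines(lines: Iterable[str], expected_fields: Optional[int] = None) -> List[int]:
    """Return the 1-based line numbers of rows that do not parse cleanly.

    A row is flagged if any whitespace-separated token is not a number
    (e.g. '270.72**********' or '2.00-111706.63') or, when expected_fields
    is given, if it has the wrong number of tokens. Blank lines are skipped.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        if expected_fields is not None and len(tokens) != expected_fields:
            bad.append(lineno)
        elif not all(NUMBER.fullmatch(tok) for tok in tokens):
            bad.append(lineno)
    return bad


# The two rows quoted above, with a made-up well-formed row between them.
sample = [
    "7720663   10 14.098 1996   1000.00   1000.00   1000.00    233.41    270.72**********",
    "1 2 3.0",
    "7720673   10 19.099 1996   1000.00   1000.00      2.00**********   9145.48    184.70",
]
print(find_bad_lines(sample))  # [1, 3]
```

Since iterating over an open file yields its lines, the same function can be run directly on buoydata_1_5000_edited_sfiles.data inside a with open(...) block. The number of fields per row is not documented here, so the count check is left optional.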

After these edits, the run fails with the following error:

_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/clouddrift/lib/python3.13/concurrent/futures/process.py", line 254, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/selipot/projects.git/clouddrift/clouddrift/adapters/gdp/gdp_sensor.py", line 371, in _process_chunk
    df_chunk = _apply_remove(
        preremove_df_chunk,
    ...<11 lines>...
        ],
    )
  File "/Users/selipot/projects.git/clouddrift/clouddrift/adapters/gdp/gdp_sensor.py", line 317, in _apply_remove
    mask = filter_(temp_df)
  File "/Users/selipot/projects.git/clouddrift/clouddrift/adapters/gdp/gdp_sensor.py", line 377, in <lambda>
    lambda df: (df["senObsYear"] > datetime.datetime.now().year)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/clouddrift/lib/python3.13/site-packages/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/clouddrift/lib/python3.13/site-packages/pandas/core/arraylike.py", line 56, in __gt__
    return self._cmp_method(other, operator.gt)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/clouddrift/lib/python3.13/site-packages/pandas/core/series.py", line 6119, in _cmp_method
    res_values = ops.comparison_op(lvalues, rvalues, op)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/clouddrift/lib/python3.13/site-packages/pandas/core/ops/array_ops.py", line 344, in comparison_op
    res_values = comp_method_OBJECT_ARRAY(op, lvalues, rvalues)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/clouddrift/lib/python3.13/site-packages/pandas/core/ops/array_ops.py", line 129, in comp_method_OBJECT_ARRAY
    result = libops.scalar_compare(x.ravel(), y, op)
  File "ops.pyx", line 107, in pandas._libs.ops.scalar_compare
TypeError: '>' not supported between instances of 'str' and 'int'
"""

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[8], line 1
----> 1 ra = gdp_sensor.to_raggedarray(tmp_path='/Users/selipot/Data/drifters/raw/',skip_download=True)

File ~/projects.git/clouddrift/clouddrift/adapters/gdp/gdp_sensor.py:630, in to_raggedarray(tmp_path, skip_download, max, chunk_size, use_fill_values, max_chunks)
    627 gdp_metadata_df = get_gdp_metadata(tmp_path)
    629 # Run async process to parallelize data processing.
--> 630 drifter_datasets = asyncio.run(
    631     _parallel_get(
    632         [dst for (_, dst) in requests],
    633         gdp_metadata_df,
    634         chunk_size,
    635         tmp_path,
    636         use_fill_values,
    637         max_chunks,
    638     )
    639 )
    641 # Sort the drifters by their start date.
    642 deploy_date_id_map = {
    643     ds["id"].data[0]: ds["start_date"].data[0] for ds in drifter_datasets
    644 }

File /opt/homebrew/Caskroom/mambaforge/base/envs/clouddrift/lib/python3.13/asyncio/runners.py:195, in run(main, debug, loop_factory)
    191     raise RuntimeError(
    192         "asyncio.run() cannot be called from a running event loop")
    194 with Runner(debug=debug, loop_factory=loop_factory) as runner:
--> 195     return runner.run(main)

File /opt/homebrew/Caskroom/mambaforge/base/envs/clouddrift/lib/python3.13/asyncio/runners.py:118, in Runner.run(self, coro, context)
    116 self._interrupt_count = 0
    117 try:
--> 118     return self._loop.run_until_complete(task)
    119 except exceptions.CancelledError:
    120     if self._interrupt_count > 0:

File /opt/homebrew/Caskroom/mambaforge/base/envs/clouddrift/lib/python3.13/asyncio/base_events.py:719, in BaseEventLoop.run_until_complete(self, future)
    716 if not future.done():
    717     raise RuntimeError('Event loop stopped before Future completed.')
--> 719 return future.result()

File ~/projects.git/clouddrift/clouddrift/adapters/gdp/gdp_sensor.py:538, in _parallel_get(sources, gdp_metadata_df, chunk_size, tmp_path, use_fill_values, max_chunks)
    536     chunk = jobmap[ajob]
    537     _logger.warn(f"bad chunk detected, exception: {ajob.exception()}")
--> 538     raise exc
    540 job_drifter_ds_map: dict[int, xr.Dataset] = ajob.result()
    541 for id_ in job_drifter_ds_map.keys():

TypeError: '>' not supported between instances of 'str' and 'int'

selipot · 2025-06-04T23:15:01Z

I'm not sure why the year is being read as a string.
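One plausible explanation, not confirmed in this thread: when even one value in a column cannot be parsed as a number, pandas keeps the entire column as strings. The senObsYear filter then compares str with int, which raises the TypeError above. The sketch below reproduces this on a made-up three-row sample and shows one way to force numeric years with pd.to_numeric(errors="coerce"). Only senObsYear comes from the traceback; the other column names, and the use of read_csv to parse the file, are assumptions.

```python
import datetime
import io

import pandas as pd

# Made-up sample: the last row has a corrupted (non-numeric) year field.
raw = io.StringIO(
    "7720663 10 14.098 1996\n"
    "7720663 10 14.140 1996\n"
    "7720673 10 19.099 ****\n"
)
# Only "senObsYear" appears in the traceback; the other names are hypothetical.
df = pd.read_csv(raw, sep=r"\s+", header=None, names=["id", "month", "day", "senObsYear"])

# The bad token keeps the whole column as strings rather than integers, so a
# comparison like df["senObsYear"] > 2025 fails as in the traceback above.
print(df["senObsYear"].dtype)

# Coerce to numbers: unparseable entries become NaN instead of raising.
years = pd.to_numeric(df["senObsYear"], errors="coerce")

# Flag rows to drop: years in the future, plus years that could not be parsed
# (NaN compares as False, so it has to be caught explicitly).
to_remove = years.isna() | (years > datetime.datetime.now().year)
print(to_remove.tolist())  # [False, False, True]
```

The later commit "Makes sure we are using numerical values for dates" appears to address the same issue; how the adapter actually converts the dates may differ from this sketch.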

codecov · 2025-06-05T17:37:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅


selipot · 2025-06-05T19:58:02Z

Thanks @KevinShuman! Your modifications allowed me to complete the process and create a ragged array from the s files. I will now spend some time checking whether the result makes sense.
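A first sanity check on the resulting ragged array is internal consistency: the per-trajectory row sizes should account for every observation, and times should not go backwards within a trajectory. The sketch below assumes the usual ragged-array layout (a per-trajectory rowsize array plus observations concatenated in trajectory order) and works on plain NumPy arrays rather than the actual output of to_raggedarray. The function name check_ragged is hypothetical, and the variable names rowsize and time are assumptions about the dataset.

```python
import numpy as np


def check_ragged(rowsize, time) -> list[int]:
    """Return indices of trajectories whose times are not non-decreasing.

    Raises ValueError if rowsize does not account for every observation.
    """
    rowsize = np.asarray(rowsize)
    time = np.asarray(time)
    if rowsize.sum() != time.size:
        raise ValueError(f"sum(rowsize)={rowsize.sum()} but there are {time.size} observations")
    # Split the concatenated observations back into one segment per trajectory.
    segments = np.split(time, np.cumsum(rowsize)[:-1])
    return [i for i, seg in enumerate(segments) if np.any(np.diff(seg) < 0)]


# Toy example: two trajectories of 3 and 2 observations; the second one's
# times go backwards.
print(check_ragged([3, 2], [0.0, 1.0, 2.0, 5.0, 4.0]))  # [1]
```

With an xarray Dataset, the inputs would be something like ds["rowsize"].values and ds["time"].values, assuming those are the variable names in the generated ragged array.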

Add gdp_sensor module for processing Global Drifter Program sensor

1f14aa0

selipot requested review from KevinShuman and kevinsantana11 June 4, 2025 23:13

selipot self-assigned this Jun 4, 2025

selipot added the enhancement New feature or request label Jun 4, 2025

selipot added this to Data adapters Jun 4, 2025

selipot marked this pull request as draft June 5, 2025 17:02

Makes sure we are using numerical values for dates

6c68f6c


selipot wants to merge 2 commits into Cloud-Drift:main from selipot:gdp-s-adapter


2 participants
