
Add documents for rosidl::Buffer features#6440

Open
nvcyc wants to merge 3 commits into rolling from
nvcyc/rosidl_buffer

Conversation


@nvcyc nvcyc commented Apr 21, 2026

Description

Introduces documentation for the new rosidl::Buffer feature and its pluggable backend architecture.

Pages created in this pull request to cover the features are:

  • Concepts/Intermediate/About-Buffer-Backends.rst — conceptual overview:
    the Buffer<T> / BufferImplBase<T> / BufferBackend split, the
    descriptor round-trip, discovery hooks, and the "base backend vs
    composed backend" pattern (CUDA as a base; PyTorch as a composed
    backend that can layer on top of multiple bases). Also clarifies that
    the plugin interface is RMW-agnostic — get_descriptor_type_support()
    returns a generic aggregate handle, and the RMW (currently
    rmw_fastrtps_cpp) resolves it internally.

  • How-To-Guides/Using-Buffer-Backends.rst — user guide for enabling a
    backend on a subscription via
    SubscriptionOptions::acceptable_buffer_backends
    ("" / "cpu" / "any" / comma-separated list), C++ and Python
    examples, a per-RMW support matrix, and the three practical rules for a
    compatible pub/sub pair (same backend installed on both sides, same RMW,
    aligned package versions). Also emphasises that intra-process /
    inter-process / inter-host transport scope is a property of each
    backend, not of rosidl::Buffer.

  • Tutorials/Advanced/Writing-a-Buffer-Backend.rst — vendor-facing
    step-by-step guide for implementing and packaging a new BufferBackend
    plugin: interface walkthrough, descriptor design (including the 4096-byte
    kMaxBufferDescriptorSize limit), BufferImplBase<T> and
    BufferBackend implementation, pluginlib registration, CMake/package.xml
    scaffolding, user-facing API patterns (allocate_msg, from_buffer,
    to_buffer), the composed-backend pattern, and a ship checklist. Uses a
    generic mydev backend in prose and cross-links the real CUDA, Torch,
    and demo backends for concrete reference.

  • Tutorials/Demos/GPU-Buffer-Transport.rst — runnable end-to-end demo
    based on robot_arm_demo from ros2/rosidl_buffer_backends_tutorials,
    comparing CUDA zero-copy vs CPU-serialised transport at several
    resolutions, with the benchmark table reproduced from the demo README.

Other registration / discoverability page updates:

  • Concepts/Intermediate.rst, How-To-Guides.rst,
    Tutorials/Advanced.rst, Tutorials/Demos.rst — new pages added to the
    respective toctrees.
  • Glossary.rst — new entries for "Buffer", "Buffer backend", "Base
    backend", "Composed backend", "Buffer descriptor", and "Acceptable
    backend list".
  • The-ROS2-Project/Features.rst — added a "Pluggable buffer backends"
    row (marked experimental; currently supported in rmw_fastrtps_cpp, with C++ user-facing APIs only).
  • Related-Projects.rst — added a "Community rosidl::Buffer backends"
    section linking to ros2/rosidl_buffer_backends and
    ros2/rosidl_buffer_backends_tutorials.

Did you use Generative AI?

Yes. Cursor with Claude Opus 4.7 was used to assist with the draft version of the docs included in this pull request.


github-actions Bot commented Apr 21, 2026

HTML artifacts: https://github.com/ros2/ros2_documentation/actions/runs/24916423129/artifacts/6635453352.

To view the resulting site:

  1. Click on the above link to download the artifacts archive
  2. Extract it
  3. Open html-artifacts-6440/index.html in your favorite browser

that describes how to locate or reconstruct the payload on the receiving
side.
For a CPU-only backend the descriptor would just carry the bytes;
for a GPU backend it typically carries an IPC handle plus metadata.

@yuanknv yuanknv Apr 21, 2026


Not sure if this is accurate; the descriptor generally doesn't carry any IPC handle. We use an FD to import the GPU pointer, and the FD is transmitted through a socket.

Collaborator

I would suggest: "a small reference that the receiving side uses to re-attach to the payload; the exact mechanism is backend-specific."

[this](sensor_msgs::msg::Image::SharedPtr msg) {
  if (msg->data.get_backend_type() == "cuda") {
    // Zero-copy GPU path: read the device pointer directly.
    auto rh = cuda_buffer_backend::from_buffer(msg->data, stream_);

The input msg needs to be const in order to get a read handle; for non-const, it will return a write handle.

* - RMW implementation
- Support status
- Notes
* - ``rmw_fastrtps_cpp``

I think we should also mention which QoS settings are not supported.

* **CPU** -- the frame is rendered on the GPU, copied back to host memory
with ``cudaMemcpy``, and then serialised through the RMW as a regular
``uint8[]``.
No buffer backend is involved.

Technically, I think it's still using the CPU buffer backend.

Collaborator

@fujitatomoya fujitatomoya left a comment


  • No need to keep the lines to a certain length, as long as there is a single sentence per line.

Comment on lines +18 to +22
The feature was introduced to let vendors transport large binary payloads
(camera images, point clouds, tensors, ...) through the existing ROS 2
pub/sub API with as few copies as the underlying memory technology allows,
while keeping every piece of existing code that treats a ``uint8[]`` field as
a ``std::vector<uint8_t>`` working unchanged.
Collaborator

Developers and users may think that this feature is available for services and actions, because services and actions in ROS 2 are built on top of message types and endpoints, plus generated services and topics under the hood.
But I do not think this is supported yet. (Not sure whether services and actions will be supported in the future, or only topic types; call it a limitation for now, or a design spec.)

I would add an explicit statement that services and actions are not supported with this feature.
I think this is not limited by design; rather, the registration is not yet implemented (like rmw_subscription_options_t), and the descriptor path with zero-copy GPU transport is, in the current merge, only wired through the topic transport.

that describes how to locate or reconstruct the payload on the receiving
side.
For a CPU-only backend the descriptor would just carry the bytes;
for a GPU backend it typically carries an IPC handle plus metadata.
Collaborator

I would suggest: "a small reference that the receiving side uses to re-attach to the payload; the exact mechanism is backend-specific."

:widths: 25 25 50
:header-rows: 1

* - RMW implementation
Collaborator

Probably we need to add rmw_zenoh_cpp here as a Tier 1 implementation.


* intra-process (same Python/C++ process);
* inter-process on the same host, same GPU, same user (via CUDA VMM IPC);
* inter-host is not supported; the RMW falls back to CPU serialization.
Collaborator

I actually think this is against the design? The RMW doesn't decide what works across hosts; the backend does. The architecture's whole point is that the RMW is backend-agnostic at the transport-mechanism level.

Given the current major scope, I do understand that all the buffers are probably and likely managed under inter-process communication, but I believe this is not a hard limitation. For example, GPUDirect RDMA is designed for cross-host GPU-to-GPU transfer over RoCE/InfiniBand, and there is CXL coherent memory?


* A CUDA-capable GPU and the CUDA Toolkit (>= 11.8).
* SDL2, GLEW, OpenGL, X11 development packages.
* A ROS 2 Rolling source workspace.
Collaborator

Probably for the future we could say Lyrical Luth or later, so that users can see the minimum distro version.

@@ -0,0 +1,199 @@
About ``rosidl::Buffer`` backends
Collaborator

How about explicitly adding the NITROS / Isaac ROS relationship to this? It could be one of the most common reader questions, and the docs currently appear to leave it unaddressed. Explaining that native buffers operate at the rosidl layer rather than via REP-2007/2011 type adaptation, that they work cross-process out of the box, and that they coexist with type adaptation rather than competing with it, would head off a lot of confusion, I guess?

* ``cuda_buffer_backend``: a realistic **base backend** built on CUDA VMM
and CUDA IPC, with intra-process, inter-process same-host, and CPU-fallback
paths.
See `ros2/rosidl_buffer_backends <https://github.com/ros2/rosidl_buffer_backends>`__.
Collaborator

obviously we need to merge ros2/rosidl_buffer_backends#1 before this doc is published.

* ``demo_buffer_backend``: a minimal CPU-to-CPU backend used by the
``rosidl::Buffer`` system tests.
Useful as a pedagogical example with no device dependencies.
See ``rcl_buffer/demo_buffer_backend`` in the workspace used by this
Collaborator

where is this package?
