Signed-off-by: CY Chen <[email protected]>
HTML artifacts: https://github.com/ros2/ros2_documentation/actions/runs/24916423129/artifacts/6635453352. To view the resulting site:
| that describes how to locate or reconstruct the payload on the receiving
| side.
| For a CPU-only backend the descriptor would just carry the bytes;
| for a GPU backend it typically carries an IPC handle plus metadata.
Not sure this is accurate: the descriptor generally doesn't carry an IPC handle, because we use an FD to import the GPU pointer, and the FD is transmitted through a socket.
I would suggest: "a small reference that the receiving side uses to re-attach to the payload; the exact mechanism is backend-specific."
| [this](sensor_msgs::msg::Image::SharedPtr msg) {
| if (msg->data.get_backend_type() == "cuda") {
| // Zero-copy GPU path: read the device pointer directly.
| auto rh = cuda_buffer_backend::from_buffer(msg->data, stream_);
The input msg needs to be const in order to get a read handle. For a non-const msg, it will return a write handle.
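The const/non-const distinction described above can be pictured with plain C++ overload resolution. This is only a minimal sketch: Buffer, ReadHandle, WriteHandle, handle_kind, and this from_buffer are hypothetical stand-ins, not the real cuda_buffer_backend API. It shows nothing more than the idea that a const argument selects the read-handle overload.

```cpp
#include <string>

// Hypothetical stand-in for a message's data field; not the real rosidl::Buffer.
struct Buffer
{
  int payload = 0;
};

// Hypothetical handle types: one read-only view, one mutable view.
struct ReadHandle
{
  const int * ptr;
};

struct WriteHandle
{
  int * ptr;
};

// Chosen by overload resolution when the buffer is const: read-only access.
inline ReadHandle from_buffer(const Buffer & buffer)
{
  return ReadHandle{&buffer.payload};
}

// Chosen when the buffer is non-const: mutable access.
inline WriteHandle from_buffer(Buffer & buffer)
{
  return WriteHandle{&buffer.payload};
}

// Helpers that report which overload was picked.
inline std::string handle_kind(const ReadHandle &) {return "read";}
inline std::string handle_kind(const WriteHandle &) {return "write";}
```

Under this assumption, a callback that takes sensor_msgs::msg::Image::ConstSharedPtr sees msg->data as const and would therefore get the read handle, whereas the SharedPtr form in the quoted snippet would select the write-handle overload.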
| * - RMW implementation
| - Support status
| - Notes
| * - ``rmw_fastrtps_cpp``
I think we should also mention which QoS settings are not supported.
| * **CPU** -- the frame is rendered on the GPU, copied back to host memory
| with ``cudaMemcpy``, and then serialised through the RMW as a regular
| ``uint8[]``.
| No buffer backend is involved.
Technically, I think it's still using the CPU buffer backend.
fujitatomoya left a comment
- No need to keep lines to a certain length, as long as there is a single sentence per line.
| The feature was introduced to let vendors transport large binary payloads
| (camera images, point clouds, tensors, ...) through the existing ROS 2
| pub/sub API with as few copies as the underlying memory technology allows,
| while keeping every piece of existing code that treats a ``uint8[]`` field as
| a ``std::vector<uint8_t>`` working unchanged.
Developers and users might assume that this feature is available for services and actions, because services and actions in ROS 2 are built on top of message types and endpoints, plus generated services and topics under the hood.
But I do not think this is supported yet. (I am not sure whether services and actions will be supported in the future or only topics will be, that is, whether this is a limitation for now or part of the design spec.)
I would explicitly add a statement that services and actions are not supported with this feature.
I think this is not limited by design, but the registration is not yet implemented the way it is for rmw_subscription_options_t; the descriptor path with zero-copy GPU transport is, in the current merge, only wired through the topic transport.
| :widths: 25 25 50
| :header-rows: 1
|
| * - RMW implementation
We probably need to add rmw_zenoh_cpp here as a Tier 1 implementation.
| * intra-process (same Python/C++ process);
| * inter-process on the same host, same GPU, same user (via CUDA VMM IPC);
| * inter-host is not supported; the RMW falls back to CPU serialization.
I actually think this is against the design. The RMW doesn't decide what works across hosts; the backend does. The architecture's whole point is that the RMW is backend-agnostic at the transport-mechanism level.
Given the current scope, I understand that the buffers will most likely be managed over inter-process communication, but I do not believe this is an inherent limitation. For example, GPUDirect RDMA is designed for cross-host GPU-to-GPU transfer over RoCE/InfiniBand, and perhaps CXL coherent memory as well?
| * A CUDA-capable GPU and the CUDA Toolkit (>= 11.8).
| * SDL2, GLEW, OpenGL, X11 development packages.
| * A ROS 2 Rolling source workspace.
For the future, we could probably say "Lyrical Luth or later", so that users can see the minimum distro version.
| @@ -0,0 +1,199 @@
| About ``rosidl::Buffer`` backends
How about explicitly describing how this relates to NITROS / Isaac ROS? This could be one of the most common reader questions, and the docs currently appear to leave it unaddressed. It would help to explain that native buffers operate at the rosidl layer rather than via REP-2007/2011 type adaptation, that they work cross-process out of the box, and that they coexist with type adaptation rather than competing with it. This would head off a lot of confusion, I guess.
| * ``cuda_buffer_backend``: a realistic **base backend** built on CUDA VMM
| and CUDA IPC, with intra-process, inter-process same-host, and CPU-fallback
| paths.
| See `ros2/rosidl_buffer_backends <https://github.com/ros2/rosidl_buffer_backends>`__.
Obviously, we need to merge ros2/rosidl_buffer_backends#1 before this doc is published.
| * ``demo_buffer_backend``: a minimal CPU-to-CPU backend used by the
| ``rosidl::Buffer`` system tests.
| Useful as a pedagogical example with no device dependencies.
| See ``rcl_buffer/demo_buffer_backend`` in the workspace used by this
Where is this package?
Description
Introduces documentation for the new rosidl::Buffer feature and its pluggable backend architecture.

Pages created in this pull request to cover the feature:

- Concepts/Intermediate/About-Buffer-Backends.rst: conceptual overview covering the Buffer<T> / BufferImplBase<T> / BufferBackend split, the descriptor round-trip, discovery hooks, and the "base backend vs composed backend" pattern (CUDA as a base; PyTorch as a composed backend that can layer on top of multiple bases). Also clarifies that the plugin interface is RMW-agnostic: get_descriptor_type_support() returns a generic aggregate handle, and the RMW (currently rmw_fastrtps_cpp) resolves it internally.
- How-To-Guides/Using-Buffer-Backends.rst: user guide for enabling a backend on a subscription via SubscriptionOptions::acceptable_buffer_backends ("" / "cpu" / "any" / comma-separated list), C++ and Python examples, a per-RMW support matrix, and the three practical rules for a compatible pub/sub pair (same backend installed on both sides, same RMW, aligned package versions). Also emphasises that intra-process / inter-process / inter-host transport scope is a property of each backend, not of rosidl::Buffer.
- Tutorials/Advanced/Writing-a-Buffer-Backend.rst: vendor-facing step-by-step guide for implementing and packaging a new BufferBackend plugin, including an interface walkthrough, descriptor design (including the 4096-byte kMaxBufferDescriptorSize limit), BufferImplBase<T> and BufferBackend implementation, pluginlib registration, CMake / package.xml scaffolding, user-facing API patterns (allocate_msg, from_buffer, to_buffer), the composed-backend pattern, and a ship checklist. Uses a generic mydev backend in prose and cross-links the real CUDA, Torch, and demo backends for concrete reference.
- Tutorials/Demos/GPU-Buffer-Transport.rst: runnable end-to-end demo based on robot_arm_demo from ros2/rosidl_buffer_backends_tutorials, comparing CUDA zero-copy vs CPU-serialised transport at several resolutions, with the benchmark table reproduced from the demo README.
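
To illustrate the acceptable_buffer_backends values mentioned in the how-to guide entry above, the sketch below shows one way such a string could be interpreted. The helpers split_backends and accepts are hypothetical and not part of rclcpp. The matching rules (an empty string meaning CPU only, "any" matching every backend, otherwise an exact name match) are assumptions made for this example, not the documented behaviour.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a comma-separated list and trim spaces and tabs.
inline std::vector<std::string> split_backends(const std::string & spec)
{
  std::vector<std::string> names;
  std::stringstream stream(spec);
  std::string item;
  while (std::getline(stream, item, ',')) {
    const auto first = item.find_first_not_of(" \t");
    if (first == std::string::npos) {
      continue;  // skip empty or whitespace-only entries
    }
    const auto last = item.find_last_not_of(" \t");
    names.push_back(item.substr(first, last - first + 1));
  }
  return names;
}

// Hypothetical matching rule (assumed for this sketch):
//   ""          -> only the "cpu" backend is accepted
//   "any"       -> every backend is accepted
//   "a,b,..."   -> only the listed backend names are accepted
inline bool accepts(const std::string & spec, const std::string & backend)
{
  const std::vector<std::string> names = split_backends(spec);
  if (names.empty()) {
    return backend == "cpu";
  }
  for (const std::string & name : names) {
    if (name == "any" || name == backend) {
      return true;
    }
  }
  return false;
}
```

With these assumed rules, accepts("cuda, torch", "torch") is true and accepts("", "cuda") is false; the how-to guide remains the reference for the actual semantics.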
Other registration / discoverability page updates:

- Concepts/Intermediate.rst, How-To-Guides.rst, Tutorials/Advanced.rst, Tutorials/Demos.rst: new pages added to the respective toctrees.
- Glossary.rst: new entries for "Buffer", "Buffer backend", "Base backend", "Composed backend", "Buffer descriptor", and "Acceptable backend list".
- The-ROS2-Project/Features.rst: added a "Pluggable buffer backends" row (marked experimental, currently supported in rmw_fastrtps_cpp, and C++ user-facing APIs only).
- Related-Projects.rst: added a "Community rosidl::Buffer backends" section linking to ros2/rosidl_buffer_backends and ros2/rosidl_buffer_backends_tutorials.

Did you use Generative AI?
Yes. Cursor with Claude Opus 4.7 was used to assist with the draft version of the docs included in this pull request.