Conversation
Warp-first dual-access array wrapper for articulation data properties. Provides explicit .torch and .warp accessors with a deprecation bridge for smooth migration from wp.to_torch() patterns.
List concrete test suites and pre-commit checks to run during implementation.
Six tasks: TorchArray class (TDD), base type hints, PhysX data, Newton data, validation tests, changelogs.
Warp-first dual-access array wrapper. Provides cached zero-copy .torch view and .warp accessor for kernel interop. Includes a deprecation bridge (__torch_function__ + operators) for migration.
Cover __radd__, __sub__, __mul__, __neg__, and comparison operators individually to prevent regressions.
Change all property return type hints from wp.array to TorchArray. Wrap deprecated property returns in TorchArray().
Wrap all property returns in TorchArray(). Internal warp kernels access sibling properties via .warp for wp.launch() compatibility.
Same pattern as PhysX: wrap returns in TorchArray(), access sibling properties via .warp for wp.launch() compatibility.
Update three packages with new version entries documenting the TorchArray feature that wraps wp.array with explicit .torch and .warp accessors: - isaaclab 4.6.1 -> 4.6.2 - isaaclab_physx 0.5.16 -> 0.5.17 - isaaclab_newton 0.5.13 -> 0.5.14
Internal code that passes data properties to wp.launch(), wp.zeros_like(), wp.clone(), and wp.to_torch() now uses .warp or .torch explicitly.
Internal code that passes data properties to wp.launch(), wp.zeros_like(), and wp.to_torch() now uses .warp or .torch explicitly.
Replace wp.to_torch(data.property) with data.property.torch throughout the test file for data properties that now return TorchArray. Calls on root_view methods and wrench_composer properties are kept as wp.to_torch() since those still return plain warp arrays.
Replace wp.to_torch(data.property) with data.property.torch in MDP termination functions. Add .warp in wrench composer lambda.
Replace wp.to_torch(data.property) with data.property.torch throughout the test file.
Replace wp.to_torch(data.property) with data.property.torch across observations, rewards, events, actions, and commands.
Replace wp.to_torch(data.property) with data.property.torch across observations, rewards, events, actions, commands, and cartpole task.
Replace wp.to_torch(data.property) with data.property.torch.
Replace wp.to_torch(data.property) with data.property.torch across all manager-based RL task environments.
Replace wp.to_torch(data.property) with data.property.torch across all direct RL task environments.
Update base and PhysX PVA/IMU data classes to return TorchArray. Update tests to use .torch accessor instead of wp.to_torch().
Update PhysX DeformableObject data class to return TorchArray from all properties and class-level attributes. Fix asset class consumers to use .warp for kernel inputs and .torch for torch contexts. Update tests, tutorial scripts, and interactive_scene to use .torch instead of wp.to_torch().
Update base, PhysX, and Newton FrameTransformer data classes to return TorchArray. Fix consumers and update tests.
Update base, PhysX, and Newton RigidObjectCollection data classes to return TorchArray. Fix asset class consumers and update tests.
Update base, PhysX, and Newton RigidObject data classes to return TorchArray. Fix asset class consumers and update tests.
Update base, PhysX, and Newton ContactSensor data classes to return TorchArray. Fix sensor class consumers and update tests.
GRAVITY_VEC_W is a raw wp.array constant, not a TorchArray property.
default_joint_pos/vel need .torch before passing to write_joint_state_to_sim.
Convert interactive_scene, viewport_camera_controller, io_descriptors, anymal_c, mimic, and experimental task files.
Convert controller tests, mock interface tests, env tests, scene tests, and mimic tests.
Avoid per-access TorchArray allocation by caching instances in _create_buffers. Newton wp.array pointers are stable, so pinned TorchArrays remain valid across sim steps.
Same optimization as ArticulationData: cache TorchArray instances to avoid per-access allocation. Newton wp.array pointers are stable.
Cache TorchArray instances in ContactSensor and FrameTransformer data classes to avoid per-access allocation.
Strided warp views from transform_to_vec_quat may not produce zero-copy torch tensors, causing stale data with pinned TorchArrays.
Two bugs in the Newton pinning optimization: 1. After sim.reset(), Newton recreates simulation bindings with new wp.array pointers, but pinned TorchArrays still referenced the old ones. Fix: add TorchArray.rebind() and call it when bindings are recreated. 2. Non-contiguous sliced properties (pos/quat from transforms) need the helper kernel to re-run each access to copy fresh data, but the lazy pinning guard skipped the helper after first call. Fix: always call the helper, only skip TorchArray creation.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Summary
This PR introduces TorchArray, a warp-first dual-access array wrapper that replaces raw wp.array returns from all data properties across the framework. The design provides explicit .torch and .warp accessors with a deprecation bridge for backward compatibility. The migration is comprehensive — 143 files touching articulation, rigid object, sensor, MDP, and task environment code across both PhysX and Newton backends.
The core TorchArray class is clean and well-tested. The Newton pinning optimization is clever but has a correctness gap in the rebind path for sliced properties. The PhysX backend takes a simpler (allocation-per-access) approach that sidesteps this class of bug entirely.
Design Assessment
Design is sound. The warp-first wrapper with cached zero-copy torch views is the right approach for this codebase. The deprecation bridge via __torch_function__ + operator overloads provides a clean migration path. The Newton pinning optimization is well-motivated (avoiding per-access allocation for stable pointers), and the FrameTransformer revert shows good judgment about when pinning isn't safe.
Two design asymmetries worth noting:
- PhysX vs Newton allocation strategy: PhysX creates a new
TorchArrayper property access (~80 allocations/step), while Newton pins them. Both are valid but the inconsistency may confuse contributors. Consider documenting this difference. _deprecation_warnedis a class-level singleton: Once any TorchArray triggers the warning, all instances go silent. This is reasonable for one-time migration warnings but could mask issues if different subsystems migrate at different rates.
Findings
🔴 Critical: Newton sliced wp.array buffers not invalidated on sim rebind — source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation_data.py:1433-1450
When _create_simulation_bindings() is called after a sim reset (via the PHYSICS_READY callback), the code correctly invalidates the pinned TorchArray wrappers (_root_link_pos_w_ta = None, etc.) but does not invalidate the intermediate sliced wp.array buffers (_root_link_pos_w, _root_link_quat_w, etc.).
For contiguous transforms, _get_pos_from_transform() creates a strided wp.array view via wp.array(ptr=transform.ptr, ...). After a sim reset:
- The transform sim binding is rebound to a new pointer ✅
- The TorchArray wrapper is invalidated ✅
- But
self._root_link_pos_wstill holds the old stridedwp.arraywithptrpointing to the old (freed) transform memory ❌
On next property access, _get_pos_from_transform(self._root_link_pos_w, self.root_link_pose_w.warp) sees source is not None and transform.is_contiguous, so it returns the stale source unchanged — serving data from freed memory.
Fix: Also reset the underlying wp.array sliced buffers alongside the TorchArray wrappers:
# Invalidate lazy sliced TorchArrays AND their backing wp.arrays
# so they are re-created from the new sim bindings on next access.
self._root_link_pos_w_ta = None
self._root_link_pos_w = None
self._root_link_quat_w_ta = None
self._root_link_quat_w = None
self._root_link_lin_vel_w_ta = None
self._root_link_lin_vel_w = None
self._root_link_ang_vel_w_ta = None
self._root_link_ang_vel_w = None
# ... same pattern for root_com_*, body_link_*, body_com_*This same pattern should be verified in RigidObjectData and RigidObjectCollectionData if they also have sim rebind callbacks.
🟡 Warning: No __getitem__ on TorchArray — direct indexing will fail — source/isaaclab/isaaclab/utils/warp/torch_array.py
TorchArray does not implement __getitem__, so ta[0], ta[:, 2], etc. will raise TypeError: 'TorchArray' object is not subscriptable. While the consumer code in this PR consistently uses .torch before indexing, this is a footgun for anyone writing new code against the TorchArray API — especially since the old wp.array supported indexing.
The __torch_function__ protocol only intercepts calls to torch.* functions, not Python's [] operator. Consider adding:
def __getitem__(self, key):
self._warn_implicit()
return self.torch[key]This would maintain the deprecation bridge's backward compatibility promise more completely.
🟡 Warning: PhysX root_com_vel_w reassigns .data but TorchArray wraps it per-access — source/isaaclab_physx/isaaclab_physx/assets/articulation/articulation_data.py:613
self._root_com_vel_w.data = self._root_view.get_root_velocities().view(wp.spatial_vectorf)Some PhysX properties replace the TimestampedBuffer.data wp.array reference entirely each step. Since PhysX creates a new TorchArray per access, this works correctly. But if someone later applies the Newton pinning optimization to PhysX without understanding this pattern, pinned TorchArrays would serve stale data. A brief comment noting why PhysX uses per-access TorchArray allocation (and why pinning would be unsafe here) would prevent future bugs.
🔵 Suggestion: Add __init__ type guard for wp_array parameter — source/isaaclab/isaaclab/utils/warp/torch_array.py:54-61
def __init__(self, wp_array: wp.array) -> None:
if not isinstance(wp_array, wp.array):
raise TypeError(f"TorchArray expects a wp.array, got {type(wp_array).__name__}")
self._warp = wp_array
self._torch_cache: torch.Tensor | None = NoneThis would catch accidental wrapping of non-warp objects (e.g., passing a torch.Tensor or None) at construction time rather than producing confusing errors on .torch or .warp access.
🔵 Suggestion: Missing rebind() test — source/isaaclab/test/utils/warp/test_torch_array.py
The test suite thoroughly covers basic functionality, structured types, convenience properties, and the deprecation bridge — good coverage. However, rebind() is a critical method (it's the mechanism that prevents stale data after sim resets) and has no direct test. Consider adding:
def test_rebind_invalidates_cache(self, device):
from isaaclab.utils.warp.torch_array import TorchArray
arr1 = wp.zeros(10, dtype=wp.float32, device=device)
ta = TorchArray(arr1)
t1 = ta.torch # Cache the tensor
arr2 = wp.ones(10, dtype=wp.float32, device=device)
ta.rebind(arr2)
t2 = ta.torch
assert t1 is not t2 # Cache was invalidated
assert ta.warp is arr2 # Warp points to new array
assert t2[0].item() == 1.0 # New data is servedTest Coverage
- Core TorchArray class: Well tested — basic ops, structured types, deprecation bridge, operator forwarding. ✅
- Missing:
rebind()method test,__getitem__behavior (would currently fail), behavior when wrapping a 0-dimensional array. - Integration: Existing PhysX/Newton articulation/sensor tests updated to use
.torchaccessor. The test updates appear mechanical and correct. - Coverage adequate: Yes (for a feature PR), with the
rebind()gap noted above.
CI Status
Only the labeler check has run (pass). No test or lint CI results available yet — recommend waiting for full CI before merge.
Verdict
Significant concerns
The core TorchArray design is solid and the migration is comprehensive, but the Newton rebind bug (stale sliced wp.array pointers after sim reset) is a correctness issue that could cause silent data corruption in RL training loops. This should be addressed before merge. The other findings are lower priority but would improve robustness.
| self._body_com_pos_b_ta.rebind(self._sim_bind_body_com_pos_b) | ||
| # Invalidate lazy sliced TorchArrays so they are re-created from the | ||
| # new sim bindings on next access. | ||
| self._root_link_pos_w_ta = None |
There was a problem hiding this comment.
🔴 Critical: Sliced wp.array buffers not invalidated on rebind
This block invalidates the TorchArray wrappers (_root_link_pos_w_ta = None) but not the underlying sliced wp.array buffers (_root_link_pos_w). After sim reset, contiguous strided views created by _get_pos_from_transform() still hold ptr into the old (freed) transform memory. Since source is not None, the helper returns the stale buffer without re-creating it from the new transform.
Fix: Also set the wp.array buffers to None:
| self._root_link_pos_w_ta = None | |
| # Invalidate lazy sliced TorchArrays AND their backing wp.arrays | |
| # so they are re-created from the new sim bindings on next access. | |
| self._root_link_pos_w_ta = None | |
| self._root_link_pos_w = None |
(Same pattern for all 16 sliced properties: _root_link_quat_w, _root_link_lin_vel_w, _root_link_ang_vel_w, _root_com_*, _body_link_*, _body_com_*)
| _deprecation_warned: ClassVar[bool] = False | ||
| """Class-level flag ensuring the deprecation warning is emitted at most once.""" | ||
|
|
||
| def __init__(self, wp_array: wp.array) -> None: |
There was a problem hiding this comment.
🟡 Warning: No __getitem__ support
TorchArray doesn't implement __getitem__, so ta[0] or ta[:, 2] will raise TypeError. The __torch_function__ protocol only intercepts torch.* function calls, not Python's [] operator.
Consider adding a subscript forwarder for the deprecation bridge:
def __getitem__(self, key):
self._warn_implicit()
return self.torch[key]1. Invalidate backing wp.array buffers (not just TorchArray wrappers) on Newton sim rebind. Without this, contiguous strided views hold stale pointers into freed transform memory after sim reset. 2. Add __getitem__/__setitem__ to TorchArray for the deprecation bridge. Supports all torch indexing (1D, ND, slices, masks).
Invalidate pinned TorchArrays and backing wp.array buffers when _create_simulation_bindings() runs again after sim reset. Same pattern as ArticulationData.
…lidations 1. Register PHYSICS_READY callback in Newton RigidObjectCollection so _create_simulation_bindings() runs after sim reset. Without this, pinned TorchArrays hold stale pointers. 2. Add missing sliced property invalidations in ArticulationData rebind block: body_com_lin_acc_w, body_com_ang_acc_w, body_com_quat_b.
Mirror Newton's zero-overhead pattern for direct view-backed properties. Pin TorchArrays, eliminate TimestampedBuffer overhead for stable PhysX view getters.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot — Follow-up
New commits (f0f4c82→15d8fe2a): adds missing invalidations in articulation_data.py and registers a PHYSICS_READY callback in rigid_object_collection.py. The articulation_data fix looks complete — all lazy-sliced buffers are now invalidated. One new issue found in the collection callback registration.
| lambda _: self._data._create_simulation_bindings(), | ||
| PhysicsEvent.PHYSICS_READY, | ||
| name=f"rigid_object_collection_rebind_{self.cfg.rigid_objects}", | ||
| ) |
There was a problem hiding this comment.
🟡 Warning: _physics_ready_handle not deregistered on cleanup
This registers a PHYSICS_READY callback but the class does not override _clear_callbacks() to deregister it. The base AssetBase._clear_callbacks() only cleans up _initialize_handle, _invalidate_initialize_handle, _prim_deletion_handle, and _debug_vis_handle — it has no knowledge of this new handle.
The sibling class Articulation (articulation.py L3223-3228) correctly overrides _clear_callbacks() for exactly this reason:
def _clear_callbacks(self) -> None:
super()._clear_callbacks()
if hasattr(self, "_physics_ready_handle") and self._physics_ready_handle is not None:
self._physics_ready_handle.deregister()
self._physics_ready_handle = NoneWithout this, when the rigid object collection is destroyed (e.g., scene teardown), the callback will persist and fire into a dead self._data reference, likely causing a crash or silent corruption on the next physics reset.
Four tasks: add _create_simulation_bindings, simplify properties with pinned TorchArrays, register PHYSICS_READY callback, validate.
|
Might be interesting to also add the automatic unwarp for warp kernels as well? Something like below. Might needs some solid tests (cuda vs cpu kernels, make sure they get passed in, no functional or performance difference between passing array vs array.warp. |
Add __cuda_array_interface__ and __array_interface__ properties to TorchArray, delegating to the underlying wp.array. This allows TorchArray instances to be passed directly to wp.launch() without explicitly accessing .warp, since warp's pack_arg routine uses these protocols to build array descriptors. Remove all .warp accessors from wp.launch() inputs/outputs across Newton asset and data files. Retained .warp where the actual wp.array object is needed (reshape calls, utility functions that access .ptr/.is_contiguous, wp.zeros_like).
| @@ -787,10 +788,10 @@ def joint_pos(self) -> wp.array: | |||
| # read data from simulation and set the buffer data and timestamp | |||
| self._joint_pos.data = self._root_view.get_dof_positions() | |||
| self._joint_pos.timestamp = self._sim_timestamp | |||
| return self._joint_pos.data | |||
| return TorchArray(self._joint_pos.data) | |||
There was a problem hiding this comment.
Profiling the physx side, it doesn't appear the caching in working as intended.
All these TorchArray constructors are causing wp.to_torch to be called on every call to these functions since init sets self._torch_cache: to None
Including the overhead of TorchArray it's actually a slight bit slower.
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there