Enhanced support for memory-tracking raft resources #3004

Open
achirkin wants to merge 6 commits into rapidsai:release/26.06 from achirkin:enh-memory-resources
@achirkin
Contributor

  1. Change the host memory resource to have the same owning semantics that the device memory resources have as of "Migrate RMM usage to CCCL MR design" (#2996)
  2. Add a workaround to statistics_adaptor.hpp so that it compiles with nvcc (see the code comment)
  3. Add memory_stats_resources. In contrast to memory_tracking_resources, this doesn't stream the memory usage online; it only reports the overall usage metrics.
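
The "overall usage metrics" style in item 3 can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical (`usage_stats`, `record_allocate`, `record_deallocate`, and `run_example` are names made up for illustration, not the RAFT or RMM API). It only shows the idea of accumulating counters and reading them once at the end, as opposed to emitting a sample on every allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical aggregate counters; NOT the actual memory_stats_resources API.
struct usage_stats {
  std::size_t current_bytes = 0;  // bytes currently allocated
  std::size_t peak_bytes    = 0;  // high-water mark of current_bytes
  std::size_t total_bytes   = 0;  // sum of all allocation sizes
  std::size_t num_allocs    = 0;  // number of allocations seen

  void record_allocate(std::size_t bytes)
  {
    current_bytes += bytes;
    total_bytes += bytes;
    ++num_allocs;
    peak_bytes = std::max(peak_bytes, current_bytes);
  }

  void record_deallocate(std::size_t bytes)
  {
    assert(bytes <= current_bytes);  // guard against unbalanced deallocation
    current_bytes -= bytes;
  }
};

// Replays a fixed allocation pattern and returns the final summary.
inline usage_stats run_example()
{
  usage_stats s;
  s.record_allocate(100);
  s.record_allocate(50);
  s.record_deallocate(100);
  s.record_allocate(30);
  s.record_deallocate(50);
  s.record_deallocate(30);
  return s;
}
```

Nothing is reported while the pattern runs; a caller inspects the returned `usage_stats` afterwards, which is the trade-off relative to a streaming tracker.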

@achirkin achirkin self-assigned this Apr 24, 2026
@achirkin achirkin requested review from a team as code owners April 24, 2026 14:09
@achirkin achirkin added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 24, 2026
@achirkin achirkin moved this to In Progress in Unstructured Data Processing Apr 24, 2026
@achirkin achirkin changed the title Enhanced support for memory tracking raft resources Enhanced support for memory-tracking raft resources Apr 24, 2026
Comment on lines +75 to +79
// NVCC injects __host__ __device__ on std::shared_ptr special members,
// which makes the *implicit* or *defaulted* special members __host__
// __device__ too. That conflicts with Upstream types whose special
// members are __host__ only (e.g. rmm::device_async_resource_ref).
// User-defined bodies (not = default) force plain __host__ execution space.
Contributor

Good find. Are there changes you would suggest for RMM or CCCL?

Contributor Author

Thanks! I think CCCL/rmm are fine, because cuda::mr::shared_resource defines the copy/move constructors explicitly.
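
For readers unfamiliar with the pattern, the workaround described in the code comment above might look roughly like the sketch below. This is an illustration, not the code from statistics_adaptor.hpp: `host_only_upstream` and `adaptor_sketch` are hypothetical stand-ins, and the sketch is plain C++, so it does not reproduce the nvcc behavior itself. It only shows the shape of the fix: special members get user-provided bodies instead of `= default`.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for an upstream type whose special members are host-only.
struct host_only_upstream {
  int id = 0;
};

// Hypothetical adaptor holding shared counters plus an upstream.
class adaptor_sketch {
 public:
  explicit adaptor_sketch(host_only_upstream upstream)
    : upstream_(upstream), counters_(std::make_shared<long>(0))
  {
  }

  // User-provided bodies instead of `= default`. Per the comment under
  // review, this is what keeps them in the plain host execution space.
  adaptor_sketch(const adaptor_sketch& other)
    : upstream_(other.upstream_), counters_(other.counters_)
  {
  }
  adaptor_sketch(adaptor_sketch&& other) noexcept
    : upstream_(std::move(other.upstream_)), counters_(std::move(other.counters_))
  {
  }
  adaptor_sketch& operator=(const adaptor_sketch& other)
  {
    upstream_ = other.upstream_;
    counters_ = other.counters_;
    return *this;
  }
  adaptor_sketch& operator=(adaptor_sketch&& other) noexcept
  {
    upstream_ = std::move(other.upstream_);
    counters_ = std::move(other.counters_);
    return *this;
  }
  ~adaptor_sketch() {}

  void add(long n)
  {
    assert(counters_);  // null only in a moved-from object
    *counters_ += n;
  }
  long total() const
  {
    assert(counters_);
    return *counters_;
  }
  int upstream_id() const { return upstream_.id; }

 private:
  host_only_upstream upstream_;
  std::shared_ptr<long> counters_;  // copies of the adaptor share the counters
};
```

Copies share the same counters through the `std::shared_ptr`, so the behavior matches what the defaulted members would have done; only the way they are declared differs.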

@@ -0,0 +1,237 @@
/*
Member

@cjnolet cjnolet May 6, 2026

Why wouldn't we put this in raft/core/resources instead of raft/util? It would be nice to put all of these in the same place, given RAFT is more than just memory resources. That way users can find them easily. I think we should do the same with the above instead of putting them in mr.

Contributor Author

I follow the example of memory_tracking_resources here. But that was introduced recently too (26.04), so we can change both without risking breaking things too much.
Just to clarify the logic of my choice for these two: I've put them into the util folder because they are not a part of a "normal" algorithm flow but rather utilities to analyze/profile the memory-related resource usage. It's not a strong preference though. Would you like me to move them both to the core folder (and mark the PR as breaking)?

NB the current state of raft (main + PR):

  • raft/core/resources/ - individual resources, such as cuda stream, cublas, memory resources
  • raft/core/ - raft::resources itself (+memory_tracking_resources, memory_stats_resources, dry_run_resources if we decide so)
  • raft/mr/ - CCCL/rmm compatible memory resources
  • raft/util/ - current location of memory_tracking_resources, memory_stats_resources, dry_run_resources

Member

The layout makes sense once you describe the reasoning, but I just fear people aren't going to be able to find them in these locations because the logic behind them isn't immediately obvious. I agree w/ using mr namespace for the rmm compatible resources (vs the raft::resources), but I think these resources might be easier to find if we also put them in mr. Maybe we could do a combination of both and put them in raft/core/resource/util?

Contributor Author

@achirkin achirkin May 13, 2026

Hmm, the unfortunate naming confuses me again :) By "these resources" do you mean raft::memory_tracking_resources, raft::memory_stats_resources, raft::dry_run_resources? I think it's reasonable to put them in raft/core/resource/util, but still not ideal.
These three are not "resources" like the ones you can find in the raft/core/resource folder; they are flavors of the raft::resources handle type itself. Hence I was considering putting them either in raft/util or in raft/core (alongside the raft::resources handle), but the latter is a bit overcrowded already.

I also opened a separate PR to expose raft::memory_tracking_resources: rapidsai/cuvs#2073. It supports the above positioning logic: in other languages, we don't need to distinguish between the handle types; we just provide another constructor function to create the new utility resource handle in lieu of the original resources handle.

Contributor

Since memory_stats_resources is a subclass of raft::resources, it makes sense to me to keep it in raft/core, where resources.hpp is located. But we have many other headers there, which makes it harder to discover the new utilities. To me, placing a raft::resources object into the mr folder would be somewhat confusing.

Contributor

@tfeher tfeher left a comment

Thanks Artem for the PR! Overall looks good, I have only one question regarding Thrust policy handling.

}

// --- Device (global) ---
// Invalidate the cached thrust policy (the resource_ref it captured
Contributor

Do we need to invalidate this again during destruction of the memory_stats_resources?

Contributor Author

No, unlike the device memory resource, the thrust policy is a local resource object, so it dies out with its owning raft::resources handle.
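
The lifetime distinction in this reply can be shown with a small sketch. All names are hypothetical (`g_device_mr`, `policy_sketch`, `handle_sketch`, `stats_handle_sketch`), and strings stand in for real memory resources; this models the ownership pattern only and is not the RAFT implementation. A process-wide setting has to be restored explicitly in the destructor, whereas a cache owned by the handle is destroyed along with it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical process-wide setting, standing in for the global device memory resource.
static std::string g_device_mr = "default";

// Hypothetical cached object that captures the resource it was built against.
struct policy_sketch {
  std::string captured_mr;
};

// Hypothetical handle with a lazily built, locally owned cache.
struct handle_sketch {
  std::shared_ptr<policy_sketch> policy_;  // copied together with the handle

  const policy_sketch& policy()
  {
    if (!policy_) { policy_ = std::make_shared<policy_sketch>(policy_sketch{g_device_mr}); }
    return *policy_;
  }
};

class stats_handle_sketch : public handle_sketch {
 public:
  stats_handle_sketch(const handle_sketch& base, std::string mr)
    : handle_sketch(base), previous_mr_(g_device_mr)
  {
    g_device_mr = std::move(mr);  // replace the global resource
    policy_.reset();              // the copied cache captured the old resource
  }

  // Only the global needs restoring; policy_ is a member and dies with *this.
  ~stats_handle_sketch() { g_device_mr = previous_mr_; }

  stats_handle_sketch(const stats_handle_sketch&)            = delete;
  stats_handle_sketch& operator=(const stats_handle_sketch&) = delete;

 private:
  std::string previous_mr_;
};
```

In this model the destructor has nothing to invalidate: the only cache that captured the temporary resource belongs to the object being destroyed.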

@achirkin achirkin changed the base branch from main to release/26.06 May 15, 2026 14:33
@achirkin achirkin requested review from a team as code owners May 15, 2026 14:33
@achirkin achirkin requested a review from msarahan May 15, 2026 14:33
@achirkin achirkin force-pushed the enh-memory-resources branch from aa09273 to 053b0c5 Compare May 15, 2026 14:44
4 participants