Enhanced support for memory-tracking raft resources#3004
Conversation
| // NVCC injects __host__ __device__ on std::shared_ptr special members, | ||
| // which makes the *implicit* or *defaulted* special members __host__ | ||
| // __device__ too. That conflicts with Upstream types whose special | ||
| // members are __host__ only (e.g. rmm::device_async_resource_ref). | ||
| // User-defined bodies (not = default) force plain __host__ execution space. |
There was a problem hiding this comment.
Good find. Are there changes you would suggest for RMM or CCCL?
There was a problem hiding this comment.
Thanks! I think, CCCL/rrm are fine, because the cuda::mr::shared_resource defines the copy/move constructors explicitly.
| @@ -0,0 +1,237 @@ | |||
| /* | |||
There was a problem hiding this comment.
Why wouldn't we put this in raft/core/resources instead of raft/util? It would be nice to put all of these in the same place, given RAFT Is more than just memory resources. That wy users can find them easily. I think we should do the same with the above instead of putting them in mr.
There was a problem hiding this comment.
I follow the example of the memory_tracking_resources here. But that was introduced recentely too (26.04), so we can change both not risking breaking things too much.
Just to clarify the logic of my choice for these two: I've put them into util folder, because they are not a part of a "normal" algorithm flow but rather utilities to analyize/profile the memory-related resource usage. It's not a strong prerefence though. Would like me to move them both to the core folder (and mark the PR as breaking)?
NB the current state of raft (main + PR):
raft/core/resources/- individual resources, such as cuda stream, cublas, memory resourcesraft/core/- raft::resources itself (+memory_tracking_resources, memory_stats_resources, dry_run_resources if we decide so)raft/mr/- CCCL/rmm compatible memory resourcesraft/util/- current of memory_tracking_resources, memory_stats_resources, dry_run_resources
There was a problem hiding this comment.
The layout makes sense once you describe the reasoning, but I just fear people aren't going to be able to find them in these locations because the logic behind them isn't immediately obvious. I agree w/ using mr namespace for the rmm compatible resources (vs the raft::resources), but I think these resources might be easier to find if we also put them in mr. Maybe we could do a combination of both and put them in raft/core/resource/util?
There was a problem hiding this comment.
Hmm, the unfortunate naming confuses me again :) By "these resources" you mean raft::memory_tracking_resources, raft::memory_stats_resources, raft::dry_run_resources? I think, it's reasonable to put them in raft/core/resource/util, but still not ideal.
These three are not "resources" like the one can find in raft/core/resource folder; they are flavors of raft::resources handle type itself. Hence I was considering to either put them in raft/util or in raft/core (alongside raft::resources handle), but the latter is a bit overcrowded already.
I also opened a separate PR to expose raft::memory_tracking_resources rapidsai/cuvs#2073 . It supports the above positioning logic: in other languages, we don't need to distinguish between the handle types - we just provide another constructor function to create the new utility resource handle in lieu of the original resources handle.
There was a problem hiding this comment.
Since memory_stats_resources is a subclass of raft::resources It makes sense to me to keep it in raft/core where resources.hpp is located. But we have many other headers there which makes it harder to discover the new utilities. To me placing a raft::resources object into the mr folder would be somewhat confusing.
tfeher
left a comment
There was a problem hiding this comment.
Thanks Artem for the PR! Overall looks good, I have only one question regarding Thrust policy handling.
| } | ||
|
|
||
| // --- Device (global) --- | ||
| // Invalidate the cached thrust policy (the resource_ref it captured |
There was a problem hiding this comment.
Do we need to invalidate this again during destruction of the memory_stats_resources?
There was a problem hiding this comment.
No, unlike the device memory resource, the thrust policy is a local resource object, so it dies out with its owning raft::resources handle.
| @@ -0,0 +1,237 @@ | |||
| /* | |||
There was a problem hiding this comment.
Since memory_stats_resources is a subclass of raft::resources It makes sense to me to keep it in raft/core where resources.hpp is located. But we have many other headers there which makes it harder to discover the new utilities. To me placing a raft::resources object into the mr folder would be somewhat confusing.
aa09273 to
053b0c5
Compare
statistics_adaptor.hppto compile via nvcc (see code comment)memory_stats_resources. In constrast tomemory_tracking_resourcesthis doesn't stream the memory usage online, but only reports the overall usage metrics.