[REVIEW] Add all-to-all, scatter, and scatterv functions to raft::comms_t#3002
[REVIEW] Add all-to-all, scatter, and scatterv functions to raft::comms_t#3002seunghwak wants to merge 15 commits into
Conversation
tarang-jain
left a comment
There was a problem hiding this comment.
Thanks! These seem to be much needed. A few comments from my end.
| stream)); | ||
|
|
||
| communicator.alltoall(temp_d.data(), recv_d.data(), 1, stream); | ||
| communicator.sync_stream(stream); |
There was a problem hiding this comment.
why do we need to sync the communicator here? The operations after this are ordered on the same stream, right?
There was a problem hiding this comment.
Yes, many tests in this file have unnecessary stream synchronizations. Just followed the convention but deleted all of them in this file (excluding the necessary ones after moving data from device to host).
| rmm::device_uvector<int> temp_d(communicator.get_size(), stream); | ||
| rmm::device_uvector<int> recv_d(communicator.get_size(), stream); | ||
|
|
||
| RAFT_CUDA_TRY(cudaMemcpyAsync(temp_d.data(), |
There was a problem hiding this comment.
Please use raft::copy instead of direct calls to the CUDA function
There was a problem hiding this comment.
Thanks, I used cudaMemcpyAsync to be consistent with the rest of the file but we should better replace cudaMemcpyAsync with raft calls in the entire file.
I replaced cudaMemcpyAsync with raft copy calls (raft::update_host & update_device).
| if (communicator.get_rank() == root) { | ||
| std::vector<int> sends(communicator.get_size(), communicator.get_rank()); | ||
| std::fill(sends.begin(), sends.end(), root); | ||
| RAFT_CUDA_TRY(cudaMemcpyAsync( |
There was a problem hiding this comment.
again use raft::copy here
|
|
||
| int temp_h = -1; // Verify more than one byte is being sent | ||
| RAFT_CUDA_TRY( | ||
| cudaMemcpyAsync(&temp_h, recv_d.data(), sizeof(int), cudaMemcpyDeviceToHost, stream)); |
There was a problem hiding this comment.
here as well. raft::copy
| recv_d.size(), | ||
| root, | ||
| stream); | ||
| communicator.sync_stream(stream); |
There was a problem hiding this comment.
Again here -- is there a specific reason why you sync the whole stream after the multi-gpu operation?
There was a problem hiding this comment.
No, just following the convention in this file. No need, deleted all the stream synchronization after communication functions.
ncclAlltoAll,ncclScatter, andncclGatherwas added in NCCL 2.28.3.We previously performed all-to-all using multi-cast and gather was performed using send/receive.
raft::comms_tlacks scatter and scatterv.This PR adds alltoall (
ncclAlltoAllwrapper), scatter (ncclScatterwrapper), and scatterv (implemented using send/receive).This PR also updates gather to directly call
ncclGather(instead of emulating gather using send/receive).