Add sub-RFC for interleaving memory allocation#2032
Add sub-RFC for interleaving memory allocation#2032Alexandr-Konovalov wants to merge 7 commits intomasterfrom
Conversation
|
|
||
| There are two kinds of NUMA-related performance bottlenecks: latency increasing due to | ||
| access to a remote node and bandwidth-limited simultaneous access from different CPUs to | ||
| a single NUMA memory node. A well-known method to mitigate both is random distribution of |
There was a problem hiding this comment.
| a single NUMA memory node. A well-known method to mitigate both is random distribution of | |
| a single NUMA memory node. A well-known method to mitigate both is controlled placement of |
The examples you provide in the next sentence are not "random". So perhaps you mean that you need to do something active to ensure memory placement in done in some thoughtful way. Not sure.
There was a problem hiding this comment.
I tried to add more details hier.
Co-authored-by: Mike Voss <[email protected]>
Co-authored-by: Mike Voss <[email protected]>
Co-authored-by: Mike Voss <[email protected]>
Co-authored-by: Mike Voss <[email protected]>
| not defined, we can't use it to construct e.g., a bit mask. Allocation that is unbalanced | ||
| between NUMA nodes doesn't seem to have useful applications, so repeated elements in `list | ||
| of nodes` is an error. |
There was a problem hiding this comment.
Since we are unsure regarding repeated numa nodes having useful applications, may be we shouldn't strictly prohibit having them in a vector? If its support will not complicate the implementation, then I would suggest not prohibiting such a use case.
I would also add an open question regarding the use cases for which prohibiting could be beneficial.
There was a problem hiding this comment.
On the one hand, starting with more restrictive policies allows to relax it later, while restricting a policy is usually a breaking change. But on the other hand, it seems that enforcing the restriction would require some special effort.
If the purpose of this clause is to avoid dealing with potential implementation issues due to repeating elements, we can say that the behavior is undefined in that case. It will allow the implementation to assume that there are no repeating elements, without specifically checking for that and reporting an error. We might still want to do the checks in the debug mode (especially if we are aware of the related issues), but that's an implementation quality question.
There was a problem hiding this comment.
That's interesting. Both Microsoft and libnuma have bitmasks here, and so no problems with repeated nodes. Somehow tbb::numa_nodes() returns vector, and we should think about semantics of repeated elements.
There was a problem hiding this comment.
Since we are unsure regarding repeated numa nodes having useful applications, may be we shouldn't strictly prohibit having them in a vector? If its support will not complicate the implementation, then I would suggest not prohibiting such a use case. I would also add an open question regarding the use cases for which prohibiting could be beneficial.
Good point. I'll add the open question.
There was a problem hiding this comment.
Both Microsoft and libnuma have bitmasks here, and so no problems with repeated nodes.
I think that depends on the meaning of the masks and the use of the system APIs. For example, the list {0, 1, 0, 2} seems to indicate that every second chunk should go to the node 0, not just that the node 0 is in the set like the other two nodes.
There was a problem hiding this comment.
Yes, exactly such semantics is expected. And [0, 1, 0, 2] is list, not set.
I think bitmasks always mean compact representation of sets. In case of libnuma that's
A struct bitmask controls a bit map of arbitrary length containing a bit representation of nodes.
And from Microsoft
A processor mask is a bit vector in which each bit represents a processor and whether it is in the node.
| ```c++ | ||
| void *tbb::numa::alloc_interleaved(size_t size, size_t interleaving_step = 0, | ||
| const std::vector<tbb::numa_node_id> *nodes = nullptr); | ||
| void tbb::numa::free_interleaved(void *ptr, size_t size); | ||
| ``` |
There was a problem hiding this comment.
The API may need further elaboration, One question is below, but I have more of those. At the same time, I do not want to postpone acceptance of the general proposal.
Do we prefer to accept the RFC now and work on the API in subsequent RFC updates? In that case, maybe it is better to either remove the specific APIs or at least mention that these are conceptual, rather than final.
| `list of nodes for allocation` is conceptually a set of `tbb::numa_node_id`. However, | ||
| because `tbb::numa_nodes()` returns `std::vector` and creating a `std::set` from it | ||
| requires allocation, `vector` can be used. Because semantics of `tbb::numa_node_id` is |
There was a problem hiding this comment.
Ideally, I would use std::span but it is not available till C++20.
But taking a pointer to a const vector also looks strange in the API, while its only purpose is to support the default argument. Should we instead use a C-style "pointer and size" semantics? Or have two overloads, with and without the vector?
There was a problem hiding this comment.
Sadly, just-happens-to-be-standard const std::vector<> & for empty vector requires comparison with zero after return from alloc_interleaved(). And I found no arguments against version with pointer rather then customer hates to write & in each non-default call.
Yet another overload looks like less evil wrt "pointer and size". But we have 2 parameters with default values. I;m not sure which one is "more default".
No description provided.