Skip to content

Add sub-RFC for interleaving memory allocation#2032

Open
Alexandr-Konovalov wants to merge 7 commits intomasterfrom
dev/Alexandr-Konovalov/rfc-interleaving
Open

Add sub-RFC for interleaving memory allocation#2032
Alexandr-Konovalov wants to merge 7 commits intomasterfrom
dev/Alexandr-Konovalov/rfc-interleaving

Conversation

@Alexandr-Konovalov
Copy link
Copy Markdown
Contributor

No description provided.

@Alexandr-Konovalov Alexandr-Konovalov marked this pull request as ready for review March 31, 2026 16:08
@Alexandr-Konovalov Alexandr-Konovalov changed the title Add sub-RFC for interleaving memory allocation. Add sub-RFC for interleaving memory allocation Apr 1, 2026

There are two kinds of NUMA-related performance bottlenecks: latency increasing due to
access to a remote node and bandwidth-limited simultaneous access from different CPUs to
a single NUMA memory node. A well-known method to mitigate both is random distribution of
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
a single NUMA memory node. A well-known method to mitigate both is random distribution of
a single NUMA memory node. A well-known method to mitigate both is controlled placement of

The examples you provide in the next sentence are not "random". So perhaps you mean that you need to do something active to ensure memory placement in done in some thoughtful way. Not sure.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to add more details hier.

vossmjp
vossmjp previously approved these changes Apr 7, 2026
Copy link
Copy Markdown
Contributor

@vossmjp vossmjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +32 to +34
not defined, we can't use it to construct e.g., a bit mask. Allocation that is unbalanced
between NUMA nodes doesn't seem to have useful applications, so repeated elements in `list
of nodes` is an error.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are unsure regarding repeated numa nodes having useful applications, may be we shouldn't strictly prohibit having them in a vector? If its support will not complicate the implementation, then I would suggest not prohibiting such a use case.
I would also add an open question regarding the use cases for which prohibiting could be beneficial.

Copy link
Copy Markdown
Contributor

@akukanov akukanov Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the one hand, starting with more restrictive policies allows to relax it later, while restricting a policy is usually a breaking change. But on the other hand, it seems that enforcing the restriction would require some special effort.

If the purpose of this clause is to avoid dealing with potential implementation issues due to repeating elements, we can say that the behavior is undefined in that case. It will allow the implementation to assume that there are no repeating elements, without specifically checking for that and reporting an error. We might still want to do the checks in the debug mode (especially if we are aware of the related issues), but that's an implementation quality question.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting. Both Microsoft and libnuma have bitmasks here, and so no problems with repeated nodes. Somehow tbb::numa_nodes() returns vector, and we should think about semantics of repeated elements.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are unsure regarding repeated numa nodes having useful applications, may be we shouldn't strictly prohibit having them in a vector? If its support will not complicate the implementation, then I would suggest not prohibiting such a use case. I would also add an open question regarding the use cases for which prohibiting could be beneficial.

Good point. I'll add the open question.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both Microsoft and libnuma have bitmasks here, and so no problems with repeated nodes.

I think that depends on the meaning of the masks and the use of the system APIs. For example, the list {0, 1, 0, 2} seems to indicate that every second chunk should go to the node 0, not just that the node 0 is in the set like the other two nodes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly such semantics is expected. And [0, 1, 0, 2] is list, not set.

I think bitmasks always mean compact representation of sets. In case of libnuma that's

A struct bitmask controls a bit map of arbitrary length containing a bit representation of nodes.

And from Microsoft

A processor mask is a bit vector in which each bit represents a processor and whether it is in the node.

Comment on lines +43 to +47
```c++
void *tbb::numa::alloc_interleaved(size_t size, size_t interleaving_step = 0,
const std::vector<tbb::numa_node_id> *nodes = nullptr);
void tbb::numa::free_interleaved(void *ptr, size_t size);
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API may need further elaboration, One question is below, but I have more of those. At the same time, I do not want to postpone acceptance of the general proposal.

Do we prefer to accept the RFC now and work on the API in subsequent RFC updates? In that case, maybe it is better to either remove the specific APIs or at least mention that these are conceptual, rather than final.

Comment on lines +29 to +31
`list of nodes for allocation` is conceptually a set of `tbb::numa_node_id`. However,
because `tbb::numa_nodes()` returns `std::vector` and creating a `std::set` from it
requires allocation, `vector` can be used. Because semantics of `tbb::numa_node_id` is
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, I would use std::span but it is not available till C++20.
But taking a pointer to a const vector also looks strange in the API, while its only purpose is to support the default argument. Should we instead use a C-style "pointer and size" semantics? Or have two overloads, with and without the vector?

Copy link
Copy Markdown
Contributor Author

@Alexandr-Konovalov Alexandr-Konovalov Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, just-happens-to-be-standard const std::vector<> & for empty vector requires comparison with zero after return from alloc_interleaved(). And I found no arguments against version with pointer rather then customer hates to write & in each non-default call.

Yet another overload looks like less evil wrt "pointer and size". But we have 2 parameters with default values. I;m not sure which one is "more default".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants