Functional/Non-Functional C++ Tile Evaluation
Tile-based GPU programming is a promising direction for delivering both performance and maintainable code. Because the technology is new, it is essential to explore and evaluate its capabilities and limitations in the context of CCCL algorithms.
In particular, this issue lists several functional and non-functional properties that should be covered to get a robust understanding of the new programming approach. This step is critical for deciding future directions and estimating development effort.
The exploration will focus on the following properties:
Functional
Correctness
- What (current) cuTile limitations prevent algorithm implementation?
- Numerical behavior compared to the CUDA C++ implementation.
- Could GPU architecture dependencies or constraints impact correctness?
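For the numerical-behavior question above, one possible way to quantify differences is to compare the output of the Tile implementation against the output of the existing CUDA C++ implementation on the host. The following is a minimal sketch, not an existing CCCL utility: `max_relative_error` is a hypothetical helper name, both result buffers are assumed to have already been copied back to host memory, and the `1e-30` floor on the denominator is an arbitrary choice to avoid dividing by zero.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of CCCL): returns the largest relative error
// between a reference result (e.g. from the CUDA C++ implementation) and a
// candidate result (e.g. from the Tile implementation).
// Returns +infinity if the two results differ in size.
double max_relative_error(const std::vector<float>& reference,
                          const std::vector<float>& candidate)
{
  if (reference.size() != candidate.size())
  {
    return std::numeric_limits<double>::infinity();
  }
  double worst = 0.0;
  for (std::size_t i = 0; i < reference.size(); ++i)
  {
    const double ref  = reference[i];
    const double diff = std::abs(ref - static_cast<double>(candidate[i]));
    // Assumed floor on the denominator to avoid division by zero.
    const double scale = std::max(std::abs(ref), 1e-30);
    worst = std::max(worst, diff / scale);
  }
  return worst;
}
```

The returned value could then be checked against a per-algorithm tolerance. Which tolerance is acceptable depends on the algorithm and data type and is itself part of the evaluation.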
API surface
- What new APIs are introduced, and can they be combined with existing ones? (Are they stable, experimental, or internal?)
- Which input sizes, ranks, layouts, data types, execution spaces, and iterator categories are supported?
Interoperability
- Interaction with existing CCCL components: CUB, Thrust, libcudacxx, CMake targets, tests, benchmarks.
Non-functional properties
Implementation Complexity
- Is Tile documented well enough?
- Does the Tile implementation reduce or increase the amount of work?
- Are diagnostics good enough to understand the problem?
- How well does it support direct or automatic tuning?
Maintainability
- Are implementation details isolated from public APIs?
- What could be enforced in Debug mode?
Performance
- Is Tile mature enough to compete with the C++ implementation in terms of performance metrics? (Do they have the same distribution properties? See NVBench.)
- Launch overhead? Impact of tuning?
- Inspection of intermediate compilation steps, e.g. IR
Ecosystem
- How do current tools (Nsight Compute/Systems, nvdisasm, cuobjdump) interact with Tile?
Compile time
- How does Tile compare to the C++ implementation?
- Linking time?
Binary size
- Any changes in final or intermediate artifacts, e.g. object files, fatbin?