Functional/Non-Functional C++ Tile Evaluation #9038

@fbusato

Tile-based GPU programming is a promising direction for providing both performance and maintainable code. Because it is a new technology, it is essential to explore and evaluate its capabilities and limitations in the context of CCCL algorithms.

In particular, this issue lists several functional and non-functional properties that should be covered to get a robust understanding of the new programming approach. This step is critical for deciding future directions and estimating development effort.

We will focus on the following routine for the exploration:

Functional properties

  • Correctness

    • What (current) cuTile limitations prevent algorithm implementation?
    • How does numerical behavior compare to the CUDA C++ implementation?
    • Could GPU architecture dependencies or constraints impact correctness?
  • API surface

    • What new APIs are introduced, and can they be combined with existing ones? (Are they stable, experimental, or internal?)
    • Which input sizes, ranks, layouts, data types, execution spaces, and iterator categories are supported?
  • Interoperability

    • Interaction with existing CCCL components: CUB, Thrust, libcudacxx, CMake targets, tests, benchmarks.
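For the numerical-behavior question above, one possible starting point is a host-side comparison of the output of a Tile implementation against the output of the CUDA C++ implementation. The sketch below is only an illustration: `error_summary`, `ordered_bits`, and `compare_results` are hypothetical names, not part of CCCL or cuTile, and it assumes both results have already been copied back to the host as `float` arrays of equal length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper type (not a CCCL or cuTile API): summary of the
// element-wise difference between a reference and a candidate result.
struct error_summary
{
  double max_abs_error          = 0.0; // largest |candidate - reference|
  double max_rel_error          = 0.0; // largest |candidate - reference| / |reference|, reference != 0
  std::int64_t max_ulp_distance = 0;   // largest distance in units in the last place (ULP)
};

// Maps a float to an integer whose ordering matches the float ordering, so that
// the difference of two mapped values counts the representable floats between them.
// +0.0f and -0.0f both map to 0.
inline std::int64_t ordered_bits(float value)
{
  std::int32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits < 0 ? static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::min()) - bits
                  : static_cast<std::int64_t>(bits);
}

// Compares two host-side result arrays element by element.
// NaN and infinity are not treated specially in this sketch.
inline error_summary compare_results(const std::vector<float>& reference,
                                     const std::vector<float>& candidate)
{
  assert(reference.size() == candidate.size());
  error_summary summary;
  for (std::size_t i = 0; i < reference.size(); ++i)
  {
    const double ref  = reference[i];
    const double diff = std::abs(static_cast<double>(candidate[i]) - ref);
    summary.max_abs_error = std::max(summary.max_abs_error, diff);
    if (ref != 0.0)
    {
      summary.max_rel_error = std::max(summary.max_rel_error, diff / std::abs(ref));
    }
    const std::int64_t ulps = std::abs(ordered_bits(candidate[i]) - ordered_bits(reference[i]));
    summary.max_ulp_distance = std::max(summary.max_ulp_distance, ulps);
  }
  return summary;
}
```

Reporting the ULP distance alongside absolute and relative error helps distinguish expected reordering effects (a few ULPs) from genuine algorithmic differences; acceptable thresholds would have to be chosen per algorithm and data type.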

Non-functional properties

Implementation Complexity

  • Is Tile documented enough?
  • Does the Tile implementation reduce or increase the amount of work?
  • Are diagnostics good enough to understand the problem?
  • How well does it support direct or automatic tuning?

Maintainability

  • Are implementation details isolated from public APIs?
  • What could be enforced in Debug mode?

Performance

  • Is Tile mature enough to compete with the C++ implementation in terms of performance metrics? (Do the measurements have the same distribution properties? See NVBench.)
  • Launch overhead? Impact of tuning?
  • Can intermediate compilation steps, e.g., IR, be inspected?
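To make the question about distribution properties concrete, the sketch below computes a few summary statistics over raw timing samples so that a Tile run and a C++ run can be compared on more than a single mean. It is a minimal illustration under stated assumptions: `timing_summary`, `summarize`, and `median_speedup` are hypothetical names, and the samples are assumed to be per-iteration times exported from a benchmark run; this is not NVBench's own API or methodology.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not an NVBench API): summary of a set of timing samples.
struct timing_summary
{
  double mean      = 0.0;
  double median    = 0.0;
  double stdev     = 0.0; // sample standard deviation (n - 1 denominator)
  double rel_stdev = 0.0; // stdev / mean, a unitless measure of run-to-run noise
};

// Takes the samples by value because they are sorted locally to find the median.
inline timing_summary summarize(std::vector<double> samples)
{
  assert(samples.size() >= 2);
  timing_summary s;
  const double n = static_cast<double>(samples.size());

  double sum = 0.0;
  for (double x : samples)
  {
    sum += x;
  }
  s.mean = sum / n;

  double squared_deviation = 0.0;
  for (double x : samples)
  {
    squared_deviation += (x - s.mean) * (x - s.mean);
  }
  s.stdev     = std::sqrt(squared_deviation / (n - 1.0));
  s.rel_stdev = s.stdev / s.mean;

  std::sort(samples.begin(), samples.end());
  const std::size_t mid = samples.size() / 2;
  s.median = (samples.size() % 2 == 0) ? 0.5 * (samples[mid - 1] + samples[mid]) : samples[mid];
  return s;
}

// Ratio of median times; values above 1 mean the candidate is faster than the baseline.
inline double median_speedup(const timing_summary& baseline, const timing_summary& candidate)
{
  return baseline.median / candidate.median;
}
```

Comparing `rel_stdev` and the gap between mean and median for both implementations gives a first indication of whether one of them has noisier or more skewed timings, which a single average would hide.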

Ecosystem

  • How do current tools (Nsight Compute/Systems, nvdisasm, cuobjdump) interact with Tile?

Compile time

  • How does Tile compare to the C++ implementation?
  • Linking time?

Binary size

  • Are there any changes in final or intermediate artifacts, e.g., object files, fatbin?

Status

In Progress
