cudaPackages: introduce and use cudaLib by ConnorBaker · Pull Request #406531 · NixOS/nixpkgs

ConnorBaker · 2025-05-12T18:30:13Z

Broadly, this PR does these things:

Introduces a new top-level attribute, _cuda, which consolidates the backbone of the CUDA package set (the static data, configurations, utility functions, etc. required to create a CUDA package set)
Removes things from the CUDA package set fixed point which should be supplied by callPackage and the newScope used to construct the package set (i.e., _cuda.bootstrapData, _cuda.fixups)
Aligns terminology for redistributables: "redist arch" has been changed to "redist system" to better mirror the Nix system, as both are two-tuple system configs
Rewrites cudaPackages.backendStdenv

This PR is a step toward landing more improvements from https://github.com/ConnorBaker/cuda-packages and merging the changes in #406740.

This PR presents largely organizational changes, but does cause rebuilds due to changes in the default selection of CUDA capabilities (see the section under _cuda.lib).

The _cuda attribute and everything it contains are introduced to enable re-use out of tree. For the time being, they are provided without any guarantee of stability.

`_cuda.fixups`

fixups is moved to _cuda.fixups
_cuda.fixups is removed from CUDA package sets and exists only at top-level to make clear it exists outside of any CUDA package set's fixed point

`_cuda.bootstrapData`

GPU capabilities are rewritten to include more information, updated with new architectures and feature sets, and brought under _cuda.bootstrapData.cudaCapabilityToInfo (pkgs/development/cuda-modules/_cuda/db/bootstrap/cuda.nix)
NVCC compatibilities are rewritten to group versions for host compilers under an attribute set rather than using a string prefix and brought under _cuda.bootstrapData.nvccCompatibilities (pkgs/development/cuda-modules/_cuda/db/bootstrap/nvcc.nix)
Additional commonly used data are available (e.g., redistNames, redistUrlPrefix)
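
As a rough illustration of what out-of-tree inspection of this data might look like, the sketch below reads the attributes named above. It assumes a Nixpkgs revision that includes this PR, and it assumes that cudaCapabilityToInfo and nvccCompatibilities are attribute sets (their exact shape is not shown here and may change, since no stability is promised).

```nix
# Evaluate with: nix-instantiate --eval --strict ./inspect-bootstrap.nix
# (the file name is hypothetical)
let
  pkgs = import <nixpkgs> { };
  inherit (pkgs._cuda) bootstrapData;
in
{
  # Assumption: the attribute names are the known capabilities.
  knownCapabilities = builtins.attrNames bootstrapData.cudaCapabilityToInfo;

  # Assumption: an attribute set; only its top-level names are listed here.
  nvccCompatibilityKeys = builtins.attrNames bootstrapData.nvccCompatibilities;

  # Additional data mentioned in the PR description.
  inherit (bootstrapData) redistNames redistUrlPrefix;
}
```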

`_cuda.lib`

Constructing out-of-tree package sets for CUDA packages is difficult and typically requires vendoring large numbers of files to be able to benefit from the machinery we (the CUDA team) have built. _cuda.lib provides an interface through which out-of-tree consumers of Nixpkgs can access the same machinery we use. (Data and configuration re-use are enabled by _cuda.bootstrapData and _cuda.fixups.)

_cuda.lib also serves to centralize and document a number of ad-hoc functionalities created in the CUDA package set. For example, many of the functions which had been lumped into flags.nix are now available in _cuda.lib.

Because it is so new, very little of _cuda.lib is provided as a stable API. Many functions are currently used only internally (they are prefixed with an _) and are provided without guarantees of stability or correctness. They are, however, available for use given their utility.
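
Since the description does not enumerate the functions, one way to see what is available is to list the attribute names and separate the underscore-prefixed (internal) ones from the rest. This is only a sketch, assuming a Nixpkgs revision that includes this PR; it relies on nothing beyond the existence of _cuda.lib and the standard lib.partition and lib.hasPrefix helpers.

```nix
let
  pkgs = import <nixpkgs> { };
  inherit (pkgs) lib;

  # Every name currently exported by _cuda.lib.
  names = builtins.attrNames pkgs._cuda.lib;

  # lib.partition returns { right = matching; wrong = non-matching; }.
  split = lib.partition (lib.hasPrefix "_") names;
in
{
  # Names prefixed with "_": internal, no stability guarantees.
  internal = split.right;
  # Everything else.
  public = split.wrong;
}
```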

Changes to the default set of CUDA capabilities

Previously, the set of default capabilities for CUDA 12.8 was

6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 9.0a 10.0 12.0

They are now

            7.5 8.0 8.6 8.9 9.0      10.0 12.0

Given the number of capabilities introduced by the last few releases, dropping older capabilities becomes a necessity to keep the amount of device code generated small enough to link against. (There is a 2GB limit.) Since recent TensorRT releases have deprecated and now dropped support for 6.0 through 7.0, it makes sense to drop them from the default set of capabilities.

Additionally, architecture-specific capabilities, like 9.0a, have been removed from the default collection of targets, as they are not meant to be used with other capabilities without careful guarding in source (a feature only discussed in 12.9¹).
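
Users who still need a capability that is no longer in the default set can request it explicitly through the Nixpkgs config, using the same cudaCapabilities option that appears in the nixpkgs-review command in this thread. A minimal sketch, assuming the chosen capabilities are still supported by the CUDA release in use (the particular values are only an example):

```nix
# Import Nixpkgs with an explicit capability list instead of the defaults.
import <nixpkgs> {
  config = {
    allowUnfree = true;
    cudaSupport = true;
    # Example values only: one capability dropped from the defaults (6.1)
    # alongside one that is still included (8.6).
    cudaCapabilities = [
      "6.1"
      "8.6"
    ];
  };
}
```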

`cudaPackages.backendStdenv`

The changes made to backendStdenv enable evaluation-time checking of certain common mistakes. Additionally, it provides a single point of access to common attributes describing the CUDA package set, such as:

nvccHostCCMatchesStdenvCC: whether the C compiler of the current stdenv matches the host compiler which will be provided to NVCC
hostNixSystem: an alias to stdenv.hostPlatform.system
hostRedistSystem: the NVIDIA equivalent of hostNixSystem
cudaCapabilities: the resolved collection of CUDA capabilities targeted by the attributes of cudaPackages.flags
hasJetsonCudaCapability: whether cudaCapabilities contains a capability belonging to a Jetson device
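
The attributes listed above could be inspected along these lines. This is a sketch that assumes they are readable directly on backendStdenv (as "single point of access" suggests) and that the Nixpkgs revision includes this PR; the config values mirror the nixpkgs-review invocation in this thread.

```nix
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
      cudaCapabilities = [ "8.9" ];
    };
  };
  inherit (pkgs.cudaPackages) backendStdenv;
in
{
  # Assumption: each of these is exposed as an attribute of backendStdenv.
  inherit (backendStdenv)
    nvccHostCCMatchesStdenvCC
    hostNixSystem
    hostRedistSystem
    cudaCapabilities
    hasJetsonCudaCapability
    ;
}
```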

¹ https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/#architecture-specific_features

ConnorBaker · 2025-05-12T19:44:44Z

`nixpkgs-review` result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 406531 --extra-nixpkgs-config '{ allowAliases = false; allowBroken = false; allowUnfree = true; checkMeta = true; contentAddressedByDefault = false; cudaCapabilities = [ "8.9" ]; cudaSupport = true; }'

`x86_64-linux`

✅ 1 package built:

nixpkgs-manual

ConnorBaker · 2025-05-12T19:44:52Z

`nixpkgs-review` result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 406531

`x86_64-darwin`

✅ 1 package built:

nixpkgs-manual

`aarch64-darwin`

✅ 1 package built:

nixpkgs-manual

ConnorBaker · 2025-05-12T20:11:58Z

Changes:

rebased on master after tree-wide: cudaPackages.cudaFlags -> cudaPackages.flags #406545 was merged
added cudaLib.utils.allowUnfreeCudaPredicate
added commit for tree-wide changes to update usage of cudaPackages.flags

SomeoneSerge · 2025-05-12T20:21:09Z

Here, instead of exposing cudaLib, I see something like `makeCudaScope = let cudaLib = ...; in cudaLib.${...}; cudaPackages_XX_YY = makeCudaScope { ... }`

ConnorBaker · 2025-05-12

I would like to have cudaLib exposed outside of the CUDA package set since it does not depend on the CUDA version or architecture. Having it at the top level is also important because if out-of-tree consumers modify it (e.g., by adding new GPUs), those changes are propagated to all consumers of cudaLib. One example where that matters is the pkgsCuda attribute set I'd like to introduce in #406568: eb044ff.

SomeoneSerge · 2025-05-13

Following up on the discussions we've had asynchronously:

Out-of-tree support as such requires two public pieces of information:

(extensible) static data, and
the package set constructor (Data -> Configuration -> PackageSet, where Configuration is version/tag constraints but may include extra extensions, extra fixups, etc.).

Implicit here are the "fixups" and "shims", hidden inside the "constructor" (keep it this way?). This PR gathers the data we've used so far into data.nix. tests.cuda.db: init (#406740) is meant to unify data.nix with the machine-readable part of the "knowledge". We have yet to sketch the "constructor".

DX-wise we do want to establish a few fixpoints to facilitate propagation of overrides to the multiple package set instances used by Nixpkgs: a fixpoint for extensions, for fixups, for "static data". We do want to keep them to a minimum though, so as not to commit to supporting a wider interface. Connor, you mentioned you had a PR for cudaPackagesExtensions open?

In the current form it's a bit backward that cudaLib is defined outside cudaPackages but is only used inside. In the formalism established above, the motivation is obscured by the fact we haven't isolated the "constructor" yet.

I'd also potentially abstain from exposing anything that doesn't need to be a fixpoint.

SomeoneSerge

@ConnorBaker I didn't want to overwrite your commits, leaving the autosquash to you. IMO this is ready to be merged to master, but let's have another look-over and maybe a nixpkgs-review before backporting

ConnorBaker · 2025-05-27T15:23:17Z

Rebased, squashed, force-pushed, and updated the PR description. Also built PyTorch to confirm it works on x86_64-linux.

@SomeoneSerge ready for merge!

nixpkgs-ci · 2025-05-27T15:38:48Z

Backport failed for release-24.11, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release-24.11
git worktree add -d .worktree/backport-406531-to-release-24.11 origin/release-24.11
cd .worktree/backport-406531-to-release-24.11
git switch --create backport-406531-to-release-24.11
git cherry-pick -x 646bebe3be8004e578842745136b8196f4f1fced a018d736978adf55e1b8f7bf79e736cd6d042573 0ac3a73b6a997ea5e05067577a17c613ecddcd7d 629ae4e42c4764f1e56cd64746a3f19e28ff62a0 c5dad2886a5623fc5e41054ab9ed9ff8e5f7ac91 765529dfff5cb04cd8ebf9275c9bccc9473fcbb5 ead65813623f92f8630811b1b3616877a727b1d9 8fcff2390e3224e970291975cedcbd23f743c6da 688e14d21a38135270544bfdfcc793d25dea2802

nixpkgs-ci · 2025-05-27T15:39:04Z

Successfully created backport PR for release-25.05:

[Backport release-25.05] cudaPackages: introduce and use cudaLib #411445

SomeoneSerge · 2025-05-27T22:53:11Z

+    ) passthruExtra.supportedCudaCapabilities;
+
+    # The resolved requested or default CUDA capabilities.
+    cudaCapabilities =


Ah, didn't realize that this made flags depend on the package set

Adds `tests.cuda.db`, `tests.cuda.db.html`, and `_cuda.db`. The commit picked up certain mass moves after rebasing on NixOS#406531.

The original motivation behind using evalModules in cudaPackages was to ensure that the Data used to generate the package set can be evaluated (and inspected) in its entirety. This focus was lost in implementation.

Various stylistic and design choices:

- Use of all-singular names.
- Predominantly column-oriented layout (structure of arrays rather than array of structures); exceptions have to do with rebasings and compatibility.
- Column validation: tried domain assertions vs. submodules with dynamic option sets; chose the former to get the more concise <name> placeholders in the nixosOptionsDoc output.
- Between Maybe T and Option<T>, ended up choosing the former (consistent with Connor's code).
- Instead of AttrSet and attrsOf, chose to use infix notation (`String ⇒ T`) with the bold arrow hinting at a "memoized" map (cf. dex-lang).

Performance considerations:

- Adding FOD info to the schema originally contributed an extra 1.5s to evaluation time because of `imports` (`mkMerge`) abuse: `json.nix` generates many small definitions, most of which are redundant and are retained in `options.<path>.definitions`.
- In the worst case, at the time of writing, unoptimized `json.nix` results in 4s evaluation time when used with the full set of manifests, including the new backported manifests, which double the number of FODs and versions compared to Nixpkgs.
- For this reason we do not directly use `json.nix` in Nixpkgs, but we export and check in the output of `evalModules` (cf. _rawManifests in the next PRs), bringing down the evaluation time to hundreds of milliseconds. It's obviously desirable to optimize this further, but at least this doesn't make things worse.

On checking in vendored content: the output of evalModules is only smaller (measured in lines of code) than the total size of upstream manifests. Most of the reduction comes from deduplicating the "feature manifests".

github-actions Bot added the 6.topic: cuda and 8.has: documentation labels May 12, 2025

github-project-automation Bot added this to CUDA Team May 12, 2025

github-project-automation Bot moved this to New in CUDA Team May 12, 2025

ConnorBaker mentioned this pull request May 12, 2025

cudaPackages: introduce cudaLib and switch from backendStdenv to cudaStdenv #405751

Closed

13 tasks

ConnorBaker moved this from New to 🏗 In progress in CUDA Team May 12, 2025

github-actions Bot added the 10.rebuild-darwin: 11-100 and 10.rebuild-linux: 11-100 labels May 12, 2025

ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from 34165ab to 6a17ffd Compare May 12, 2025 19:23

ConnorBaker self-assigned this May 12, 2025

ConnorBaker marked this pull request as ready for review May 12, 2025 19:45

ConnorBaker moved this from 🏗 In progress to 👀 Awaits reviews in CUDA Team May 12, 2025

ConnorBaker requested a review from SomeoneSerge May 12, 2025 19:46

ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from 6a17ffd to bc25080 Compare May 12, 2025 19:59

ofborg Bot added the 2.status: merge conflict label May 12, 2025

ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from bc25080 to 856b135 Compare May 12, 2025 20:10

github-actions Bot added the 6.topic: python label May 12, 2025

ofborg Bot removed the 2.status: merge conflict label May 12, 2025

ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from 856b135 to fac2fef Compare May 12, 2025 20:15

SomeoneSerge reviewed May 12, 2025

nix-owners Bot requested review from abysssol, dit7ya, elohmeier, lebastr and prusnak May 12, 2025 20:23

ConnorBaker mentioned this pull request May 12, 2025

{_cuda.extensions,pkgsCuda,pkgsForCudaArch}: init #406568

Merged

13 tasks

ConnorBaker requested a review from SomeoneSerge May 12, 2025 21:12

SomeoneSerge force-pushed the feat/cuda-packages-uses-cuda-lib branch from 84136e0 to 24a204e Compare May 22, 2025 23:31

SomeoneSerge added a commit to SomeoneSerge/nixpkgs that referenced this pull request May 23, 2025

_cuda.db: init (mv from tests.cuda.db)

c74c966

Move _cuda/db to its final location, aligned with the _cuda.lib PR (NixOS#406531), introduce the top-level _cuda fix-point

SomeoneSerge approved these changes May 26, 2025

wegank added the 12.approvals: 1 label May 27, 2025

ConnorBaker added 6 commits May 27, 2025 15:02

cudaPackages: add cudaNamePrefix

646bebe

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>

cudaPackages.driver_assistant: mark as unsupported

a018d73

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>

cudaLib: init

0ac3a73

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>

cudaPackages: rewrite backendStdenv

629ae4e

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>

cudaPackages: switch to cudaLib

c5dad28

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>

cudaPackages.fixups -> pkgs.cudaFixups

765529d

ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from f94c476 to 6b16bcc Compare May 27, 2025 15:05

ConnorBaker added 3 commits May 27, 2025 15:05

tree-wide: cudaPackages.flags updates

ead6581

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>

cudaPackages: doc fixup

8fcff23

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>

_cuda: introduce to organize CUDA package set backbone

688e14d

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>

ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from 6b16bcc to 688e14d Compare May 27, 2025 15:05

ConnorBaker moved this from 👀 Awaits reviews to 🔖 Awaits the merge in CUDA Team May 27, 2025

SomeoneSerge added the backport release-24.11 label May 27, 2025

SomeoneSerge merged commit 050bbae into NixOS:master May 27, 2025
17 of 20 checks passed

github-project-automation Bot moved this from 🔖 Awaits the merge to ✅ Done in CUDA Team May 27, 2025

nixpkgs-ci Bot mentioned this pull request May 27, 2025

[Backport release-25.05] cudaPackages: introduce and use cudaLib #411445

Merged

1 task

github-actions Bot added the 8.has: port to stable label May 27, 2025

SomeoneSerge reviewed May 27, 2025

ConnorBaker mentioned this pull request May 27, 2025

_cuda: missed fixups #411574

Merged

13 tasks

SomeoneSerge merged 9 commits into NixOS:master from ConnorBaker:feat/cuda-packages-uses-cuda-lib