cudaPackages: introduce and use cudaLib#406531

Merged
SomeoneSerge merged 9 commits into NixOS:master from ConnorBaker:feat/cuda-packages-uses-cuda-lib
May 27, 2025

@ConnorBaker ConnorBaker commented May 12, 2025

Broadly, this PR does these things:

  • Introduces a new top-level attribute, _cuda, which consolidates the backbone of the CUDA package sets (the static data, configurations, utility functions, etc. required to create a CUDA package set)
  • Removes things from the CUDA package set fixed point which should be supplied by callPackage and the newScope used to construct the package set (i.e., _cuda.bootstrapData, _cuda.fixups)
  • Aligns language regarding redistributables: redist arch has been changed to redist system to better mirror nix system, as both are two-tuple system configs
  • Rewrites cudaPackages.backendStdenv
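
To illustrate the "redist system" terminology, here is a minimal sketch of the mapping from a Nix system to the platform names NVIDIA uses in its redistributable manifests. The helper name toRedistSystem and its isJetson argument are hypothetical and only serve this example; this is not the implementation shipped in _cuda.lib.

```nix
# Sketch only: `toRedistSystem` and `isJetson` are hypothetical names,
# not part of _cuda.lib.
let
  toRedistSystem =
    {
      nixSystem,
      # On aarch64-linux, NVIDIA ships separate builds for Jetson devices.
      isJetson ? false,
    }:
    if nixSystem == "x86_64-linux" then
      "linux-x86_64"
    else if nixSystem == "aarch64-linux" then
      (if isJetson then "linux-aarch64" else "linux-sbsa")
    else
      throw "no redist system known for ${nixSystem}";
in
# Map two Nix systems to their (non-Jetson) redist systems.
map (nixSystem: toRedistSystem { inherit nixSystem; }) [
  "x86_64-linux"
  "aarch64-linux"
]
```

Evaluating the file with nix-instantiate --eval --strict should yield [ "linux-x86_64" "linux-sbsa" ].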

This PR is a step toward landing more improvements from https://github.com/ConnorBaker/cuda-packages and merging the changes in #406740.

This PR presents largely organizational changes, but does cause rebuilds due to changes in the default selection of CUDA capabilities (see the section under _cuda.lib).

The _cuda attribute and all that it contains is introduced to enable re-use out of tree. For the time being, it is provided without any guarantee of stability.

_cuda.fixups

  • fixups is moved to _cuda.fixups
  • _cuda.fixups is removed from CUDA package sets and exists only at top-level to make clear it exists outside of any CUDA package set's fixed point

_cuda.bootstrapData

  • GPU capabilities are rewritten to include more information, updated with new architectures and feature sets, and brought under _cuda.bootstrapData.cudaCapabilityToInfo (pkgs/development/cuda-modules/_cuda/db/bootstrap/cuda.nix)
  • NVCC compatibilities are rewritten to group versions for host compilers under an attribute set rather than using a string prefix and brought under _cuda.bootstrapData.nvccCompatibilities (pkgs/development/cuda-modules/_cuda/db/bootstrap/nvcc.nix)
  • Additional commonly used data are available (e.g., redistNames, redistUrlPrefix)
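
As a rough sketch of out-of-tree use, the bootstrap data can be inspected without instantiating any CUDA package set. This assumes that <nixpkgs> resolves to a checkout containing this PR and that cudaCapabilityToInfo is an attribute set keyed by capability, as its name suggests; the shape of the values is not shown here.

```nix
# Sketch: list the CUDA capabilities known to the bootstrap data.
# Assumes <nixpkgs> points at a Nixpkgs revision that includes `_cuda`.
let
  pkgs = import <nixpkgs> { };
in
# Assumption: cudaCapabilityToInfo is keyed by capability (e.g. "8.9").
builtins.attrNames pkgs._cuda.bootstrapData.cudaCapabilityToInfo
```

Evaluate with nix-instantiate --eval --strict. Since _cuda is provided without stability guarantees, the attribute path may change.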

_cuda.lib

Constructing out-of-tree package sets for CUDA packages is difficult and typically requires vendoring large numbers of files to be able to benefit from the machinery we (the CUDA team) have built. _cuda.lib provides an interface through which out-of-tree consumers of Nixpkgs can access the same machinery we use. (Data and configuration re-use are enabled by _cuda.bootstrapData and _cuda.fixups.)

_cuda.lib also serves to centralize and document a number of ad-hoc functionalities created in the CUDA package set. For example, many of the functions which had been lumped into flags.nix are now available in _cuda.lib.

Due to its recency, very little of _cuda.lib is provided as a stable API. Many functions are only used internally at the moment (they are prefixed with an _) and are provided without guarantees of stability or correctness. They are, however, available for use given their utility.

Changes to the default set of CUDA capabilities

Previously, the set of default capabilities for CUDA 12.8 was

6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 9.0a 10.0 12.0

They are now

            7.5 8.0 8.6 8.9 9.0      10.0 12.0

Given the number of capabilities introduced by the last few releases, dropping older capabilities becomes a necessity to keep the amount of device code generated small enough to link against. (There is a 2GB limit.) Since recent TensorRT releases have deprecated and now dropped support for 6.0 through 7.0, it makes sense to drop them from the default set of capabilities.
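
Consumers who still need one of the dropped capabilities can request it explicitly rather than relying on the defaults. A minimal sketch, using the same cudaSupport and cudaCapabilities configuration options that appear in the nixpkgs-review command in this thread; the chosen capabilities are only an example, and whether a given package builds for them is not guaranteed.

```nix
# Sketch: import Nixpkgs with an explicit capability list.
# The capabilities chosen here ("7.0" and "7.5") are illustrative.
import <nixpkgs> {
  config = {
    allowUnfree = true;
    cudaSupport = true;
    cudaCapabilities = [
      "7.0"
      "7.5"
    ];
  };
}
```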

Additionally, architecture-specific capabilities, like 9.0a, have been removed from the default collection of targets, as they are not meant to be used with other capabilities without careful guarding in source (a feature only discussed in 12.9; see footnote 1).
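
The difference between the two lists can be reproduced with a small, self-contained expression. This is only an illustration of the two rules described above (drop everything older than 7.5, drop architecture-specific capabilities); the helper names are made up for the example and this is not how Nixpkgs computes the defaults.

```nix
# Sketch: derive the new defaults from the previous ones.
# `isArchSpecific` and `isAtLeast` are hypothetical helpers.
let
  previousDefaults = [
    "6.0" "6.1" "7.0" "7.5" "8.0" "8.6" "8.9" "9.0" "9.0a" "10.0" "12.0"
  ];
  # Architecture-specific capabilities carry a letter suffix, e.g. "9.0a".
  isArchSpecific = capability: builtins.match ".*[a-z]" capability != null;
  # Compare capabilities as versions so that "10.0" sorts after "7.5".
  isAtLeast = minimum: capability: builtins.compareVersions capability minimum >= 0;
in
builtins.filter (c: isAtLeast "7.5" c && !(isArchSpecific c)) previousDefaults
```

With nix-instantiate --eval --strict this evaluates to [ "7.5" "8.0" "8.6" "8.9" "9.0" "10.0" "12.0" ], matching the new list.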

cudaPackages.backendStdenv

The changes made to backendStdenv enable evaluation-time checking of certain common mistakes. Additionally, it provides a single point of access to common attributes describing the CUDA package set, such as:

  • nvccHostCCMatchesStdenvCC: whether the compiler of the current stdenv matches the host compiler which will be provided to NVCC
  • hostNixSystem: an alias to stdenv.hostPlatform.system
  • hostRedistSystem: the NVIDIA equivalent of hostNixSystem
  • cudaCapabilities: the resolved collection of CUDA capabilities targeted by the attributes of cudaPackages.flags
  • hasJetsonCudaCapability: whether cudaCapabilities contains a capability belonging to a Jetson device
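
A short sketch of reading these attributes, assuming they are exposed directly on cudaPackages.backendStdenv as the list above suggests and that <nixpkgs> resolves to a revision containing this PR:

```nix
# Sketch: collect the descriptive attributes of the CUDA package set.
# Assumes the attributes live directly on backendStdenv.
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    };
  };
  inherit (pkgs.cudaPackages) backendStdenv;
in
{
  inherit (backendStdenv)
    nvccHostCCMatchesStdenvCC
    hostNixSystem
    hostRedistSystem
    cudaCapabilities
    hasJetsonCudaCapability
    ;
}
```

Evaluating this with nix-instantiate --eval --strict gives a quick overview of how a particular Nixpkgs configuration resolves.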

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Footnotes

  1. https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/#architecture-specific_features

@github-actions github-actions Bot added the labels "6.topic: cuda" and "8.has: documentation" May 12, 2025
@ConnorBaker ConnorBaker moved this from New to 🏗 In progress in CUDA Team May 12, 2025
@github-actions github-actions Bot added the labels "10.rebuild-darwin: 11-100" and "10.rebuild-linux: 11-100" May 12, 2025
@ConnorBaker ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from 34165ab to 6a17ffd Compare May 12, 2025 19:23
@ConnorBaker ConnorBaker self-assigned this May 12, 2025
@ConnorBaker ConnorBaker commented

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 406531 --extra-nixpkgs-config '{ allowAliases = false; allowBroken = false; allowUnfree = true; checkMeta = true; contentAddressedByDefault = false; cudaCapabilities = [ "8.9" ]; cudaSupport = true; }'


x86_64-linux

✅ 1 package built:
  • nixpkgs-manual

@ConnorBaker ConnorBaker commented

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 406531


x86_64-darwin

✅ 1 package built:
  • nixpkgs-manual

aarch64-darwin

✅ 1 package built:
  • nixpkgs-manual

@ConnorBaker ConnorBaker marked this pull request as ready for review May 12, 2025 19:45
@ConnorBaker ConnorBaker moved this from 🏗 In progress to 👀 Awaits reviews in CUDA Team May 12, 2025
@ConnorBaker ConnorBaker requested a review from SomeoneSerge May 12, 2025 19:46
@ConnorBaker ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from 6a17ffd to bc25080 Compare May 12, 2025 19:59
@ofborg ofborg Bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label May 12, 2025
@ConnorBaker ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from bc25080 to 856b135 Compare May 12, 2025 20:10
@github-actions github-actions Bot added the 6.topic: python Python is a high-level, general-purpose programming language. label May 12, 2025
@ofborg ofborg Bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label May 12, 2025
@ConnorBaker ConnorBaker commented

Changes:

@ConnorBaker ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from 856b135 to fac2fef Compare May 12, 2025 20:15
Comment thread pkgs/top-level/all-packages.nix Outdated
Comment on lines 2735 to 2726
Contributor


Here - instead of exposing cudaLib I see something like makeCudaScope = let cudaLib = ...; in cudaLib.${...}; cudaPackages_XX_YY = makeCudaScope { ... }

Contributor Author


I would like to have cudaLib exposed outside of the CUDA package set since it does not depend on the CUDA version or architecture. Having it at the top level is also important because if out-of-tree consumers modify it (e.g., by adding new GPUs), those changes are propagated to all consumers of cudaLib. One such example where that's important is the pkgsCuda attribute set I'd like to introduce in #406568: eb044ff.

Contributor


Following up on the discussions we've had asynchronously:

  • Out-of-tree support as such requires two public pieces of information:

    • (extendible) static data, and
    • the package set constructor (Data -> Configuration -> PackageSet, where Configuration is version/tag constraints but may include extra extensions, extra fixups, etc.).

    Implicit here are the "fixups" and "shims", hidden inside the "constructor" (keep this way?). This PR gathers the data we've used so far into data.nix. The PR "tests.cuda.db: init" (#406740) is to unify data.nix with the machine-readable part of the "knowledge". We're yet to sketch the "constructor".

  • DX-wise, we do want to establish a few fixpoints to facilitate propagation of overrides to the multiple package set instances used by Nixpkgs: a fixpoint for extensions, for fixups, for "static data". We do want to keep them to the minimum though, so as not to commit to supporting a wider interface. Connor, you mentioned you had a PR for cudaPackagesExtensions open?

  • In the current form it's a bit backward that cudaLib is defined outside cudaPackages but is only used inside. In the formalism established above, the motivation is obscured by the fact we haven't isolated the "constructor" yet.

  • I'd also potentially abstain from exposing anything that doesn't need to be a fixpoint

@SomeoneSerge SomeoneSerge force-pushed the feat/cuda-packages-uses-cuda-lib branch from 84136e0 to 24a204e Compare May 22, 2025 23:31
SomeoneSerge added a commit to SomeoneSerge/nixpkgs that referenced this pull request May 23, 2025
Move _cuda/db to its final location, aligned with the _cuda.lib PR (NixOS#406531),
introduce the top-level _cuda fix-point

@SomeoneSerge SomeoneSerge left a comment


@ConnorBaker I didn't want to overwrite your commits, leaving the autosquash to you. IMO this is ready to be merged to master, but let's have another look-over and maybe a nixpkgs-review before backporting

@wegank wegank added the 12.approvals: 1 This PR was reviewed and approved by one person. label May 27, 2025
Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>
@ConnorBaker ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from f94c476 to 6b16bcc Compare May 27, 2025 15:05
@ConnorBaker ConnorBaker force-pushed the feat/cuda-packages-uses-cuda-lib branch from 6b16bcc to 688e14d Compare May 27, 2025 15:05
@ConnorBaker ConnorBaker commented

Rebased, squashed, force-pushed, and updated the PR description. Also built PyTorch to confirm it works on x86_64-linux.

@SomeoneSerge ready for merge!

@ConnorBaker ConnorBaker moved this from 👀 Awaits reviews to 🔖 Awaits the merge in CUDA Team May 27, 2025
@SomeoneSerge SomeoneSerge merged commit 050bbae into NixOS:master May 27, 2025
17 of 20 checks passed
@github-project-automation github-project-automation Bot moved this from 🔖 Awaits the merge to ✅ Done in CUDA Team May 27, 2025
nixpkgs-ci Bot commented May 27, 2025

Backport failed for release-24.11, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release-24.11
git worktree add -d .worktree/backport-406531-to-release-24.11 origin/release-24.11
cd .worktree/backport-406531-to-release-24.11
git switch --create backport-406531-to-release-24.11
git cherry-pick -x 646bebe3be8004e578842745136b8196f4f1fced a018d736978adf55e1b8f7bf79e736cd6d042573 0ac3a73b6a997ea5e05067577a17c613ecddcd7d 629ae4e42c4764f1e56cd64746a3f19e28ff62a0 c5dad2886a5623fc5e41054ab9ed9ff8e5f7ac91 765529dfff5cb04cd8ebf9275c9bccc9473fcbb5 ead65813623f92f8630811b1b3616877a727b1d9 8fcff2390e3224e970291975cedcbd23f743c6da 688e14d21a38135270544bfdfcc793d25dea2802

nixpkgs-ci Bot commented May 27, 2025

Successfully created backport PR for release-25.05:

@github-actions github-actions Bot added the 8.has: port to stable This PR already has a backport to the stable release. label May 27, 2025
) passthruExtra.supportedCudaCapabilities;

# The resolved requested or default CUDA capabilities.
cudaCapabilities =
Contributor


Ah, didn't realize that this made flags depend on the package set

@ConnorBaker ConnorBaker mentioned this pull request May 27, 2025
13 tasks
SomeoneSerge added a commit to SomeoneSerge/nixpkgs that referenced this pull request May 30, 2025
Adds `tests.cuda.db`, `tests.cuda.db.html`, and `_cuda.db`.

The commit obtained certain mass moves after rebasing on NixOS#406531.

The original motivation behind using evalModules in cudaPackages
was to ensure that the Data used to generate the package set
can be evaluated (& inspected) in its entirety. This focus was lost in
implementation.

Various stylistic and design choices:

- Use of all-singular names.
- Predominantly column-oriented layout (structure of arrays rather than array
  of structures); exceptions have to do with rebasings and
  compatibility.
- Column validation: tried domains assertions vs. submodule with dynamic
  option sets; chose the former to get the more concise <name>
  placeholders in the nixosOptionsDoc output.
- Between Maybe T and Option<T> ended up choosing the former (consistent
  with Connor's code).
- Instead of AttrSet and attrsOf, chose to use infix notation
  (`String ⇒ T`) with the bold arrow hinting at a "memoized" map (cf.
  dex-lang).

Performance considerations:
- Adding FOD info to the schema originally contributed extra 1.5s to
  evaluation time because of `imports` (`mkMerge`) abuse: `json.nix`
  generates many small definitions, most of which are redundant and are
  retained in `options.<path>.definitions`.
- In the worst case, at the time of writing, unoptimized `json.nix`
  results in 4s evaluation time when used with the full set of
  manifests, including the new backported manifests which doubles the
  number of FODs and versions compared to Nixpkgs.
- For this reason we do not directly use `json.nix` in Nixpkgs, but we
  export and check in the output of `evalModules` (cf. _rawManifests in
  the next PRs), bringing down the evaluation time to hundreds of
  milliseconds. It's obviously desirable to optimize this further, but at
  least this doesn't make things worse.

On checking-in vendored content.
The output of evalModules is only smaller (measured in lines of code)
than the total size of upstream manifests. Most of the reduction comes
from deduplicating the "feature manifests".
SomeoneSerge added a commit to SomeoneSerge/nixpkgs that referenced this pull request Jun 10, 2025