cudaPackages: introduce and use cudaLib #406531
34165ab to 6a17ffd
6a17ffd to bc25080
bc25080 to 856b135
856b135 to fac2fef
Here, instead of exposing `cudaLib`, I see something like `makeCudaScope = let cudaLib = ...; in cudaLib.${...}; cudaPackages_XX_YY = makeCudaScope { ... }`
I would like to have `cudaLib` exposed outside of the CUDA package set since it does not depend on the CUDA version or architecture. Having it at the top level is also important because if out-of-tree consumers modify it (e.g., by adding new GPUs), those changes are propagated to all consumers of `cudaLib`. One such example where that's important is the `pkgsCuda` attribute set I'd like to introduce in #406568: eb044ff.
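The propagation argument above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical (the `knownCapabilities` attribute, the consumers, and the made-up `99.9` capability); it is not the layout used in Nixpkgs, only a demonstration that consumers reading shared data through a top-level fix-point all see an out-of-tree override.

```nix
# Hypothetical sketch; evaluate with `nix-instantiate --eval --strict file.nix`.
let
  # Minimal fix-point helper (same idea as `lib.fix` in Nixpkgs).
  fix = f: let self = f self; in self;

  # Hypothetical top-level set: shared GPU data plus two consumers that
  # read it through the fix-point argument `final`.
  base = final: {
    cudaLib.knownCapabilities = [ "8.6" "8.9" ];
    consumerA = final.cudaLib.knownCapabilities;
    consumerB = builtins.length final.cudaLib.knownCapabilities;
  };

  # Out-of-tree override registering one more (made-up) capability.
  overlay = final: prev: {
    cudaLib = prev.cudaLib // {
      knownCapabilities = prev.cudaLib.knownCapabilities ++ [ "99.9" ];
    };
  };

  # Apply the overlay on top of the base set inside a single fix-point.
  extended = fix (final: let prev = base final; in prev // overlay final prev);
in
{
  # Both consumers observe the overridden data:
  # consumerA is [ "8.6" "8.9" "99.9" ] and consumerB is 3.
  inherit (extended) consumerA consumerB;
}
```

Had the shared data been captured inside each consumer with a private `let`, the override would have had to be repeated for every consumer.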
Following up on the discussions we've had asynchronously:

- Out-of-tree support as such requires two public pieces of information:
  - (extendible) static data, and
  - the package set constructor (`Data -> Configuration -> PackageSet`, where `Configuration` is version/tag constraints but may include extra extensions, extra fixups, etc.).

  Implicit here are the "fixups" and "shims", hidden inside the "constructor" (keep this way?). This PR gathers the data we've used so far into `data.nix`. tests.cuda.db: init #406740 is to unify `data.nix` with the machine-readable part of the "knowledge". We're yet to sketch the "constructor".
- DX-wise we do want to establish a few fixpoints to facilitate propagation of overrides to the multiple package set instances used by Nixpkgs: a fixpoint for extensions, for fixups, for "static data". We do want to keep them to the minimum though, so as not to commit to supporting a wider interface. Connor, you mentioned you had a PR for `cudaPackagesExtensions` open?
- In the current form it's a bit backward that `cudaLib` is defined outside `cudaPackages` but is only used inside. In the formalism established above, the motivation is obscured by the fact we haven't isolated the "constructor" yet.
- I'd also potentially abstain from exposing anything that doesn't need to be a fixpoint.
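As a rough illustration of the `Data -> Configuration -> PackageSet` shape discussed above, here is a minimal sketch. The function name `makeCudaScope` is borrowed from the earlier review comment; the data layout, the configuration fields, and the resulting attributes are all assumptions made for the example, not the constructor this PR (or a follow-up) defines.

```nix
# Hypothetical sketch of a constructor: Data -> Configuration -> PackageSet.
let
  # "Data": static, extensible information (contents are illustrative).
  data = {
    capabilityToArch = {
      "8.6" = "sm_86";
      "9.0" = "sm_90";
    };
  };

  # The constructor takes the data first, then a configuration, and
  # returns a self-referential attribute set standing in for a package set.
  makeCudaScope = data: config:
    let
      self = {
        inherit (config) cudaVersion;
        # Resolve the requested capabilities against the static data.
        archs = map (cap: data.capabilityToArch.${cap}) config.cudaCapabilities;
        # Stand-in for a package that depends on other members of the set.
        summary = "CUDA ${self.cudaVersion} for ${builtins.concatStringsSep ", " self.archs}";
      };
    in
    self;
in
makeCudaScope data {
  cudaVersion = "12.8";
  cudaCapabilities = [ "8.6" "9.0" ];
}
```

Keeping the data as the first argument means an extended copy of it can be passed to the same constructor without touching the constructor itself.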
84136e0 to 24a204e
Move _cuda/db to its final location, aligned with the _cuda.lib PR (NixOS#406531), introduce the top-level _cuda fix-point
SomeoneSerge left a comment
@ConnorBaker I didn't want to overwrite your commits, leaving the autosquash to you. IMO this is ready to be merged to master, but let's have another look-over and maybe a nixpkgs-review before backporting
Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>
f94c476 to 6b16bcc
6b16bcc to 688e14d
Rebased, squashed, force-pushed, and updated the PR description. Also built PyTorch to confirm it works on x86_64-linux. @SomeoneSerge ready for merge!
Backport failed. Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin release-24.11
git worktree add -d .worktree/backport-406531-to-release-24.11 origin/release-24.11
cd .worktree/backport-406531-to-release-24.11
git switch --create backport-406531-to-release-24.11
git cherry-pick -x 646bebe3be8004e578842745136b8196f4f1fced a018d736978adf55e1b8f7bf79e736cd6d042573 0ac3a73b6a997ea5e05067577a17c613ecddcd7d 629ae4e42c4764f1e56cd64746a3f19e28ff62a0 c5dad2886a5623fc5e41054ab9ed9ff8e5f7ac91 765529dfff5cb04cd8ebf9275c9bccc9473fcbb5 ead65813623f92f8630811b1b3616877a727b1d9 8fcff2390e3224e970291975cedcbd23f743c6da 688e14d21a38135270544bfdfcc793d25dea2802
Successfully created backport PR for
  ) passthruExtra.supportedCudaCapabilities;

  # The resolved requested or default CUDA capabilities.
  cudaCapabilities =
Ah, didn't realize that this made flags depend on the package set
Adds `tests.cuda.db`, `tests.cuda.db.html`, and `_cuda.db`. The commit obtained certain mass moves after rebasing on NixOS#406531.

The original motivation behind using evalModules in cudaPackages was to ensure that the Data used to generate the package set can be evaluated (& inspected) in its entirety. This focus was lost in implementation.

Various stylistic and design choices:
- Use of all-singular names.
- Predominantly column-oriented layout (structure of arrays rather than array of structures); exceptions have to do with rebasings and compatibility.
- Column validation: tried domains assertions vs. submodule with dynamic option sets; chose the former to get the more concise <name> placeholders in the nixosOptionsDoc output.
- Between Maybe T and Option<T> ended up choosing the former (consistent with Connor's code).
- Instead of AttrSet and attrsOf, chose to use infix notation (`String ⇒ T`) with the bold arrow hinting at a "memoized" map (cf. dex-lang).

Performance considerations:
- Adding FOD info to the schema originally contributed an extra 1.5s to evaluation time because of `imports` (`mkMerge`) abuse: `json.nix` generates many small definitions, most of which are redundant and are retained in `options.<path>.definitions`.
- In the worst case, at the time of writing, unoptimized `json.nix` results in 4s evaluation time when used with the full set of manifests, including the new backported manifests, which double the number of FODs and versions compared to Nixpkgs.
- For this reason we do not directly use `json.nix` in Nixpkgs, but we export and check in the output of `evalModules` (cf. _rawManifests in the next PRs), bringing down the evaluation time to hundreds of milliseconds. It's obviously desirable to optimize this further, but at least this doesn't make things worse.

On checking-in vendored content. The output of evalModules is only smaller (measured in lines of code) than the total size of upstream manifests.
Most of the reduction comes from deduplicating the "feature manifests".
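To make the "structure of arrays rather than array of structures" choice in the commit message concrete, the following sketch converts a row-oriented list into per-field columns keyed by name. The package names, fields, and values are invented for illustration and do not reflect the actual `_cuda.db` schema.

```nix
# Hypothetical sketch: row-oriented vs. column-oriented layout.
let
  # Array of structures: one record per (made-up) package.
  rows = [
    { name = "foo"; version = "1.0"; license = "unfree"; }
    { name = "bar"; version = "2.0"; license = "unfree"; }
  ];

  # Build one column: a map from package name to the value of `field`.
  toColumn = field:
    builtins.listToAttrs (map (row: {
      inherit (row) name;
      value = row.${field};
    }) rows);

  # Structure of arrays: one map per field.
  columns = {
    version = toColumn "version";
    license = toColumn "license";
  };
in
columns.version.bar # evaluates to "2.0"
```

With columns, a lookup or a validation pass touches only the field it needs, at the cost of spreading a single record across several maps.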
Broadly, this PR does these things:

- `_cuda`, which consolidates CUDA package set backbone (static data, configurations, utility functions, etc. required to create a CUDA package set)
- `callPackage` and the `newScope` used to construct the package set (i.e., `_cuda.bootstrapData`, `_cuda.fixups`)
- `redist arch` has been changed to `redist system` to better mirror `nix system`, as both are two-tuple system configs
- `cudaPackages.backendStdenv`

This PR is a step toward landing more improvements from https://github.com/ConnorBaker/cuda-packages and merging the changes in #406740.

This PR presents largely organizational changes, but does cause rebuilds due to changes in the default selection of CUDA capabilities (see the section under `_cuda.lib`).

The `_cuda` attribute and all that it contains is introduced to enable re-use out of tree. For the time being, it is provided without any guarantee of stability.

_cuda.fixups

- `fixups` is moved to `_cuda.fixups`
- `_cuda.fixups` is removed from CUDA package sets and exists only at top-level to make clear it exists outside of any CUDA package set's fixed point

_cuda.bootstrapData

- `_cuda.bootstrapData.cudaCapabilityToInfo` (pkgs/development/cuda-modules/_cuda/db/bootstrap/cuda.nix)
- `_cuda.bootstrapData.nvccCompatibilities` (pkgs/development/cuda-modules/_cuda/db/bootstrap/nvcc.nix)
- `redistNames`, `redistUrlPrefix`

_cuda.lib

Constructing out-of-tree package sets for CUDA packages is difficult and typically requires vendoring large numbers of files to be able to benefit from the machinery we (the CUDA team) have built.

`_cuda.lib` provides an interface through which out-of-tree consumers of Nixpkgs can access the same machinery we use. (Data and configuration re-use are enabled by `_cuda.bootstrapData` and `_cuda.fixups`.)

`_cuda.lib` also serves to centralize and document a number of ad-hoc functionalities created in the CUDA package set. For example, many of the functions which had been lumped into `flags.nix` are now available in `_cuda.lib`.

Due to its recency, very little of `_cuda.lib` is provided as a stable API. Many functions are only used internally at the moment (they are prefixed with an `_`) and are provided without guarantees of stability or correctness. They are, however, available for use given their utility.

Changes to the default set of CUDA capabilities
Previously, the set of default capabilities for CUDA 12.8 were
They are now
Given the number of capabilities introduced by the last few releases, dropping older capabilities becomes a necessity to keep the amount of device code generated small enough to link against. (There is a 2GB limit.) Since recent TensorRT releases have deprecated and now dropped support for `6.0` through `7.0`, it makes sense to drop them from the default set of capabilities.

Additionally, architecture-specific capabilities, like `9.0a`, have been removed from the default collection of targets, as they are not meant to be used with other capabilities without careful guarding in source (a feature only discussed in 12.9; see footnote 1).

cudaPackages.backendStdenv

The changes made to `backendStdenv` enable evaluation-time checking of certain common mistakes. Additionally, it provides a single point of access to common attributes describing the CUDA package set, such as:

- `nvccHostCCMatchesStdenvCC`: whether the current `stdenv` matches the version which will be provided to NVCC
- `hostNixSystem`: an alias to `stdenv.hostPlatform.system`
- `hostRedistSystem`: the NVIDIA equivalent of `hostNixSystem`
- `cudaCapabilities`: the resolved collection of CUDA capabilities `cudaPackages.flags`'s attributes target
- `hasJetsonCudaCapability`: whether `cudaCapabilities` contains a capability belonging to a Jetson device

Things done

- `nix.conf`? (See Nix manual)
  - `sandbox = relaxed`
  - `sandbox = true`
- `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed, also see nixpkgs-review usage
- `./result/bin/`)

Add a 👍 reaction to pull requests you find important.
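The reasoning behind the new default capabilities (drop `6.0` through `7.0`, and exclude architecture-specific targets such as `9.0a`) can be sketched as a simple filter. The candidate list and the helper names below are assumptions made for illustration; they are not the actual defaults nor the functions in `_cuda.lib`.

```nix
# Hypothetical sketch of selecting default CUDA capabilities.
let
  # Illustrative candidate list, not the real set of supported capabilities.
  candidates = [ "6.0" "6.1" "7.0" "7.5" "8.0" "8.6" "8.9" "9.0" "9.0a" ];

  # Architecture-specific capabilities carry a trailing "a" (e.g. "9.0a").
  isArchSpecific = cap: builtins.match ".*a" cap != null;

  # Capabilities up to and including 7.0 are dropped from the defaults.
  isDropped = cap: builtins.compareVersions cap "7.0" <= 0;

  defaults = builtins.filter
    (cap: !(isArchSpecific cap) && !(isDropped cap))
    candidates;
in
defaults # evaluates to [ "7.5" "8.0" "8.6" "8.9" "9.0" ]
```

A user who needs an excluded target would still request it explicitly; the filter only affects what is built when nothing is requested.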
Footnotes
1. https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/#architecture-specific_features