Add NVIDIA pytest validation lane #4359
Open
dfredriksenTT wants to merge 4 commits into aknezevic/nsmith/hf-bringup2
Summary
This PR adds a manifest-driven NVIDIA validation lane to `tt-xla` so that a selected set of PyTorch model tests can be run as CPU-vs-CUDA comparisons using the existing evaluator stack. The new path stays inside the current `tests/runner` and `tests/infrastructure` rather than introducing a separate harness.

At a high level, the change does three things. It adds a new pytest entrypoint for NVIDIA validation, adds a CUDA tester that compares CPU golden outputs to CUDA outputs, and extends the device connector and runner layers so that the existing workload abstractions can execute on CUDA without forcing full TT/XLA initialization during collection.
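The second step is the familiar golden-comparison pattern: run the model once on CPU to produce a reference, run it again on the device, and score the two outputs. As a minimal, framework-free sketch of that idea (the `pcc` metric, function names, and threshold here are illustrative, not the PR's actual evaluator API):

```python
import math

def pcc(golden, observed):
    """Pearson correlation coefficient between two flat output vectors."""
    n = len(golden)
    mean_g = sum(golden) / n
    mean_o = sum(observed) / n
    cov = sum((g - mean_g) * (o - mean_o) for g, o in zip(golden, observed))
    norm_g = math.sqrt(sum((g - mean_g) ** 2 for g in golden))
    norm_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return cov / (norm_g * norm_o)

def compare_outputs(cpu_golden, cuda_output, required_pcc=0.99):
    """Pass iff the device output correlates closely with the CPU golden run."""
    return pcc(cpu_golden, cuda_output) >= required_pcc
```

In the real lane, the same comparison machinery that already scores TT outputs against CPU goldens is reused; only the device that produces the second operand changes.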
```mermaid
flowchart LR
    A[Manifest row\ntest_case_id] --> B[test_models_nvidia.py]
    B --> C[Loader discovery\nexisting tt-xla model loaders]
    C --> D[DynamicTorchCudaModelTester]
    D --> E[CPU golden run]
    D --> F[CUDA run]
    E --> G[Existing comparison evaluators]
    F --> G
    G --> H[validated pass / fail]
```

What Changed
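Before the file-by-file details, here is a hedged sketch of how a manifest-driven lane like this is typically parametrized at collection time. The JSON shape and hook body are assumptions for illustration, not the PR's literal conftest code; only `test_case_id` matches the manifest contract described below.

```python
import json

def load_cohort(path):
    """Read a cohort manifest and return the test_case_ids to run.

    Assumes the manifest is a JSON list of rows; only test_case_id is
    required, while display fields such as a model name are optional
    metadata and are ignored for execution.
    """
    with open(path) as f:
        rows = json.load(f)
    return [row["test_case_id"] for row in rows]

def pytest_generate_tests(metafunc):
    """conftest-style hook: parametrize tests from --nvidia-cohort-json."""
    if "test_case_id" in metafunc.fixturenames:
        manifest = metafunc.config.getoption("--nvidia-cohort-json")
        metafunc.parametrize("test_case_id", load_cohort(manifest))
```

Each manifest row then becomes one collected test, which is why the cohort file alone determines what the lane runs.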
The new entrypoint is `tests/runner/test_models_nvidia.py`. It accepts `--nvidia-cohort-json`, resolves each `test_case_id` against the branch-local loader registry, and records results through the same report-property path the rest of the repository already uses.

The CUDA execution path is implemented in `tests/runner/testers/torch/dynamic_torch_cuda_model_tester.py`. This is intentionally small: it reuses the existing Torch tester and comparison machinery but swaps the TT target path for a CUDA target path.

Supporting changes in `tests/infra` make CUDA execution fit the existing abstractions. The connector layer now knows about `DeviceType.CUDA`, the runner layer can execute workloads on CUDA, and several imports were made lazy so that collection for the NVIDIA lane does not immediately bootstrap TT/XLA-only dependencies.

The manifest contract in `test_models_nvidia.py` was also tightened. Execution is keyed by `test_case_id`; display metadata is optional. That matches how the lane actually works and avoids treating `model_id` as a required execution contract when it is only descriptive metadata.

Validation
Local validation:

- `python3 -m py_compile` passed on the new and changed NVIDIA lane files
- `pre-commit run --all-files` passed on the branch after formatting and import cleanup

Host-backed validation on the AWS A10G machine:

- `pytest --collect-only -q tests/runner/test_models_nvidia.py --nvidia-cohort-json /tmp/results-main-nvidia-cohort.json` (72 tests collected in 9.16s)
- `pytest -q "tests/runner/test_models_nvidia.py::test_models_torch_nvidia[bart/question_answering/pytorch-bart-large-finetuned-squadv1]" --nvidia-cohort-json /tmp/results-main-nvidia-cohort.json` (1 passed)

Earlier bounded proof on the same host also passed for:

- `squeezebert/pytorch-Mnli`
- `bert_tiny_finetuned_mnli/sequence_classification/pytorch-bert-tiny-finetuned-mnli`

Current Limits
This lane is real and usable, but it is not the end of the NVIDIA bringup work.
Collection still surfaces loader-discovery warnings for optional dependencies that are not installed on the validation host. That noise is real, but it does not invalidate the proof cases above.

The `results_main.yaml` TT-pass source cohort is also only partially runnable in the current host environment. The honest split right now is 72 runnable / 28 blocked, so the blocked portion should be treated as follow-on loader or dependency adaptation work rather than silently assumed to be covered by this PR.
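The loader-discovery noise above is the usual symptom of optional dependencies being imported at collection time. The lazy-import approach the PR applies in `tests/infra` generally looks like the following sketch; the module name `tt_xla_runtime` and the helper name are hypothetical stand-ins, not the PR's actual identifiers.

```python
import importlib

def get_tt_runtime():
    """Import the TT/XLA runtime only when a TT workload actually runs,
    so that pytest collection on a CUDA-only host does not fail at
    module import time.
    """
    try:
        # Hypothetical module name used purely for illustration.
        return importlib.import_module("tt_xla_runtime")
    except ImportError as exc:
        raise RuntimeError(
            "TT/XLA runtime is unavailable on this host; "
            "only the CUDA lane can run here."
        ) from exc
```

Because the import moves from module scope into the function body, a host missing the dependency can still collect and run the NVIDIA lane; only a test that actually needs the TT path hits the error.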