(#2615) use NV_GPU for visible device list by bghira · Pull Request #2718 · bghira/SimpleTuner

bghira · 2026-05-19T20:49:12Z

This pull request changes how the training workflow selects devices and launches processes, and adds tests covering the affected scenarios. The main changes are normalizing device-list parsing, respecting provider GPU assignments, preventing duplicate accelerator flags, and extending test coverage for these cases.

Device selection and environment handling:

Added a _normalize_visible_device_list helper to robustly parse and validate device lists from environment variables, ensuring consistent device selection logic.
Updated the process launch logic to use provider GPU assignments (from NV_GPU or NVIDIA_VISIBLE_DEVICES) as a fallback when CUDA_VISIBLE_DEVICES is unset, after normalization.
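The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `normalize_visible_device_list` and `resolve_visible_devices` are hypothetical, and both the lookup order (NV_GPU before NVIDIA_VISIBLE_DEVICES) and the treatment of values such as `all`, `none`, and `void` as "no specific assignment" are assumptions.

```python
from typing import Mapping, Optional

# Values that do not name specific devices. Treating them as
# "no assignment" is an assumption of this sketch.
_NON_SPECIFIC = {"", "all", "none", "void"}


def normalize_visible_device_list(raw: Optional[str]) -> Optional[str]:
    """Return a cleaned comma-separated device list, or None if unusable."""
    if raw is None:
        return None
    value = raw.strip()
    if value.lower() in _NON_SPECIFIC:
        return None
    seen = []
    for item in value.split(","):
        item = item.strip()
        # Drop empty entries and duplicates while preserving order.
        if item and item not in seen:
            seen.append(item)
    return ",".join(seen) if seen else None


def resolve_visible_devices(env: Mapping[str, str]) -> Optional[str]:
    """Pick the device list to export for the launched process."""
    # An explicit CUDA_VISIBLE_DEVICES always wins and is left untouched.
    if "CUDA_VISIBLE_DEVICES" in env:
        return env["CUDA_VISIBLE_DEVICES"]
    # Otherwise fall back to the provider assignment (order is assumed).
    for name in ("NV_GPU", "NVIDIA_VISIBLE_DEVICES"):
        devices = normalize_visible_device_list(env.get(name))
        if devices is not None:
            return devices
    return None
```

With this sketch, an environment containing only `NV_GPU=2,3` resolves to `2,3`, while an environment that already sets CUDA_VISIBLE_DEVICES is returned unchanged.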

Accelerate launch command improvements:

Added logic to automatically append --multi_gpu to the accelerate launch command when multiple processes are requested, unless a mutually exclusive accelerator selector (like --use_fsdp) is already present in extra args, preventing duplicate or conflicting flags.
Refactored the code for parsing extra accelerator arguments and selector detection, improving maintainability and correctness.
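The flag handling described above might look something like the sketch below. It is illustrative only: `build_launch_command` and `SELECTOR_FLAGS` are hypothetical names, and the set of flags treated as mutually exclusive selectors is an assumption (the PR description only names `--use_fsdp` as an example).

```python
import shlex
from typing import List

# Flags assumed to select an accelerator mode and therefore to conflict
# with an automatically appended --multi_gpu. The exact set is a guess.
SELECTOR_FLAGS = {
    "--multi_gpu",
    "--use_fsdp",
    "--use_deepspeed",
    "--use_megatron_lm",
    "--cpu",
    "--tpu",
}


def build_launch_command(num_processes: int, extra_args: str = "") -> List[str]:
    """Build an `accelerate launch` argument list without duplicate selectors."""
    extra = shlex.split(extra_args)
    # Compare only the flag name so that a "--flag=value" form is also detected.
    has_selector = any(arg.split("=", 1)[0] in SELECTOR_FLAGS for arg in extra)

    cmd = ["accelerate", "launch", f"--num_processes={num_processes}"]
    if num_processes > 1 and not has_selector:
        cmd.append("--multi_gpu")
    cmd.extend(extra)
    return cmd
```

Checking for an existing selector before appending keeps a user-supplied `--multi_gpu` from appearing twice and avoids combining it with a conflicting mode such as `--use_fsdp`.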

Test enhancements:

Refactored and expanded tests in tests/test_trainer.py to cover single and multi-GPU selection, provider GPU assignment fallback, and prevention of duplicate accelerator flags.

(#2615) use NV_GPU for visible device list

ea9f781

bghira requested a review from Copilot May 19, 2026 20:49

Copilot started reviewing on behalf of bghira May 19, 2026 20:49

bghira merged commit 01b0a8e into main May 19, 2026
3 checks passed

bghira deleted the bugfix/klein2-early-oom branch May 19, 2026 21:00

bghira merged 1 commit into main from bugfix/klein2-early-oom
