Commit af4384f
feat: add NemotronH hybrid model support with multi-GPU VRAM calibration
NemotronH (Mamba2 SSM + MoE + Attention) requires several changes to
load and abliterate correctly on multi-GPU systems.
Architecture support (model.py):
- Add backbone.layers fallback in get_layers() for NemotronH's
model.backbone.layers structure
- Add get_layer_modules() patterns for NemotronH's unified mixer
attribute: mixer.out_proj (Mamba2), mixer.o_proj (attention),
mixer.down_proj / mixer.experts[*].down_proj /
mixer.shared_experts.down_proj (MoE)
- Scan all layers in get_abliterable_components() instead of only
layer 0, to discover the full union of component types in hybrid
architectures
- Add _get_hidden_states_via_hooks() fallback for models that don't
return hidden_states through generate() (NemotronH returns tuple
of Nones); use forward hooks on each layer with device-aware
stacking for multi-GPU compatibility
- Skip meta-device and NaN-weight modules in abliterate() to prevent
NaN corruption when layers are CPU-offloaded by Accelerate
- Add _has_mamba_layers() to detect hybrid SSM architectures
Multi-GPU VRAM calibration (model.py):
- After inference warmup on multi-GPU systems, check if any GPU has
less than 6 GiB free; if so, release the model, measure actual free
VRAM per GPU, and reload once with corrected per-GPU caps
- Overloaded GPUs get a 0.7 correction factor for Accelerate's
layer-size underestimation; other GPUs get full budget to absorb
displaced layers; gated to hybrid SSM models via _has_mamba_layers()
so regular transformers are unaffected
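The cap-correction step might look like this minimal sketch (the 6 GiB threshold and 0.7 factor come from the commit; the function name and dict-of-bytes shape are illustrative assumptions, not the project's actual API):

```python
GIB = 1024**3

def corrected_gpu_caps(free_bytes: dict, threshold: int = 6 * GIB,
                       factor: float = 0.7) -> dict:
    """Compute per-GPU memory caps for the one-time reload.

    Illustrative sketch: GPUs whose measured free VRAM fell below the
    threshold after warmup are treated as overloaded and capped at
    `factor` times their measured free memory (compensating for
    Accelerate underestimating hybrid SSM layer sizes); the remaining
    GPUs keep their full measured budget so they can absorb layers
    displaced from the overloaded devices.
    """
    return {
        dev: int(free * factor) if free < threshold else free
        for dev, free in free_bytes.items()
    }
```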
User experience:
- Show trust_remote_code explanation with model repo link before
prompting, replacing the bare HuggingFace error message
- Auto-install mamba-ssm when required, with clear nvcc/CUDA toolkit
guidance on build failure
- Suggest installing causal-conv1d and mamba-ssm after loading any
model with Mamba layers when fast kernels are missing
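The auto-install flow could be sketched like so (function name and messages are hypothetical; the real code targets `mamba-ssm` and `causal-conv1d`):

```python
import importlib.util
import subprocess
import sys

def ensure_package(module_name: str, pip_name: str) -> bool:
    """Hypothetical sketch of on-demand installation.

    Returns True if the module is importable (possibly after a pip
    install); on build failure, prints guidance about the CUDA
    toolkit, since mamba-ssm compiles extensions with nvcc.
    """
    if importlib.util.find_spec(module_name) is not None:
        return True
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", pip_name],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        print(f"{pip_name} failed to build; ensure nvcc (CUDA toolkit) "
              f"is on PATH, then retry: pip install {pip_name}")
        return False
    return True

# e.g. ensure_package("mamba_ssm", "mamba-ssm")
```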
Other fixes:
- Sum VRAM across all GPUs in print_memory_usage() (utils.py)
- Show total and per-GPU VRAM in startup output (main.py)
- Fix division by zero in evaluator when base_refusals is 0
- Add mamba optional dependency group to pyproject.toml

1 parent: 4c80c4b
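For the evaluator fix, a guard along these lines avoids the zero division (names are illustrative, not the project's actual API):

```python
def refusal_reduction(base_refusals: int, abliterated_refusals: int) -> float:
    """Fraction of baseline refusals eliminated by abliteration.

    Illustrative guard: when the base model refused nothing there is
    no reduction to compute, so return 0.0 instead of dividing by zero.
    """
    if base_refusals == 0:
        return 0.0
    return 1.0 - abliterated_refusals / base_refusals
```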
4 files changed: 2868 additions, 2522 deletions