fix: support single-file model weights (model.safetensors / pytorch_model.bin) #272
Open
RajeshKumar11 wants to merge 2 commits into lyogavin:main from
Conversation
…ard index

Models <= ~7B (e.g. TinyLlama, Phi, Gemma-2B) are distributed as a single model.safetensors or pytorch_model.bin file with no shard-index JSON. AirLLM previously hard-asserted that model.safetensors.index.json must exist, making these models fail on first use.

Changes in split_and_save_layers() (utils.py):
- model.safetensors.index.json → handled (existing behaviour, now elif)
- model.safetensors (no index) → NEW: reads tensor keys via safe_open header (no data loaded) and builds weight_map in-memory
- pytorch_model.bin (no index) → NEW: loads state dict to extract key list
- none of the above → raises FileNotFoundError with a clear message

Also adds tests/test_single_file_model.py with 4 cases:
- single model.safetensors splits correctly
- single pytorch_model.bin splits correctly
- sharded index path still works (regression guard)
- missing weights raises FileNotFoundError
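The four-way dispatch described in the commit message can be sketched roughly as follows. This is a hedged reconstruction, not AirLLM's actual code: `resolve_weight_map` is a hypothetical helper name, and the real `split_and_save_layers()` differs in detail.

```python
import json
import os


def resolve_weight_map(model_dir):
    """Return {tensor_name: filename} for whichever weight layout exists."""
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    single_safetensors = os.path.join(model_dir, "model.safetensors")
    single_bin = os.path.join(model_dir, "pytorch_model.bin")

    if os.path.exists(index_path):
        # Sharded checkpoint: the index JSON already carries the weight map.
        with open(index_path) as f:
            return json.load(f)["weight_map"]
    elif os.path.exists(single_safetensors):
        # Single safetensors file: list tensor keys from the file header only,
        # without loading any tensor data.
        from safetensors import safe_open  # imported lazily
        with safe_open(single_safetensors, framework="pt") as f:
            return {key: "model.safetensors" for key in f.keys()}
    elif os.path.exists(single_bin):
        # Single .bin file: load the state dict just to extract the key list.
        import torch  # imported lazily
        state_dict = torch.load(single_bin, map_location="cpu")
        return {key: "pytorch_model.bin" for key in state_dict}
    raise FileNotFoundError(
        f"No model weights found in {model_dir}: expected "
        "model.safetensors.index.json, model.safetensors, or pytorch_model.bin"
    )
```

The final `raise` replaces the previous hard assert, so a directory with no recognizable weights now produces a clear error instead of a cryptic AssertionError.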
optimum 2.x dropped the bettertransformer sub-package. The bare `from optimum.bettertransformer import BetterTransformer` at module level caused an ImportError on every import of airllm, making the library completely unusable with current optimum.

Wrap the import in try/except and gate the transform() call behind the resulting `bettertransformer_available` flag so the rest of the SDPA/fallback logic continues to work unchanged.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
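A minimal sketch of the guarded import, using the flag name from the commit message; `maybe_transform` is a hypothetical wrapper added here for illustration, not the actual call site in airllm_base.py.

```python
# optimum >= 2.0 removed the bettertransformer sub-package, so the bare
# top-level import must be wrapped to keep `import airllm` working.
try:
    from optimum.bettertransformer import BetterTransformer
    bettertransformer_available = True
except ImportError:
    BetterTransformer = None
    bettertransformer_available = False


def maybe_transform(model):
    """Apply BetterTransformer only when the sub-package exists."""
    if bettertransformer_available:
        return BetterTransformer.transform(model)
    # Fall back to the unchanged SDPA / default attention path.
    return model
```

When optimum 2.x (or no optimum at all) is installed, the flag is False and models pass through untouched, so the library stays importable.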
Problem
AirLLM currently fails with any model that ships its weights as a single file rather than a sharded set with an index JSON. This is common for models up to ~7B parameters (TinyLlama, Phi-2, Gemma-2B, Qwen-1.8B, etc.).
The splitter only handled two formats:
- pytorch_model.bin.index.json + shards
- model.safetensors.index.json + shards

Fix (utils.py)

Added two additional cases in split_and_save_layers():
- model.safetensors.index.json → existing behaviour, unchanged
- model.safetensors (no index) → reads tensor keys via the safe_open header (no data loaded) and builds the weight map in-memory
- pytorch_model.bin (no index) → loads the state dict to extract the key list
- none of the above → raises FileNotFoundError with a clear message (replaces the cryptic AssertionError)

Fix (airllm_base.py)

Also includes a prerequisite fix:
from optimum.bettertransformer import BetterTransformer was a bare top-level import that raises ImportError on optimum >= 2.0, which removed the bettertransformer sub-package. Wrapped in try/except with a bettertransformer_available flag.

Tests
New tests/test_single_file_model.py with 4 cases (all pass, no GPU required):
- single model.safetensors splits correctly into per-layer shards
- single pytorch_model.bin splits correctly
- sharded index path still works (regression guard)
- missing weights raises FileNotFoundError

Verification
Confirmed end-to-end:
TinyLlama/TinyLlama-1.1B-Chat-v1.0 (single-file safetensors, no index) now loads and splits without any manual workaround.
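For reference, the "reads tensor keys via the safe_open header" step works because the safetensors format stores an 8-byte little-endian header length followed by a JSON header that lists every tensor, so listing keys touches only the first few bytes of the file. A stdlib-only sketch of that header-only read (the PR itself uses safetensors' safe_open rather than this hand-rolled parser):

```python
import json
import struct


def safetensors_keys(path):
    """List tensor names from a .safetensors file without loading any data."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian u64 giving the JSON header size.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header.
    return [key for key in header if key != "__metadata__"]
```

This is why a multi-GB single-file checkpoint can be indexed in milliseconds before any per-layer splitting begins.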