
fix: support single-file model weights (model.safetensors / pytorch_model.bin)#272

Open
RajeshKumar11 wants to merge 2 commits into lyogavin:main from RajeshKumar11:fix/single-file-model-support

Conversation

@RajeshKumar11

Problem

AirLLM currently fails with any model that ships its weights as a single file rather than a sharded set with an index JSON. This is common for models up to ~7B parameters (TinyLlama, Phi-2, Gemma-2B, Qwen-1.8B, etc.).

AssertionError: model.safetensors.index.json should exist.

The splitter only handled two formats:

  • pytorch_model.bin.index.json + shards
  • model.safetensors.index.json + shards

Fix (utils.py)

Added handling for two additional weight layouts, plus a clearer error, in split_and_save_layers():

Weight file → behaviour:

  • model.safetensors.index.json: existing behaviour, unchanged
  • model.safetensors (no index): NEW, reads tensor keys via the safe_open header (no tensor data loaded) and builds the weight map in memory
  • pytorch_model.bin (no index): NEW, loads the state dict to extract the key list
  • none of the above: raises FileNotFoundError with a clear message (replaces the cryptic AssertionError)
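The PR reads keys with safetensors' safe_open, which only parses the file header. To illustrate why that is cheap, here is a dependency-free sketch of the same header-only read (function names are illustrative, not AirLLM's actual code): a .safetensors file begins with an 8-byte little-endian header length followed by a JSON header, so tensor names are available without touching any tensor data.

```python
import json
import struct

def read_safetensors_keys(path):
    """Return tensor names from a .safetensors file without loading data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header
    return [k for k in header if k != "__metadata__"]

def build_weight_map(path, filename="model.safetensors"):
    """Mimic the weight_map shape of model.safetensors.index.json
    for a single-file checkpoint: every tensor maps to the one file."""
    return {"weight_map": {k: filename for k in read_safetensors_keys(path)}}
```

The in-memory dict can then feed the same per-layer splitting code path that a real index JSON would.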

Fix (airllm_base.py)

Also includes a prerequisite fix: the bare top-level import `from optimum.bettertransformer import BetterTransformer` raises ImportError on optimum >= 2.0, which removed the bettertransformer sub-package. The import is now wrapped in try/except and gated behind a bettertransformer_available flag.
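A minimal sketch of that guarded-import pattern (names assumed to mirror the PR; the exact placement in airllm_base.py may differ):

```python
try:
    # Present on optimum < 2.0; the sub-package was removed in 2.0
    from optimum.bettertransformer import BetterTransformer
    bettertransformer_available = True
except ImportError:
    BetterTransformer = None
    bettertransformer_available = False

def maybe_to_bettertransformer(model):
    # Apply the rewrite only when the import succeeded; otherwise return
    # the model unchanged so the SDPA/fallback path keeps working.
    if bettertransformer_available:
        return BetterTransformer.transform(model)
    return model
```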

Tests

New tests/test_single_file_model.py with 4 cases (all pass, no GPU required):

  • single model.safetensors splits correctly into per-layer shards
  • single pytorch_model.bin splits correctly
  • existing sharded index path still works (regression guard)
  • missing weights raises FileNotFoundError
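The regression-guard and missing-weights cases above can be exercised against a small dispatch helper like the following (a hedged sketch; the real logic lives inside split_and_save_layers() and its signature may differ):

```python
import os

# Checked in priority order, matching the behaviour table in the PR
# description: sharded indexes first, then single-file checkpoints.
WEIGHT_FILES = (
    "model.safetensors.index.json",
    "pytorch_model.bin.index.json",
    "model.safetensors",
    "pytorch_model.bin",
)

def find_weight_file(model_dir):
    """Return the first recognised weight file, else raise a clear error
    instead of the old AssertionError."""
    for name in WEIGHT_FILES:
        if os.path.exists(os.path.join(model_dir, name)):
            return name
    raise FileNotFoundError(
        f"No supported weight files found in {model_dir!r}; expected one of: "
        + ", ".join(WEIGHT_FILES)
    )
```

All four layouts reduce to a directory fixture plus an assertion on the returned name (or the raised exception), which is why the tests run without a GPU.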

Verification

Confirmed end-to-end: TinyLlama/TinyLlama-1.1B-Chat-v1.0 (single-file safetensors, no index) now loads and splits without any manual workaround.

RajeshKumar11 and others added 2 commits March 19, 2026 15:38
…ard index

Models <= ~7B (e.g. TinyLlama, Phi, Gemma-2B) are distributed as a single
model.safetensors or pytorch_model.bin file with no shard-index JSON.
AirLLM previously hard-asserted that model.safetensors.index.json must exist,
making these models fail on first use.

Changes in split_and_save_layers() (utils.py):
- model.safetensors.index.json  → handled (existing behaviour, now elif)
- model.safetensors (no index)  → NEW: reads tensor keys via safe_open header
  (no data loaded) and builds weight_map in-memory
- pytorch_model.bin (no index)  → NEW: loads state dict to extract key list
- none of the above             → raises FileNotFoundError with a clear message

Also adds tests/test_single_file_model.py with 4 cases:
  - single model.safetensors splits correctly
  - single pytorch_model.bin splits correctly
  - sharded index path still works (regression guard)
  - missing weights raises FileNotFoundError
optimum 2.x dropped the bettertransformer sub-package.  The bare
`from optimum.bettertransformer import BetterTransformer` at module
level caused an ImportError on every import of airllm, making the
library completely unusable with current optimum.

Wrap the import in try/except and gate the transform() call behind
the resulting `bettertransformer_available` flag so the rest of the
SDPA/fallback logic continues to work unchanged.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
