fix: support single-file model weights (model.safetensors / pytorch_model.bin) #272
Open
RajeshKumar11 wants to merge 2 commits into lyogavin:main from
Conversation
…ard index

Models <= ~7B (e.g. TinyLlama, Phi, Gemma-2B) are distributed as a single model.safetensors or pytorch_model.bin file with no shard-index JSON. AirLLM previously hard-asserted that model.safetensors.index.json must exist, making these models fail on first use.

Changes in split_and_save_layers() (utils.py):
- model.safetensors.index.json → handled (existing behaviour, now elif)
- model.safetensors (no index) → NEW: reads tensor keys via safe_open header (no data loaded) and builds weight_map in-memory
- pytorch_model.bin (no index) → NEW: loads state dict to extract key list
- none of the above → raises FileNotFoundError with a clear message

Also adds tests/test_single_file_model.py with 4 cases:
- single model.safetensors splits correctly
- single pytorch_model.bin splits correctly
- sharded index path still works (regression guard)
- missing weights raises FileNotFoundError
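The four-way dispatch described in the commit message can be sketched roughly as follows. This is a hedged reconstruction, not AirLLM's actual code: `resolve_weight_map` is a hypothetical helper name, and the real `split_and_save_layers()` differs in detail.

```python
import json
import os


def resolve_weight_map(model_dir):
    """Return {tensor_name: filename} for whichever weight layout exists."""
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    single_safetensors = os.path.join(model_dir, "model.safetensors")
    single_bin = os.path.join(model_dir, "pytorch_model.bin")

    if os.path.exists(index_path):
        # Sharded checkpoint: the index JSON already carries the weight map.
        with open(index_path) as f:
            return json.load(f)["weight_map"]
    elif os.path.exists(single_safetensors):
        # Single safetensors file: list tensor keys from the file header only,
        # without loading any tensor data.
        from safetensors import safe_open  # imported lazily
        with safe_open(single_safetensors, framework="pt") as f:
            return {key: "model.safetensors" for key in f.keys()}
    elif os.path.exists(single_bin):
        # Single .bin file: load the state dict just to extract the key list.
        import torch  # imported lazily
        state_dict = torch.load(single_bin, map_location="cpu")
        return {key: "pytorch_model.bin" for key in state_dict}
    raise FileNotFoundError(
        f"No model weights found in {model_dir}: expected "
        "model.safetensors.index.json, model.safetensors, or pytorch_model.bin"
    )
```

The final `raise` replaces the previous hard assert, so a directory with no recognizable weights now produces a clear error instead of a cryptic AssertionError.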
optimum 2.x dropped the bettertransformer sub-package. The bare `from optimum.bettertransformer import BetterTransformer` at module level caused an ImportError on every import of airllm, making the library completely unusable with current optimum.

Wrap the import in try/except and gate the transform() call behind the resulting `bettertransformer_available` flag so the rest of the SDPA/fallback logic continues to work unchanged.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
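A minimal sketch of the guarded import, using the flag name from the commit message; `maybe_transform` is a hypothetical wrapper added here for illustration, not the actual call site in airllm_base.py.

```python
# optimum >= 2.0 removed the bettertransformer sub-package, so the bare
# top-level import must be wrapped to keep `import airllm` working.
try:
    from optimum.bettertransformer import BetterTransformer
    bettertransformer_available = True
except ImportError:
    BetterTransformer = None
    bettertransformer_available = False


def maybe_transform(model):
    """Apply BetterTransformer only when the sub-package exists."""
    if bettertransformer_available:
        return BetterTransformer.transform(model)
    # Fall back to the unchanged SDPA / default attention path.
    return model
```

When optimum 2.x (or no optimum at all) is installed, the flag is False and models pass through untouched, so the library stays importable.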
Problem
AirLLM currently fails with any model that ships its weights as a single file rather than a sharded set with an index JSON. This is common for models up to ~7B parameters (TinyLlama, Phi-2, Gemma-2B, Qwen-1.8B, etc.).
The splitter only handled two formats:
- pytorch_model.bin.index.json + shards
- model.safetensors.index.json + shards

Fix (utils.py)

Added two additional cases in split_and_save_layers():
- model.safetensors.index.json → existing behaviour, unchanged
- model.safetensors (no index) → reads tensor keys via the safe_open header (no data loaded) and builds the weight map in-memory
- pytorch_model.bin (no index) → loads the state dict to extract the key list
- none of the above → raises FileNotFoundError with a clear message (replaces the cryptic AssertionError)

Fix (airllm_base.py)

Also includes a prerequisite fix:
from optimum.bettertransformer import BetterTransformer was a bare top-level import that raises ImportError on optimum >= 2.0, which removed the bettertransformer sub-package. Wrapped in try/except with a bettertransformer_available flag.

Tests
New tests/test_single_file_model.py with 4 cases (all pass, no GPU required):
- single model.safetensors splits correctly into per-layer shards
- single pytorch_model.bin splits correctly
- sharded index path still works (regression guard)
- missing weights raises FileNotFoundError

Verification
Confirmed end-to-end:
TinyLlama/TinyLlama-1.1B-Chat-v1.0 (single-file safetensors, no index) now loads and splits without any manual workaround.
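For reference, the "reads tensor keys via the safe_open header" step works because the safetensors format stores an 8-byte little-endian header length followed by a JSON header that lists every tensor, so listing keys touches only the first few bytes of the file. A stdlib-only sketch of that header-only read (the PR itself uses safetensors' safe_open rather than this hand-rolled parser):

```python
import json
import struct


def safetensors_keys(path):
    """List tensor names from a .safetensors file without loading any data."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian u64 giving the JSON header size.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header.
    return [key for key in header if key != "__metadata__"]
```

This is why a multi-GB single-file checkpoint can be indexed in milliseconds before any per-layer splitting begins.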