[Misc] [Offload] Update typehints, add docstrings#622
Conversation
Land after #621
DeviceCache.__init__ hardcodes `self.offload_device = self.onload_device`, making CUDA-to-CUDA weight offloading impossible. When a user calls `offload_module(module, onload_device="cuda:0", offload_device="cuda:1")`, the offload_device argument is silently ignored and weights stay on cuda:0. This breaks the sequential pipeline's weight offloading for users with multiple GPUs who want to offload weights to a second GPU instead of CPU.

The bug has two parts:
1. `offload_module()` doesn't pass offload_device to `cache.from_mapping()`
2. `DeviceCache.__init__()` doesn't accept an offload_device parameter

Fix: Accept offload_device in `DeviceCache.__init__` (defaults to onload_device for backward compatibility) and forward it through offload_module → from_mapping → constructor. CPU offloading (CPUCache) is unaffected — `OffloadCache.__init__` now accepts `**kwargs` so the extra parameter is harmlessly ignored by subclasses that don't need it.

Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
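A minimal sketch of the fixed forwarding path, using plain strings in place of torch devices. The class and method names mirror the description above, but the bodies are illustrative, not the library's actual implementation:

```python
class OffloadCache:
    """Base cache: **kwargs lets subclasses ignore parameters they don't need."""

    def __init__(self, onload_device, **kwargs):
        self.onload_device = onload_device
        self.offloaded_values = {}


class DeviceCache(OffloadCache):
    """Offloads weights to a second device instead of CPU."""

    def __init__(self, onload_device, offload_device=None, **kwargs):
        super().__init__(onload_device)
        # Previously hardcoded to onload_device, silently dropping the
        # user-supplied value; the None default preserves backward compatibility.
        self.offload_device = offload_device if offload_device is not None else onload_device

    @classmethod
    def from_mapping(cls, mapping, onload_device, offload_device=None):
        # The fix threads offload_device through here into the constructor.
        cache = cls(onload_device, offload_device=offload_device)
        cache.offloaded_values.update(mapping)
        return cache


# "cuda:1" is now honoured instead of falling back to "cuda:0"
cache = DeviceCache.from_mapping({}, "cuda:0", offload_device="cuda:1")
print(cache.onload_device, cache.offload_device)  # cuda:0 cuda:1
```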
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com> Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
… update docstrings

- Revert dispatch.py changes (offload_model behavior is intentional)
- Add offload_device param to DiskCache.__init__ and CPUCache.__init__ with assert validation against fixed offload_device
- Add offload_device param to OffloadCache.from_mapping with docstring
- Update base.py docstring example to include offload_device
- Use maintainer-suggested type hints (Optional[DeviceLikeType | Literal])

Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com>
Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
…celerate
- Move offload_device validation assert from CPUCache/DiskCache to
OffloadCache.__init__ to reduce duplication (only triggers for
subclasses with class-level offload_device attribute)
- Fix DiskCache("cpu", save_folder) call in from_accelerate.py to use
keyword arg offload_dir= (offload_device is now the second positional)
Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com>
Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
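The validation move described in this commit can be sketched as follows. Names are modelled on the commit message; the bodies are hypothetical, pure-Python stand-ins for the torch-based classes:

```python
class OffloadCache:
    # Subclasses that always offload to one place pin it as a class-level
    # attribute; the base constructor then validates any explicitly passed
    # value against it, so the assert lives in one place instead of being
    # duplicated in CPUCache and DiskCache.
    offload_device = None

    def __init__(self, onload_device, offload_device=None, **kwargs):
        fixed = type(self).offload_device
        if fixed is not None and offload_device is not None:
            assert offload_device == fixed, (
                f"{type(self).__name__} always offloads to {fixed!r}, "
                f"got {offload_device!r}"
            )
        self.onload_device = onload_device


class CPUCache(OffloadCache):
    offload_device = "cpu"


CPUCache("cuda:0", offload_device="cpu")  # accepted: matches the fixed device
try:
    CPUCache("cuda:0", offload_device="cuda:1")
except AssertionError as err:
    print("rejected:", err)
```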
Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com> Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
816a7fb to 4ab957e
The quality checks have failed. Please run
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
4ab957e to d39cfcb
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
brian-dellabetta left a comment
One nit and one concern on default offload_dir, possibly covered in another PR
```python
offload_device = torch.device("cpu")
```

```python
def __init__(
```
nit -- if this is the exact same as base class constructor, you can just get rid of it
```python
    self,
    onload_device: torch.device,
    offload_device: Optional[DeviceLikeType | Literal["disk"]] = None,
    offload_dir: Optional[str] = None,
```
i thought we wanted to make offload_dir required? checking the docs -- the line below to create a tempfile.mkdtemp would create a directory in /tmp, which is usually a small memory-based fs that can really muck up a system if it gets filled to capacity.
I agree! But I left those changes for the other PR
#602
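The concern above is easy to reproduce: with no `dir=` argument, `tempfile.mkdtemp` falls back to the platform temp directory, typically `/tmp` on Linux:

```python
import os
import tempfile

# With no dir= argument, mkdtemp creates the directory under
# tempfile.gettempdir() -- on many Linux systems a size-limited,
# RAM-backed tmpfs, which large offloaded weights could fill up.
path = tempfile.mkdtemp()
print(path.startswith(tempfile.gettempdir()))  # True
os.rmdir(path)  # clean up the empty directory
```

Requiring the caller to pass `offload_dir` explicitly sidesteps this default entirely.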
This pull request has merge conflicts that must be resolved before it can be merged.

@coderabbitai resolve merge conflict
✅ Merge conflicts resolved successfully! Resolved 5 conflict file(s). Commit: 8 file operation(s)
Resolved conflicts in:
- src/compressed_tensors/offload/cache/base.py (content)
- src/compressed_tensors/offload/cache/cpu.py (content)
- src/compressed_tensors/offload/cache/device.py (content)
- src/compressed_tensors/offload/cache/disk.py (content)
- src/compressed_tensors/offload/dispatch.py (content)

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration.

Configuration used: Path: .coderabbit.yaml | Review profile: CHILL | Plan: Pro
Walkthrough: Refactored type annotations across the offload module.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Code as Application Code
    participant Env as torchrun Env Vars
    participant CUDA as CUDA Device
    participant NCCL as NCCL Backend
    participant Barrier as Synchronization
    Code->>Env: Validate RANK, LOCAL_RANK, MASTER_ADDR, MASTER_PORT
    Env-->>Code: Environment variables confirmed
    Code->>CUDA: Select device based on LOCAL_RANK
    CUDA-->>Code: Device context established
    Code->>NCCL: Initialize process group with env:// backend
    NCCL-->>Code: Process group initialized
    Code->>Barrier: Call dist.barrier()
    Barrier-->>Code: All ranks synchronized
```
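The "select device based on LOCAL_RANK" step in the diagram boils down to reading an environment variable that torchrun sets per process. A standalone sketch (the helper name is invented here for illustration):

```python
import os


def resolve_local_device(environ=None):
    """Map torchrun's per-process LOCAL_RANK to a CUDA device string."""
    environ = os.environ if environ is None else environ
    # torchrun exports LOCAL_RANK per worker; default to 0 for single-process runs
    local_rank = int(environ.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"


print(resolve_local_device({"LOCAL_RANK": "2"}))  # cuda:2
print(resolve_local_device({}))                   # cuda:0
```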
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
The quality checks have failed. Please run
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/compressed_tensors/offload/dist_utils.py (1)
Lines 33-57: ⚠️ Potential issue | 🟡 Minor: Improve defensive programming in `init_dist()` for the public API.

The function is exported as part of the public API but lacks safeguards. Add a double-initialization guard to prevent errors if called multiple times, and validate all required environment variables before accessing them to provide clearer error messages instead of KeyError.
Suggested fix
```diff
-def init_dist():
+def init_dist() -> None:
+    if dist.is_initialized():
+        return
+
-    if "TORCHELASTIC_RUN_ID" not in os.environ:
+    required_env = ("TORCHELASTIC_RUN_ID", "RANK", "LOCAL_RANK", "WORLD_SIZE")
+    missing = [name for name in required_env if name not in os.environ]
+    if missing:
         raise ValueError(
-            "Cannot find distributed environment. "
+            f"Cannot find distributed environment variables: {missing}. "
             "Please make sure you are using `torchrun --nproc-per-node ...`."
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compressed_tensors/offload/dist_utils.py` around lines 33 - 57: The init_dist() function should validate required env vars and avoid double initialization: check that torch.distributed.is_available() and not torch.distributed.is_initialized() before calling dist.init_process_group, and explicitly verify "TORCHELASTIC_RUN_ID", "RANK", "LOCAL_RANK", and "WORLD_SIZE" exist in os.environ (raise clear ValueError messages if any are missing) before converting them to int; also ensure you pass a plain device index (local_rank) as device_id to dist.init_process_group and still call torch.cuda.set_device(torch.device(f"cuda:{local_rank}")) and dist.barrier() after successful init to keep behavior intact.

src/compressed_tensors/offload/__init__.py (1)
Lines 139-152: ⚠️ Potential issue | 🟡 Minor: Return type mismatch: `get_execution_device` returns `torch.device`, but `onload_device` may be a string.

At line 149, this function returns `module._parameters.onload_device` directly. However, `onload_device` is typed as `DeviceLikeType` in `OffloadCache` (which includes strings), while this function's return type is `torch.device`.

This is related to the normalization issue in `OffloadCache.__init__`. If `onload_device` is normalized to `torch.device` in the base class constructor, this return type would be correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compressed_tensors/offload/__init__.py` around lines 139 - 152, The return type mismatch happens because get_execution_device declares torch.device but returns module._parameters.onload_device which may be a string; fix by normalizing onload_device to a torch.device in OffloadCache.__init__ (or otherwise ensuring OffloadCache.onload_device is always a torch.device) so get_execution_device can safely return module._parameters.onload_device; update OffloadCache.__init__ to convert DeviceLikeType to torch.device (using torch.device(...) or equivalent normalization) and add/adjust typing to reflect the normalized attribute.
🧹 Nitpick comments (3)
src/compressed_tensors/offload/cache/cpu.py (1)
Line 4: Unused imports: `Literal` and `Optional` are imported but not used in this file.

These imports don't appear to be used anywhere in `CPUCache`. Consider removing them to keep imports clean.

♻️ Suggested fix

```diff
-from typing import Literal, Optional
+pass  # No typing imports needed currently
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compressed_tensors/offload/cache/cpu.py` at line 4: The import line includes unused typing names Literal and Optional; remove those unused imports from the top-level import statement (so the typing import only contains what is actually used) in the module that defines CPUCache to clean up imports and satisfy linters; locate the import statement near the CPUCache class and delete Literal and Optional from it (or remove the entire typing import if nothing from typing is used).

src/compressed_tensors/offload/dispatch.py (1)
Lines 102-104: Consider avoiding a mutable/callable default argument.

The static analysis tool (Ruff B008) flags `torch.device("cpu")` in the default argument. While this is unlikely to cause issues for an immutable device object, it's generally better practice to use `None` as the default and construct the device inside the function.

♻️ Suggested fix

```diff
+_DEFAULT_DEVICE = torch.device("cpu")
+
 def get_device_map(
-    model: torch.nn.Module, default_device: DeviceLikeType = torch.device("cpu")
+    model: torch.nn.Module, default_device: DeviceLikeType | None = None
 ) -> DeviceMap:
     """
     Get the device map of a CT-offloaded model
     :param: model: model to get device map of
     :param default_device: the default onload/offload device when module has no parameters
     :return: device map specifying the onload and offload device of all modules
     """
     from compressed_tensors.offload import get_execution_device, get_offloaded_device

+    if default_device is None:
+        default_device = _DEFAULT_DEVICE
+
     return {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compressed_tensors/offload/dispatch.py` around lines 102 - 104: The default argument for get_device_map currently uses torch.device("cpu") which triggers a mutable/callable default warning; change the signature to accept default_device: DeviceLikeType | None = None and inside get_device_map detect None and set default_device = torch.device("cpu") (or equivalent) before using it, updating any references to default_device accordingly to avoid the callable default; this targets the get_device_map function and its default_device parameter.

src/compressed_tensors/offload/cache/device.py (1)
Line 4: Unused import: `TYPE_CHECKING` is no longer used.

`TYPE_CHECKING` was previously used to guard the `DeviceLikeType` import, but now that import is unconditional. Consider removing `TYPE_CHECKING` from the import statement.

♻️ Suggested fix

```diff
-from typing import TYPE_CHECKING, Literal, Optional
+from typing import Literal, Optional
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compressed_tensors/offload/cache/device.py` at line 4, Remove the unused TYPE_CHECKING import from the typing import list: the current import line includes TYPE_CHECKING but DeviceLikeType is now imported unconditionally, so update the import to only include the actually used symbols (e.g., Literal and Optional) and eliminate TYPE_CHECKING; ensure any references to TYPE_CHECKING elsewhere in device.py are absent or removed so the import cleanup is safe.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/compressed_tensors/offload/cache/base.py`:
- Around line 35-36: get_execution_device's annotated return type (torch.device)
doesn't match the actual value returned (module._parameters.onload_device which
can be a string/DeviceLikeType); to fix, normalize and store a proper
torch.device in the object constructor so get_execution_device can safely return
torch.device: during initialization (where onload_device is accepted/assigned,
e.g., the constructor that sets module._parameters.onload_device and where
dispatch.py assigns module._parameters.onload_device = onload_device), convert
the incoming onload_device (string or torch.device) to torch.device (using
torch.device(onload_device) or equivalent) and assign that normalized
torch.device back to module._parameters.onload_device so get_execution_device
can keep its torch.device return type without type mismatch.
In `@src/compressed_tensors/offload/dist_utils.py`:
- Line 10: The __all__ export list is unsorted and triggers Ruff RUF022; update
the __all__ declaration in dist_utils.py to contain the public symbols sorted
alphabetically (e.g., ["init_dist", "is_distributed", "is_rank0"]) so that the
names exported by the module match Ruff's expected sorted order and stop the
lint warnings for the __all__ variable.
---
Outside diff comments:
In `@src/compressed_tensors/offload/__init__.py`:
- Around line 139-152: The return type mismatch happens because
get_execution_device declares torch.device but returns
module._parameters.onload_device which may be a string; fix by normalizing
onload_device to a torch.device in OffloadCache.__init__ (or otherwise ensuring
OffloadCache.onload_device is always a torch.device) so get_execution_device can
safely return module._parameters.onload_device; update OffloadCache.__init__ to
convert DeviceLikeType to torch.device (using torch.device(...) or equivalent
normalization) and add/adjust typing to reflect the normalized attribute.
In `@src/compressed_tensors/offload/dist_utils.py`:
- Around line 33-57: The init_dist() function should validate required env vars
and avoid double initialization: check that torch.distributed.is_available() and
not torch.distributed.is_initialized() before calling dist.init_process_group,
and explicitly verify "TORCHELASTIC_RUN_ID", "RANK", "LOCAL_RANK", and
"WORLD_SIZE" exist in os.environ (raise clear ValueError messages if any are
missing) before converting them to int; also ensure you pass a plain device
index (local_rank) as device_id to dist.init_process_group and still call
torch.cuda.set_device(torch.device(f"cuda:{local_rank}")) and dist.barrier()
after successful init to keep behavior intact.
---
Nitpick comments:
In `@src/compressed_tensors/offload/cache/cpu.py`:
- Line 4: The import line includes unused typing names Literal and Optional;
remove those unused imports from the top-level import statement (so the typing
import only contains what is actually used) in the module that defines CPUCache
to clean up imports and satisfy linters; locate the import statement near the
CPUCache class and delete Literal and Optional from it (or remove the entire
typing import if nothing from typing is used).
In `@src/compressed_tensors/offload/cache/device.py`:
- Line 4: Remove the unused TYPE_CHECKING import from the typing import list:
the current import line includes TYPE_CHECKING but DeviceLikeType is now
imported unconditionally, so update the import to only include the actually used
symbols (e.g., Literal and Optional) and eliminate TYPE_CHECKING; ensure any
references to TYPE_CHECKING elsewhere in device.py are absent or removed so the
import cleanup is safe.
In `@src/compressed_tensors/offload/dispatch.py`:
- Around line 102-104: The default argument for get_device_map currently uses
torch.device("cpu") which triggers a mutable/callable default warning; change
the signature to accept default_device: DeviceLikeType | None = None and inside
get_device_map detect None and set default_device = torch.device("cpu") (or
equivalent) before using it, updating any references to default_device
accordingly to avoid the callable default; this targets the get_device_map
function and its default_device parameter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 905612b0-5370-4818-a280-af6669170d17
📒 Files selected for processing (9)
- src/compressed_tensors/offload/__init__.py
- src/compressed_tensors/offload/cache/base.py
- src/compressed_tensors/offload/cache/cpu.py
- src/compressed_tensors/offload/cache/device.py
- src/compressed_tensors/offload/cache/disk.py
- src/compressed_tensors/offload/convert/from_accelerate.py
- src/compressed_tensors/offload/dispatch.py
- src/compressed_tensors/offload/dist_utils.py
- src/compressed_tensors/offload/utils.py
```python
    onload_device: DeviceLikeType
    offload_device: DeviceLikeType | Literal["disk"]
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Check how onload_device is used after being stored - does any code assume it's a torch.device?
rg -n -A2 'onload_device' src/compressed_tensors/offload/ --type py | head -60
```

Repository: vllm-project/compressed-tensors
Length of output: 4390
🏁 Script executed:
```shell
# Find where DeviceLikeType is defined and check its definition
rg -n "DeviceLikeType" src/compressed_tensors/ --type py -B2 -A2 | head -80
```

Repository: vllm-project/compressed-tensors
Length of output: 5353
🏁 Script executed:
```shell
# Check the send_tensors function signature
rg -n "def send_tensors" src/compressed_tensors/ --type py -A5
```

Repository: vllm-project/compressed-tensors
Length of output: 543
🏁 Script executed:
```shell
# Check the original type annotations in base.py before change
git show HEAD:src/compressed_tensors/offload/cache/base.py | head -60
```

Repository: vllm-project/compressed-tensors
Length of output: 2316
🏁 Script executed:
```shell
# Search for the get_execution_device function mentioned in review comment
rg -n "get_execution_device" src/compressed_tensors/ --type py -B2 -A5
```

Repository: vllm-project/compressed-tensors
Length of output: 7067
🏁 Script executed:
```shell
# Check what __init__.py line 149 actually contains (referenced in review)
rg -n "^" src/compressed_tensors/offload/__init__.py | head -200
```

Repository: vllm-project/compressed-tensors
Length of output: 7278
🏁 Script executed:
```shell
# Check all usages of onload_device to see if any depend on it being torch.device
rg -n "\.onload_device" src/compressed_tensors/ --type py -B1 -A1
```

Repository: vllm-project/compressed-tensors
Length of output: 2514
🏁 Script executed:
```shell
# Check torch.device constructor to confirm it accepts strings
python3 << 'EOF'
import torch
# Verify that torch.device can be called with string
try:
    d = torch.device("cuda:0")
    print(f"torch.device('cuda:0') works: {d}")
except Exception as e:
    print(f"Error: {e}")
EOF
```

Repository: vllm-project/compressed-tensors
Length of output: 192
🏁 Script executed:
```shell
# Check what AlignDevicesHook expects for execution_device
rg -n "AlignDevicesHook" src/compressed_tensors/ --type py -B2 -A10
```

Repository: vllm-project/compressed-tensors
Length of output: 6674
Type annotation mismatch: get_execution_device() return type is incorrect
The function get_execution_device() declares a return type of torch.device (line 141 in __init__.py), but it directly returns module._parameters.onload_device which is typed as DeviceLikeType and can be a string. In dispatch.py (line 56), strings are assigned directly: module._parameters.onload_device = onload_device where the parameter accepts torch.device | str.
While the code works in practice because callers accept DeviceLikeType, the type contract is violated. Normalizing to torch.device in the constructor resolves this:
🔧 Suggested fix
```diff
 def __init__(
     self,
     onload_device: torch.device | str,
     offload_device: torch.device | str | Literal["disk"] | None = None,
 ):
     super().__init__()
-    self.onload_device = onload_device
+    self.onload_device = torch.device(onload_device)
     self.offloaded_values = dict()
```
self.offloaded_values = dict()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/compressed_tensors/offload/cache/base.py` around lines 35 - 36,
get_execution_device's annotated return type (torch.device) doesn't match the
actual value returned (module._parameters.onload_device which can be a
string/DeviceLikeType); to fix, normalize and store a proper torch.device in the
object constructor so get_execution_device can safely return torch.device:
during initialization (where onload_device is accepted/assigned, e.g., the
constructor that sets module._parameters.onload_device and where dispatch.py
assigns module._parameters.onload_device = onload_device), convert the incoming
onload_device (string or torch.device) to torch.device (using
torch.device(onload_device) or equivalent) and assign that normalized
torch.device back to module._parameters.onload_device so get_execution_device
can keep its torch.device return type without type mismatch.
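The normalize-at-the-boundary fix above can be shown without torch; `Device` below is a hypothetical stand-in for `torch.device`, which likewise accepts either a spec string or another device:

```python
class Device:
    """Minimal stand-in for torch.device: accepts a spec string or a Device."""

    def __init__(self, spec):
        # Idempotent: wrapping an existing Device yields an equal Device,
        # just as torch.device(torch.device("cuda:0")) does.
        self.spec = spec.spec if isinstance(spec, Device) else spec

    def __eq__(self, other):
        return isinstance(other, Device) and self.spec == other.spec

    def __repr__(self):
        return f"Device({self.spec!r})"


class Cache:
    def __init__(self, onload_device):
        # Normalize once at construction so every later reader
        # (e.g. get_execution_device) sees a Device, never a raw str.
        self.onload_device = Device(onload_device)


assert Cache("cuda:0").onload_device == Cache(Device("cuda:0")).onload_device
print(Cache("cuda:0").onload_device)  # Device('cuda:0')
```

Because normalization happens in one constructor, the declared `torch.device` return type holds everywhere downstream without per-call-site conversions.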
```diff
-__all__ = ["is_distributed", "is_rank0"]
+__all__ = ["is_distributed", "is_rank0", "init_dist"]
```
Sort __all__ to satisfy Ruff RUF022.
Line 10 is unsorted and will keep triggering lint warnings.
Suggested fix
```diff
-__all__ = ["is_distributed", "is_rank0", "init_dist"]
+__all__ = ["init_dist", "is_distributed", "is_rank0"]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
__all__ = ["init_dist", "is_distributed", "is_rank0"]
```
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 10-10: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/compressed_tensors/offload/dist_utils.py` at line 10, The __all__ export
list is unsorted and triggers Ruff RUF022; update the __all__ declaration in
dist_utils.py to contain the public symbols sorted alphabetically (e.g.,
["init_dist", "is_distributed", "is_rank0"]) so that the names exported by the
module match Ruff's expected sorted order and stop the lint warnings for the
__all__ variable.
This pull request has merge conflicts that must be resolved before it can be merged.

@coderabbitai resolve merge conflict
Resolved conflicts in:
- src/compressed_tensors/offload/cache/base.py (content)
- src/compressed_tensors/offload/cache/disk.py (content)
- src/compressed_tensors/offload/dispatch.py (content)
- src/compressed_tensors/offload/dist_utils.py (content)
- src/compressed_tensors/offload/utils.py (content)

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
The quality checks have failed. Please run
Purpose
`"cuda"`, `"cpu"`, and `0`

Prerequisites
Summary by CodeRabbit
Release Notes
New Features
Refactor
Documentation