
[Misc] [Offload] Update typehints, add docstrings#622

Open
kylesayrs wants to merge 13 commits into main from kylesayrs/fix-offload-typehints

Conversation

@kylesayrs
Collaborator

@kylesayrs kylesayrs commented Mar 7, 2026

Purpose

  • Support users passing more degenerate torch device parameters such as "cuda", "cpu", and 0
  • Add docstrings for distributed utilities
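The "degenerate" device parameters mentioned above work because `torch.device()` itself normalizes strings and integer ordinals; annotating parameters as `DeviceLikeType` and converting once at the boundary lets callers pass any of the three forms. `normalize_device` below is a hypothetical helper sketching the idea, not a function from this PR:

```python
import torch

def normalize_device(device) -> torch.device:
    """Accept a str ("cuda", "cpu"), an int ordinal (0), or a
    torch.device, and return a torch.device."""
    return torch.device(device)

normalize_device("cpu")                # device(type='cpu')
normalize_device(torch.device("cpu"))  # passes through unchanged
normalize_device(0)                    # integer ordinal, e.g. an accelerator index
```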

Prerequisites

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a distributed utility function for automatic CUDA device selection and process group initialization.
  • Refactor

    • Standardized device parameter types throughout the module to accept a broader range of device-like inputs, improving API flexibility.
  • Documentation

    • Enhanced docstrings for distributed utilities and clarified cache operation descriptions.

@kylesayrs kylesayrs marked this pull request as draft March 8, 2026 05:02
@kylesayrs
Collaborator Author

Land after #621

changjonathanc and others added 7 commits March 8, 2026 06:48
DeviceCache.__init__ hardcodes `self.offload_device = self.onload_device`,
making CUDA-to-CUDA weight offloading impossible. When a user calls
`offload_module(module, onload_device="cuda:0", offload_device="cuda:1")`,
the offload_device argument is silently ignored and weights stay on cuda:0.

This breaks the sequential pipeline's weight offloading for users with
multiple GPUs who want to offload weights to a second GPU instead of CPU.

The bug has two parts:
1. offload_module() doesn't pass offload_device to cache.from_mapping()
2. DeviceCache.__init__() doesn't accept an offload_device parameter

Fix: Accept offload_device in DeviceCache.__init__ (defaults to
onload_device for backward compatibility) and forward it through
offload_module → from_mapping → constructor.

CPU offloading (CPUCache) is unaffected — OffloadCache.__init__ now
accepts **kwargs so the extra parameter is harmlessly ignored by
subclasses that don't need it.

Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com>
Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com>
Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com>
Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
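The two-part fix described in the commit message above can be sketched as follows. The class name mirrors the PR's `OffloadCache`, but the body is a simplified illustration, not the real implementation:

```python
import torch

class OffloadCache:
    """Simplified sketch; the real class manages offloaded tensor storage."""

    def __init__(self, onload_device, offload_device=None, **kwargs):
        # **kwargs lets subclasses with a fixed offload device (e.g. a
        # CPU cache) harmlessly ignore parameters they don't need
        self.onload_device = torch.device(onload_device)
        # default to onload_device for backward compatibility
        if offload_device is None:
            offload_device = self.onload_device
        self.offload_device = torch.device(offload_device)

# offload_device is now forwarded instead of being silently ignored
cache = OffloadCache(onload_device="cuda:0", offload_device="cuda:1")
```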
… update docstrings

- Revert dispatch.py changes (offload_model behavior is intentional)
- Add offload_device param to DiskCache.__init__ and CPUCache.__init__
  with assert validation against fixed offload_device
- Add offload_device param to OffloadCache.from_mapping with docstring
- Update base.py docstring example to include offload_device
- Use maintainer-suggested type hints (Optional[DeviceLikeType | Literal])

Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com>
Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
…celerate

- Move offload_device validation assert from CPUCache/DiskCache to
  OffloadCache.__init__ to reduce duplication (only triggers for
  subclasses with class-level offload_device attribute)
- Fix DiskCache("cpu", save_folder) call in from_accelerate.py to use
  keyword arg offload_dir= (offload_device is now the second positional)

Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com>
Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com>
Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
@kylesayrs kylesayrs force-pushed the kylesayrs/fix-offload-typehints branch from 816a7fb to 4ab957e on March 11, 2026 06:19
@mergify

mergify Bot commented Mar 11, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional dependencies to get the required linting packages.

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@kylesayrs kylesayrs force-pushed the kylesayrs/fix-offload-typehints branch from 4ab957e to d39cfcb on March 11, 2026 06:28
@kylesayrs kylesayrs marked this pull request as ready for review March 11, 2026 06:28
@mergify mergify Bot removed the quality-failed label Mar 11, 2026
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Collaborator

@brian-dellabetta brian-dellabetta left a comment


One nit and one concern on default offload_dir, possibly covered in another PR


offload_device = torch.device("cpu")

def __init__(
Collaborator


nit -- if this is the exact same as base class constructor, you can just get rid of it

self,
onload_device: torch.device,
offload_device: Optional[DeviceLikeType | Literal["disk"]] = None,
offload_dir: Optional[str] = None,
Collaborator


i thought we wanted to make offload_dir required? checking the docs -- the line below to create a tempfile.mkdtemp would create a directory in /tmp, which is usually a small memory-based fs that can really muck up a system if it gets filled to capacity.
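The concern is straightforward to demonstrate with the stdlib: `tempfile.mkdtemp()` with no `dir=` argument creates the directory under the platform temp dir (usually `/tmp` on Linux, often a small memory-backed filesystem), while passing `dir=` pins it to an explicit disk-backed location:

```python
import os
import tempfile

# With no dir= argument, the directory lands under tempfile.gettempdir()
# (typically /tmp on Linux)
default_dir = tempfile.mkdtemp()
assert default_dir.startswith(tempfile.gettempdir())

# Passing dir= pins the directory to an explicit location instead
pinned_dir = tempfile.mkdtemp(dir=".")

# clean up the demo directories
os.rmdir(default_dir)
os.rmdir(pinned_dir)
```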

Collaborator Author


I agree! But I left those changes for the other PR
#602

@mergify

mergify Bot commented Mar 12, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kylesayrs.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Mar 12, 2026
@kylesayrs
Collaborator Author

@coderabbitai resolve merge conflict

@coderabbitai

coderabbitai Bot commented Apr 6, 2026

Merge conflicts resolved successfully!

Resolved 5 conflict file(s). Commit: ee19e05ecc5c7026d5bc86a312b247bdb5fb69ca pushed to kylesayrs/fix-offload-typehints.

8 file operation(s)
  • src/compressed_tensors/offload/__init__.py (update)
  • src/compressed_tensors/offload/cache/base.py (update)
  • src/compressed_tensors/offload/cache/cpu.py (update)
  • src/compressed_tensors/offload/cache/device.py (update)
  • src/compressed_tensors/offload/cache/disk.py (update)
  • src/compressed_tensors/offload/convert/from_accelerate.py (update)
  • src/compressed_tensors/offload/dispatch.py (update)
  • src/compressed_tensors/offload/utils.py (update)
View agent analysis
I'll resolve the merge conflicts by first examining the current state, then analyzing each file.

Let me verify the final state of the resolution:

All 5 conflicts are fully resolved. Here's a summary:

## Resolution Summary

All 5 files were resolved by intelligently merging both sides:

### `cache/base.py`
- **PR side**: Added `DeviceLikeType` import, updated type annotations for `onload_device`, `offload_device`, and `cls_from_device`
- **Main side**: Replaced hardcoded `"cuda"` match case with `is_accelerator_type()` guard, fixed bug using `device_type` in error message
- **Resolution**: Kept both — `DeviceLikeType` annotations AND `is_accelerator_type()` accelerator abstraction

### `cache/disk.py`
- **PR side**: Unconditional `DeviceLikeType` import, docstring updates, added `assert offloaded.device.type == "meta"`
- **Main side**: Replaced CUDA-specific APIs with `torch.accelerator.*`, replaced `is_rank0()` with `is_source_process()`, fixed symlink path resolution
- **Resolution**: Kept both — PR's type/docstring improvements AND main's hardware-agnostic APIs

### `dispatch.py`
- **PR side**: Import `DeviceLikeType`, update `default_device` type annotation
- **Main side**: Replace `torch.cuda.*` with `torch.accelerator.*`, add `show_progress` parameter
- **Resolution**: Kept both — `DeviceLikeType` annotation AND accelerator abstraction + `show_progress`

### `dist_utils.py`
- **PR side**: Added docstrings to `is_rank0`/`is_distributed`, added `init_dist` function with docstring
- **Main side**: Refactored to re-export from `compressed_tensors.distributed`, added `is_source_process()`, deprecated `is_rank0()`
- **Resolution**: Adopted main's re-export architecture with `init_dist` included in `__all__` re-exports

### `utils.py`
- **PR side**: `DeviceLikeType` imports/annotations, `torch.device(default)` conversion in `get_module_device`
- **Main side**: Added `to_meta()` and `as_single_threaded()` functions
- **Resolution**: Kept both — PR's type annotation updates AND main's new utility functions

Resolved conflicts in:
- src/compressed_tensors/offload/cache/base.py (content)
- src/compressed_tensors/offload/cache/cpu.py (content)
- src/compressed_tensors/offload/cache/device.py (content)
- src/compressed_tensors/offload/cache/disk.py (content)
- src/compressed_tensors/offload/dispatch.py (content)

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
@coderabbitai

coderabbitai Bot commented Apr 6, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f600a05-bc22-4b76-ab26-bb1980a0479c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Refactored type annotations across the offload module to use DeviceLikeType from torch._prims_common for device parameters, replacing explicit torch.device | str unions. Added a new init_dist function for distributed training initialization. Made imports unconditional in cache modules. Updated related function signatures and docstrings.

Changes

Cohort / File(s) Summary
Core Offload Module
src/compressed_tensors/offload/__init__.py
Updated type annotations for get_execution_device, get_offloaded_device, align_modules, and align_module_device to use Optional[DeviceLikeType] for device parameters. Changed get_execution_device return type from torch.device | Literal["disk"] to torch.device.
Cache Base Layer
src/compressed_tensors/offload/cache/base.py
Updated OffloadCache class attributes onload_device and offload_device type annotations to use DeviceLikeType. Modified cls_from_device classmethod signature to accept DeviceLikeType | Literal["disk"].
Cache Implementations
src/compressed_tensors/offload/cache/cpu.py, src/compressed_tensors/offload/cache/device.py, src/compressed_tensors/offload/cache/disk.py
Added unconditional imports of DeviceLikeType from torch._prims_common (removed TYPE_CHECKING guards). Updated DiskCache.update_offload docstring and added assertion for meta tensor validation. Extended DiskCache.create_checkpoint_symlink with detailed docstring.
Conversion Utilities
src/compressed_tensors/offload/convert/from_accelerate.py
Updated return type annotation of remove_accelerate_from_module to use DeviceLikeType in tuple signature.
Dispatch and Utils
src/compressed_tensors/offload/dispatch.py, src/compressed_tensors/offload/utils.py
Updated get_device_map and move_module_tensor function signatures to accept DeviceLikeType. Modified get_module_device to apply torch.device() conversion on default parameter.
Distributed Utilities
src/compressed_tensors/offload/dist_utils.py
Added new init_dist function to __all__ exports. Implemented distributed initialization: validates torchrun environment variables, selects CUDA device by LOCAL_RANK, initializes NCCL process group, and synchronizes ranks. Added docstrings to is_rank0 and is_distributed functions.

Sequence Diagram

sequenceDiagram
    participant Code as Application Code
    participant Env as torchrun Env Vars
    participant CUDA as CUDA Device
    participant NCCL as NCCL Backend
    participant Barrier as Synchronization

    Code->>Env: Validate RANK, LOCAL_RANK, MASTER_ADDR, MASTER_PORT
    Env-->>Code: Environment variables confirmed
    Code->>CUDA: Select device based on LOCAL_RANK
    CUDA-->>Code: Device context established
    Code->>NCCL: Initialize process group with env:// backend
    NCCL-->>Code: Process group initialized
    Code->>Barrier: Call dist.barrier()
    Barrier-->>Code: All ranks synchronized
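A minimal sketch of the `init_dist()` flow described above, assuming exactly the behavior in the walkthrough and diagram (validate torchrun environment variables, select the CUDA device by LOCAL_RANK, initialize NCCL via env://, then barrier); the real implementation in `dist_utils.py` may differ in details:

```python
import os

import torch
import torch.distributed as dist

def init_dist() -> None:
    # torchrun sets TORCHELASTIC_RUN_ID (along with RANK, LOCAL_RANK,
    # WORLD_SIZE, MASTER_ADDR, MASTER_PORT)
    if "TORCHELASTIC_RUN_ID" not in os.environ:
        raise ValueError(
            "Cannot find distributed environment. "
            "Please make sure you are using `torchrun --nproc-per-node ...`."
        )

    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)  # select CUDA device by LOCAL_RANK
    dist.init_process_group(backend="nccl", init_method="env://")
    dist.barrier()  # synchronize all ranks before returning
```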

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 With DeviceLikeType we now align,
Device parameters fit design so fine!
Type annotations polished, clean and bright,
init_dist brings distributed might—
Flexible tensors offload in the night! 🌙✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title is generic and vague, using the '[Misc]' tag and 'Update typehints, add docstrings' which are broad descriptive terms that don't clearly convey the primary change to someone scanning history. Consider a more specific title like '[Offload] Expand device type hints to support string and integer device parameters' that better captures the main objective of supporting degenerate torch device parameters.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 86.96% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify mergify Bot removed the needs-rebase label Apr 6, 2026
@mergify

mergify Bot commented Apr 6, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional dependencies to get the required linting packages.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/compressed_tensors/offload/dist_utils.py (1)

33-57: ⚠️ Potential issue | 🟡 Minor

Improve defensive programming in init_dist() for the public API.

The function is exported as part of the public API but lacks safeguards. Add a double-initialization guard to prevent errors if called multiple times, and validate all required environment variables before accessing them to provide clearer error messages instead of KeyError.

Suggested fix
-def init_dist():
+def init_dist() -> None:
+    if dist.is_initialized():
+        return
+
-    if "TORCHELASTIC_RUN_ID" not in os.environ:
+    required_env = ("TORCHELASTIC_RUN_ID", "RANK", "LOCAL_RANK", "WORLD_SIZE")
+    missing = [name for name in required_env if name not in os.environ]
+    if missing:
         raise ValueError(
-            "Cannot find distributed environment. "
+            f"Cannot find distributed environment variables: {missing}. "
             "Please make sure you are using `torchrun --nproc-per-node ...`."
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compressed_tensors/offload/dist_utils.py` around lines 33 - 57, The
init_dist() function should validate required env vars and avoid double
initialization: check that torch.distributed.is_available() and not
torch.distributed.is_initialized() before calling dist.init_process_group, and
explicitly verify "TORCHELASTIC_RUN_ID", "RANK", "LOCAL_RANK", and "WORLD_SIZE"
exist in os.environ (raise clear ValueError messages if any are missing) before
converting them to int; also ensure you pass a plain device index (local_rank)
as device_id to dist.init_process_group and still call
torch.cuda.set_device(torch.device(f"cuda:{local_rank}")) and dist.barrier()
after successful init to keep behavior intact.
src/compressed_tensors/offload/__init__.py (1)

139-152: ⚠️ Potential issue | 🟡 Minor

Return type mismatch: get_execution_device returns torch.device but onload_device may be a string.

At line 149, this function returns module._parameters.onload_device directly. However, onload_device is typed as DeviceLikeType in OffloadCache (which includes strings), while this function's return type is torch.device.

This is related to the normalization issue in OffloadCache.__init__. If onload_device is normalized to torch.device in the base class constructor, this return type would be correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compressed_tensors/offload/__init__.py` around lines 139 - 152, The
return type mismatch happens because get_execution_device declares torch.device
but returns module._parameters.onload_device which may be a string; fix by
normalizing onload_device to a torch.device in OffloadCache.__init__ (or
otherwise ensuring OffloadCache.onload_device is always a torch.device) so
get_execution_device can safely return module._parameters.onload_device; update
OffloadCache.__init__ to convert DeviceLikeType to torch.device (using
torch.device(...) or equivalent normalization) and add/adjust typing to reflect
the normalized attribute.
🧹 Nitpick comments (3)
src/compressed_tensors/offload/cache/cpu.py (1)

4-4: Unused imports: Literal and Optional are imported but not used in this file.

These imports don't appear to be used anywhere in CPUCache. Consider removing them to keep imports clean.

♻️ Suggested fix
-from typing import Literal, Optional
+pass  # No typing imports needed currently
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compressed_tensors/offload/cache/cpu.py` at line 4, The import line
includes unused typing names Literal and Optional; remove those unused imports
from the top-level import statement (so the typing import only contains what is
actually used) in the module that defines CPUCache to clean up imports and
satisfy linters; locate the import statement near the CPUCache class and delete
Literal and Optional from it (or remove the entire typing import if nothing from
typing is used).
src/compressed_tensors/offload/dispatch.py (1)

102-104: Consider avoiding mutable/callable default argument.

The static analysis tool (Ruff B008) flags torch.device("cpu") in the default argument. While this is unlikely to cause issues for an immutable device object, it's generally better practice to use None as the default and construct the device inside the function.

♻️ Suggested fix
+_DEFAULT_DEVICE = torch.device("cpu")
+
 def get_device_map(
-    model: torch.nn.Module, default_device: DeviceLikeType = torch.device("cpu")
+    model: torch.nn.Module, default_device: DeviceLikeType | None = None
 ) -> DeviceMap:
     """
     Get the device map of a CT-offloaded model

     :param: model: model to get device map of
     :param default_device: the default onload/offload device
         when module has no parameters
     :return: device map specifying the onload and offload device of all modules
     """
     from compressed_tensors.offload import get_execution_device, get_offloaded_device

+    if default_device is None:
+        default_device = _DEFAULT_DEVICE
+
     return {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compressed_tensors/offload/dispatch.py` around lines 102 - 104, The
default argument for get_device_map currently uses torch.device("cpu") which
triggers a mutable/callable default warning; change the signature to accept
default_device: DeviceLikeType | None = None and inside get_device_map detect
None and set default_device = torch.device("cpu") (or equivalent) before using
it, updating any references to default_device accordingly to avoid the callable
default; this targets the get_device_map function and its default_device
parameter.
src/compressed_tensors/offload/cache/device.py (1)

4-4: Unused import: TYPE_CHECKING is no longer used.

TYPE_CHECKING was previously used to guard the DeviceLikeType import, but now that import is unconditional. Consider removing TYPE_CHECKING from the import statement.

♻️ Suggested fix
-from typing import TYPE_CHECKING, Literal, Optional
+from typing import Literal, Optional
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compressed_tensors/offload/cache/device.py` at line 4, Remove the unused
TYPE_CHECKING import from the typing import list: the current import line
includes TYPE_CHECKING but DeviceLikeType is now imported unconditionally, so
update the import to only include the actually used symbols (e.g., Literal and
Optional) and eliminate TYPE_CHECKING; ensure any references to TYPE_CHECKING
elsewhere in device.py are absent or removed so the import cleanup is safe.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compressed_tensors/offload/cache/base.py`:
- Around line 35-36: get_execution_device's annotated return type (torch.device)
doesn't match the actual value returned (module._parameters.onload_device which
can be a string/DeviceLikeType); to fix, normalize and store a proper
torch.device in the object constructor so get_execution_device can safely return
torch.device: during initialization (where onload_device is accepted/assigned,
e.g., the constructor that sets module._parameters.onload_device and where
dispatch.py assigns module._parameters.onload_device = onload_device), convert
the incoming onload_device (string or torch.device) to torch.device (using
torch.device(onload_device) or equivalent) and assign that normalized
torch.device back to module._parameters.onload_device so get_execution_device
can keep its torch.device return type without type mismatch.

In `@src/compressed_tensors/offload/dist_utils.py`:
- Line 10: The __all__ export list is unsorted and triggers Ruff RUF022; update
the __all__ declaration in dist_utils.py to contain the public symbols sorted
alphabetically (e.g., ["init_dist", "is_distributed", "is_rank0"]) so that the
names exported by the module match Ruff's expected sorted order and stop the
lint warnings for the __all__ variable.

---

Outside diff comments:
In `@src/compressed_tensors/offload/__init__.py`:
- Around line 139-152: The return type mismatch happens because
get_execution_device declares torch.device but returns
module._parameters.onload_device which may be a string; fix by normalizing
onload_device to a torch.device in OffloadCache.__init__ (or otherwise ensuring
OffloadCache.onload_device is always a torch.device) so get_execution_device can
safely return module._parameters.onload_device; update OffloadCache.__init__ to
convert DeviceLikeType to torch.device (using torch.device(...) or equivalent
normalization) and add/adjust typing to reflect the normalized attribute.

In `@src/compressed_tensors/offload/dist_utils.py`:
- Around line 33-57: The init_dist() function should validate required env vars
and avoid double initialization: check that torch.distributed.is_available() and
not torch.distributed.is_initialized() before calling dist.init_process_group,
and explicitly verify "TORCHELASTIC_RUN_ID", "RANK", "LOCAL_RANK", and
"WORLD_SIZE" exist in os.environ (raise clear ValueError messages if any are
missing) before converting them to int; also ensure you pass a plain device
index (local_rank) as device_id to dist.init_process_group and still call
torch.cuda.set_device(torch.device(f"cuda:{local_rank}")) and dist.barrier()
after successful init to keep behavior intact.

---

Nitpick comments:
In `@src/compressed_tensors/offload/cache/cpu.py`:
- Line 4: The import line includes unused typing names Literal and Optional;
remove those unused imports from the top-level import statement (so the typing
import only contains what is actually used) in the module that defines CPUCache
to clean up imports and satisfy linters; locate the import statement near the
CPUCache class and delete Literal and Optional from it (or remove the entire
typing import if nothing from typing is used).

In `@src/compressed_tensors/offload/cache/device.py`:
- Line 4: Remove the unused TYPE_CHECKING import from the typing import list:
the current import line includes TYPE_CHECKING but DeviceLikeType is now
imported unconditionally, so update the import to only include the actually used
symbols (e.g., Literal and Optional) and eliminate TYPE_CHECKING; ensure any
references to TYPE_CHECKING elsewhere in device.py are absent or removed so the
import cleanup is safe.

In `@src/compressed_tensors/offload/dispatch.py`:
- Around line 102-104: The default argument for get_device_map currently uses
torch.device("cpu") which triggers a mutable/callable default warning; change
the signature to accept default_device: DeviceLikeType | None = None and inside
get_device_map detect None and set default_device = torch.device("cpu") (or
equivalent) before using it, updating any references to default_device
accordingly to avoid the callable default; this targets the get_device_map
function and its default_device parameter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 905612b0-5370-4818-a280-af6669170d17

📥 Commits

Reviewing files that changed from the base of the PR and between 91db5d7 and df852eb.

📒 Files selected for processing (9)
  • src/compressed_tensors/offload/__init__.py
  • src/compressed_tensors/offload/cache/base.py
  • src/compressed_tensors/offload/cache/cpu.py
  • src/compressed_tensors/offload/cache/device.py
  • src/compressed_tensors/offload/cache/disk.py
  • src/compressed_tensors/offload/convert/from_accelerate.py
  • src/compressed_tensors/offload/dispatch.py
  • src/compressed_tensors/offload/dist_utils.py
  • src/compressed_tensors/offload/utils.py

Comment on lines +35 to +36
onload_device: DeviceLikeType
offload_device: DeviceLikeType | Literal["disk"]


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how onload_device is used after being stored - does any code assume it's a torch.device?
rg -n -A2 'onload_device' src/compressed_tensors/offload/ --type py | head -60

Repository: vllm-project/compressed-tensors

Length of output: 4390


🏁 Script executed:

# Find where DeviceLikeType is defined and check its definition
rg -n "DeviceLikeType" src/compressed_tensors/ --type py -B2 -A2 | head -80

Repository: vllm-project/compressed-tensors

Length of output: 5353


🏁 Script executed:

# Check the send_tensors function signature
rg -n "def send_tensors" src/compressed_tensors/ --type py -A5

Repository: vllm-project/compressed-tensors

Length of output: 543


🏁 Script executed:

# Check the original type annotations in base.py before change
git show HEAD:src/compressed_tensors/offload/cache/base.py | head -60

Repository: vllm-project/compressed-tensors

Length of output: 2316


🏁 Script executed:

# Search for the get_execution_device function mentioned in review comment
rg -n "get_execution_device" src/compressed_tensors/ --type py -B2 -A5

Repository: vllm-project/compressed-tensors

Length of output: 7067


🏁 Script executed:

# Check what __init__.py line 149 actually contains (referenced in review)
rg -n "^" src/compressed_tensors/offload/__init__.py | head -200

Repository: vllm-project/compressed-tensors

Length of output: 7278


🏁 Script executed:

# Check all usages of onload_device to see if any depend on it being torch.device
rg -n "\.onload_device" src/compressed_tensors/ --type py -B1 -A1

Repository: vllm-project/compressed-tensors

Length of output: 2514


🏁 Script executed:

# Check torch.device constructor to confirm it accepts strings
python3 << 'EOF'
import torch
# Verify that torch.device can be called with string
try:
    d = torch.device("cuda:0")
    print(f"torch.device('cuda:0') works: {d}")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: vllm-project/compressed-tensors

Length of output: 192


🏁 Script executed:

# Check what AlignDevicesHook expects for execution_device
rg -n "AlignDevicesHook" src/compressed_tensors/ --type py -B2 -A10

Repository: vllm-project/compressed-tensors

Length of output: 6674


Type annotation mismatch: get_execution_device() return type is incorrect

The function get_execution_device() declares a return type of torch.device (line 141 in __init__.py), but it directly returns module._parameters.onload_device which is typed as DeviceLikeType and can be a string. In dispatch.py (line 56), strings are assigned directly: module._parameters.onload_device = onload_device where the parameter accepts torch.device | str.

While the code works in practice because callers accept DeviceLikeType, the type contract is violated. Normalizing to torch.device in the constructor resolves this:

🔧 Suggested fix
 def __init__(
     self,
     onload_device: torch.device | str,
     offload_device: torch.device | str | Literal["disk"] | None = None,
 ):
     super().__init__()
-    self.onload_device = onload_device
+    self.onload_device = torch.device(onload_device)
     self.offloaded_values = dict()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compressed_tensors/offload/cache/base.py` around lines 35 - 36,
get_execution_device's annotated return type (torch.device) doesn't match the
actual value returned (module._parameters.onload_device which can be a
string/DeviceLikeType); to fix, normalize and store a proper torch.device in the
object constructor so get_execution_device can safely return torch.device:
during initialization (where onload_device is accepted/assigned, e.g., the
constructor that sets module._parameters.onload_device and where dispatch.py
assigns module._parameters.onload_device = onload_device), convert the incoming
onload_device (string or torch.device) to torch.device (using
torch.device(onload_device) or equivalent) and assign that normalized
torch.device back to module._parameters.onload_device so get_execution_device
can keep its torch.device return type without type mismatch.



__all__ = ["is_distributed", "is_rank0"]
__all__ = ["is_distributed", "is_rank0", "init_dist"]


⚠️ Potential issue | 🟡 Minor

Sort __all__ to satisfy Ruff RUF022.

Line 10 is unsorted and will keep triggering lint warnings.

Suggested fix
-__all__ = ["is_distributed", "is_rank0", "init_dist"]
+__all__ = ["init_dist", "is_distributed", "is_rank0"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
__all__ = ["is_distributed", "is_rank0", "init_dist"]
__all__ = ["init_dist", "is_distributed", "is_rank0"]
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 10-10: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compressed_tensors/offload/dist_utils.py` at line 10, The __all__ export
list is unsorted and triggers Ruff RUF022; update the __all__ declaration in
dist_utils.py to contain the public symbols sorted alphabetically (e.g.,
["init_dist", "is_distributed", "is_rank0"]) so that the names exported by the
module match Ruff's expected sorted order and stop the lint warnings for the
__all__ variable.

@mergify

mergify Bot commented Apr 8, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kylesayrs.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Apr 8, 2026
@kylesayrs
Collaborator Author

@coderabbitai resolve merge conflict

Resolved conflicts in:
- src/compressed_tensors/offload/cache/base.py (content)
- src/compressed_tensors/offload/cache/disk.py (content)
- src/compressed_tensors/offload/dispatch.py (content)
- src/compressed_tensors/offload/dist_utils.py (content)
- src/compressed_tensors/offload/utils.py (content)

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
@mergify

mergify Bot commented Apr 20, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional dependencies to get the required linting packages.
