[Misc] [Offload] Update typehints, add docstrings#622
Conversation
Land after #621
DeviceCache.__init__ hardcodes `self.offload_device = self.onload_device`, making CUDA-to-CUDA weight offloading impossible. When a user calls `offload_module(module, onload_device="cuda:0", offload_device="cuda:1")`, the offload_device argument is silently ignored and weights stay on cuda:0. This breaks the sequential pipeline's weight offloading for users with multiple GPUs who want to offload weights to a second GPU instead of CPU.

The bug has two parts:
1. `offload_module()` doesn't pass offload_device to `cache.from_mapping()`
2. `DeviceCache.__init__()` doesn't accept an offload_device parameter

Fix: Accept offload_device in `DeviceCache.__init__` (defaults to onload_device for backward compatibility) and forward it through offload_module → from_mapping → constructor. CPU offloading (CPUCache) is unaffected — `OffloadCache.__init__` now accepts `**kwargs` so the extra parameter is harmlessly ignored by subclasses that don't need it.

Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
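A minimal sketch of the fixed forwarding path, using plain strings in place of torch devices. The class and method names mirror the description above, but the bodies are illustrative, not the library's actual implementation:

```python
class OffloadCache:
    """Base cache: **kwargs lets subclasses ignore parameters they don't need."""

    def __init__(self, onload_device, **kwargs):
        self.onload_device = onload_device
        self.offloaded_values = {}


class DeviceCache(OffloadCache):
    """Offloads weights to a second device instead of CPU."""

    def __init__(self, onload_device, offload_device=None, **kwargs):
        super().__init__(onload_device)
        # Previously hardcoded to onload_device, silently dropping the
        # user-supplied value; the None default preserves backward compatibility.
        self.offload_device = offload_device if offload_device is not None else onload_device

    @classmethod
    def from_mapping(cls, mapping, onload_device, offload_device=None):
        # The fix threads offload_device through here into the constructor.
        cache = cls(onload_device, offload_device=offload_device)
        cache.offloaded_values.update(mapping)
        return cache


# "cuda:1" is now honoured instead of falling back to "cuda:0"
cache = DeviceCache.from_mapping({}, "cuda:0", offload_device="cuda:1")
print(cache.onload_device, cache.offload_device)  # cuda:0 cuda:1
```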
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com> Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
… update docstrings

- Revert dispatch.py changes (offload_model behavior is intentional)
- Add offload_device param to DiskCache.__init__ and CPUCache.__init__ with assert validation against fixed offload_device
- Add offload_device param to OffloadCache.from_mapping with docstring
- Update base.py docstring example to include offload_device
- Use maintainer-suggested type hints (Optional[DeviceLikeType | Literal])

Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com>
Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
…celerate
- Move offload_device validation assert from CPUCache/DiskCache to
OffloadCache.__init__ to reduce duplication (only triggers for
subclasses with class-level offload_device attribute)
- Fix DiskCache("cpu", save_folder) call in from_accelerate.py to use
keyword arg offload_dir= (offload_device is now the second positional)
Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com>
Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
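The validation move described in this commit can be sketched as follows. Names are modelled on the commit message; the bodies are hypothetical, pure-Python stand-ins for the torch-based classes:

```python
class OffloadCache:
    # Subclasses that always offload to one place pin it as a class-level
    # attribute; the base constructor then validates any explicitly passed
    # value against it, so the assert lives in one place instead of being
    # duplicated in CPUCache and DiskCache.
    offload_device = None

    def __init__(self, onload_device, offload_device=None, **kwargs):
        fixed = type(self).offload_device
        if fixed is not None and offload_device is not None:
            assert offload_device == fixed, (
                f"{type(self).__name__} always offloads to {fixed!r}, "
                f"got {offload_device!r}"
            )
        self.onload_device = onload_device


class CPUCache(OffloadCache):
    offload_device = "cpu"


CPUCache("cuda:0", offload_device="cpu")  # accepted: matches the fixed device
try:
    CPUCache("cuda:0", offload_device="cuda:1")
except AssertionError as err:
    print("rejected:", err)
```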
Signed-off-by: Jonathan Chang <31893406+changjonathanc@users.noreply.github.com> Signed-off-by: Jonathan Chang <changjonathanc@users.noreply.github.com>
816a7fb to 4ab957e
The quality checks have failed. Please run
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
4ab957e to d39cfcb
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
brian-dellabetta left a comment
One nit and one concern on default offload_dir, possibly covered in another PR
```python
offload_device = torch.device("cpu")
```

```python
def __init__(
```
nit -- if this is the exact same as base class constructor, you can just get rid of it
```python
    self,
    onload_device: torch.device,
    offload_device: Optional[DeviceLikeType | Literal["disk"]] = None,
    offload_dir: Optional[str] = None,
```
i thought we wanted to make offload_dir required? checking the docs -- the line below to create a tempfile.mkdtemp would create a directory in /tmp, which is usually a small memory-based fs that can really muck up a system if it gets filled to capacity.
I agree! But I left those changes for the other PR
#602
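The concern above is easy to reproduce: with no `dir=` argument, `tempfile.mkdtemp` falls back to the platform temp directory, typically `/tmp` on Linux:

```python
import os
import tempfile

# With no dir= argument, mkdtemp creates the directory under
# tempfile.gettempdir() -- on many Linux systems a size-limited,
# RAM-backed tmpfs, which large offloaded weights could fill up.
path = tempfile.mkdtemp()
print(path.startswith(tempfile.gettempdir()))  # True
os.rmdir(path)  # clean up the empty directory
```

Requiring the caller to pass `offload_dir` explicitly sidesteps this default entirely.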
This pull request has merge conflicts that must be resolved before it can be merged.

@coderabbitai resolve merge conflict
✅ Merge conflicts resolved successfully! Resolved 5 conflict file(s). Commit: 8 file operation(s)
Resolved conflicts in:
- src/compressed_tensors/offload/cache/base.py (content)
- src/compressed_tensors/offload/cache/cpu.py (content)
- src/compressed_tensors/offload/cache/device.py (content)
- src/compressed_tensors/offload/cache/disk.py (content)
- src/compressed_tensors/offload/dispatch.py (content)

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration.

Configuration used: Path: .coderabbit.yaml | Review profile: CHILL | Plan: Pro
Walkthrough: Refactored type annotations across the offload module.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Code as Application Code
    participant Env as torchrun Env Vars
    participant CUDA as CUDA Device
    participant NCCL as NCCL Backend
    participant Barrier as Synchronization
    Code->>Env: Validate RANK, LOCAL_RANK, MASTER_ADDR, MASTER_PORT
    Env-->>Code: Environment variables confirmed
    Code->>CUDA: Select device based on LOCAL_RANK
    CUDA-->>Code: Device context established
    Code->>NCCL: Initialize process group with env:// backend
    NCCL-->>Code: Process group initialized
    Code->>Barrier: Call dist.barrier()
    Barrier-->>Code: All ranks synchronized
```
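The "select device based on LOCAL_RANK" step in the diagram boils down to reading an environment variable that torchrun sets per process. A standalone sketch (the helper name is invented here for illustration):

```python
import os


def resolve_local_device(environ=None):
    """Map torchrun's per-process LOCAL_RANK to a CUDA device string."""
    environ = os.environ if environ is None else environ
    # torchrun exports LOCAL_RANK per worker; default to 0 for single-process runs
    local_rank = int(environ.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"


print(resolve_local_device({"LOCAL_RANK": "2"}))  # cuda:2
print(resolve_local_device({}))                   # cuda:0
```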
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
The quality checks have failed. Please run
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/compressed_tensors/offload/dist_utils.py (1)
Lines 33-57: ⚠️ Potential issue | 🟡 Minor: Improve defensive programming in `init_dist()` for the public API.

The function is exported as part of the public API but lacks safeguards. Add a double-initialization guard to prevent errors if called multiple times, and validate all required environment variables before accessing them to provide clearer error messages instead of KeyError.
Suggested fix
```diff
-def init_dist():
+def init_dist() -> None:
+    if dist.is_initialized():
+        return
+
-    if "TORCHELASTIC_RUN_ID" not in os.environ:
+    required_env = ("TORCHELASTIC_RUN_ID", "RANK", "LOCAL_RANK", "WORLD_SIZE")
+    missing = [name for name in required_env if name not in os.environ]
+    if missing:
         raise ValueError(
-            "Cannot find distributed environment. "
+            f"Cannot find distributed environment variables: {missing}. "
             "Please make sure you are using `torchrun --nproc-per-node ...`."
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compressed_tensors/offload/dist_utils.py` around lines 33 - 57: The init_dist() function should validate required env vars and avoid double initialization: check that torch.distributed.is_available() and not torch.distributed.is_initialized() before calling dist.init_process_group, and explicitly verify "TORCHELASTIC_RUN_ID", "RANK", "LOCAL_RANK", and "WORLD_SIZE" exist in os.environ (raise clear ValueError messages if any are missing) before converting them to int; also ensure you pass a plain device index (local_rank) as device_id to dist.init_process_group and still call torch.cuda.set_device(torch.device(f"cuda:{local_rank}")) and dist.barrier() after successful init to keep behavior intact.

src/compressed_tensors/offload/__init__.py (1)
Lines 139-152: ⚠️ Potential issue | 🟡 Minor: Return type mismatch: `get_execution_device` returns `torch.device`, but `onload_device` may be a string.

At line 149, this function returns `module._parameters.onload_device` directly. However, `onload_device` is typed as `DeviceLikeType` in `OffloadCache` (which includes strings), while this function's return type is `torch.device`.

This is related to the normalization issue in `OffloadCache.__init__`. If `onload_device` is normalized to `torch.device` in the base class constructor, this return type would be correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compressed_tensors/offload/__init__.py` around lines 139 - 152, The return type mismatch happens because get_execution_device declares torch.device but returns module._parameters.onload_device which may be a string; fix by normalizing onload_device to a torch.device in OffloadCache.__init__ (or otherwise ensuring OffloadCache.onload_device is always a torch.device) so get_execution_device can safely return module._parameters.onload_device; update OffloadCache.__init__ to convert DeviceLikeType to torch.device (using torch.device(...) or equivalent normalization) and add/adjust typing to reflect the normalized attribute.
🧹 Nitpick comments (3)
src/compressed_tensors/offload/cache/cpu.py (1)
Line 4: Unused imports: `Literal` and `Optional` are imported but not used in this file.

These imports don't appear to be used anywhere in `CPUCache`. Consider removing them to keep imports clean.

♻️ Suggested fix

```diff
-from typing import Literal, Optional
+pass  # No typing imports needed currently
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compressed_tensors/offload/cache/cpu.py` at line 4: The import line includes unused typing names Literal and Optional; remove those unused imports from the top-level import statement (so the typing import only contains what is actually used) in the module that defines CPUCache to clean up imports and satisfy linters; locate the import statement near the CPUCache class and delete Literal and Optional from it (or remove the entire typing import if nothing from typing is used).

src/compressed_tensors/offload/dispatch.py (1)
Lines 102-104: Consider avoiding a mutable/callable default argument.

The static analysis tool (Ruff B008) flags `torch.device("cpu")` in the default argument. While this is unlikely to cause issues for an immutable device object, it's generally better practice to use `None` as the default and construct the device inside the function.

♻️ Suggested fix

```diff
+_DEFAULT_DEVICE = torch.device("cpu")
+
 def get_device_map(
-    model: torch.nn.Module, default_device: DeviceLikeType = torch.device("cpu")
+    model: torch.nn.Module, default_device: DeviceLikeType | None = None
 ) -> DeviceMap:
     """
     Get the device map of a CT-offloaded model
     :param: model: model to get device map of
     :param default_device: the default onload/offload device when module has no parameters
     :return: device map specifying the onload and offload device of all modules
     """
     from compressed_tensors.offload import get_execution_device, get_offloaded_device

+    if default_device is None:
+        default_device = _DEFAULT_DEVICE
+
     return {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compressed_tensors/offload/dispatch.py` around lines 102 - 104: The default argument for get_device_map currently uses torch.device("cpu") which triggers a mutable/callable default warning; change the signature to accept default_device: DeviceLikeType | None = None and inside get_device_map detect None and set default_device = torch.device("cpu") (or equivalent) before using it, updating any references to default_device accordingly to avoid the callable default; this targets the get_device_map function and its default_device parameter.

src/compressed_tensors/offload/cache/device.py (1)
Line 4: Unused import: `TYPE_CHECKING` is no longer used.

`TYPE_CHECKING` was previously used to guard the `DeviceLikeType` import, but now that import is unconditional. Consider removing `TYPE_CHECKING` from the import statement.

♻️ Suggested fix

```diff
-from typing import TYPE_CHECKING, Literal, Optional
+from typing import Literal, Optional
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compressed_tensors/offload/cache/device.py` at line 4, Remove the unused TYPE_CHECKING import from the typing import list: the current import line includes TYPE_CHECKING but DeviceLikeType is now imported unconditionally, so update the import to only include the actually used symbols (e.g., Literal and Optional) and eliminate TYPE_CHECKING; ensure any references to TYPE_CHECKING elsewhere in device.py are absent or removed so the import cleanup is safe.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/compressed_tensors/offload/cache/base.py`:
- Around line 35-36: get_execution_device's annotated return type (torch.device)
doesn't match the actual value returned (module._parameters.onload_device which
can be a string/DeviceLikeType); to fix, normalize and store a proper
torch.device in the object constructor so get_execution_device can safely return
torch.device: during initialization (where onload_device is accepted/assigned,
e.g., the constructor that sets module._parameters.onload_device and where
dispatch.py assigns module._parameters.onload_device = onload_device), convert
the incoming onload_device (string or torch.device) to torch.device (using
torch.device(onload_device) or equivalent) and assign that normalized
torch.device back to module._parameters.onload_device so get_execution_device
can keep its torch.device return type without type mismatch.
In `@src/compressed_tensors/offload/dist_utils.py`:
- Line 10: The __all__ export list is unsorted and triggers Ruff RUF022; update
the __all__ declaration in dist_utils.py to contain the public symbols sorted
alphabetically (e.g., ["init_dist", "is_distributed", "is_rank0"]) so that the
names exported by the module match Ruff's expected sorted order and stop the
lint warnings for the __all__ variable.
---
Outside diff comments:
In `@src/compressed_tensors/offload/__init__.py`:
- Around line 139-152: The return type mismatch happens because
get_execution_device declares torch.device but returns
module._parameters.onload_device which may be a string; fix by normalizing
onload_device to a torch.device in OffloadCache.__init__ (or otherwise ensuring
OffloadCache.onload_device is always a torch.device) so get_execution_device can
safely return module._parameters.onload_device; update OffloadCache.__init__ to
convert DeviceLikeType to torch.device (using torch.device(...) or equivalent
normalization) and add/adjust typing to reflect the normalized attribute.
In `@src/compressed_tensors/offload/dist_utils.py`:
- Around line 33-57: The init_dist() function should validate required env vars
and avoid double initialization: check that torch.distributed.is_available() and
not torch.distributed.is_initialized() before calling dist.init_process_group,
and explicitly verify "TORCHELASTIC_RUN_ID", "RANK", "LOCAL_RANK", and
"WORLD_SIZE" exist in os.environ (raise clear ValueError messages if any are
missing) before converting them to int; also ensure you pass a plain device
index (local_rank) as device_id to dist.init_process_group and still call
torch.cuda.set_device(torch.device(f"cuda:{local_rank}")) and dist.barrier()
after successful init to keep behavior intact.
---
Nitpick comments:
In `@src/compressed_tensors/offload/cache/cpu.py`:
- Line 4: The import line includes unused typing names Literal and Optional;
remove those unused imports from the top-level import statement (so the typing
import only contains what is actually used) in the module that defines CPUCache
to clean up imports and satisfy linters; locate the import statement near the
CPUCache class and delete Literal and Optional from it (or remove the entire
typing import if nothing from typing is used).
In `@src/compressed_tensors/offload/cache/device.py`:
- Line 4: Remove the unused TYPE_CHECKING import from the typing import list:
the current import line includes TYPE_CHECKING but DeviceLikeType is now
imported unconditionally, so update the import to only include the actually used
symbols (e.g., Literal and Optional) and eliminate TYPE_CHECKING; ensure any
references to TYPE_CHECKING elsewhere in device.py are absent or removed so the
import cleanup is safe.
In `@src/compressed_tensors/offload/dispatch.py`:
- Around line 102-104: The default argument for get_device_map currently uses
torch.device("cpu") which triggers a mutable/callable default warning; change
the signature to accept default_device: DeviceLikeType | None = None and inside
get_device_map detect None and set default_device = torch.device("cpu") (or
equivalent) before using it, updating any references to default_device
accordingly to avoid the callable default; this targets the get_device_map
function and its default_device parameter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 905612b0-5370-4818-a280-af6669170d17
📒 Files selected for processing (9)
- src/compressed_tensors/offload/__init__.py
- src/compressed_tensors/offload/cache/base.py
- src/compressed_tensors/offload/cache/cpu.py
- src/compressed_tensors/offload/cache/device.py
- src/compressed_tensors/offload/cache/disk.py
- src/compressed_tensors/offload/convert/from_accelerate.py
- src/compressed_tensors/offload/dispatch.py
- src/compressed_tensors/offload/dist_utils.py
- src/compressed_tensors/offload/utils.py
```python
    onload_device: DeviceLikeType
    offload_device: DeviceLikeType | Literal["disk"]
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Check how onload_device is used after being stored - does any code assume it's a torch.device?
rg -n -A2 'onload_device' src/compressed_tensors/offload/ --type py | head -60
```

Repository: vllm-project/compressed-tensors
Length of output: 4390
🏁 Script executed:
```shell
# Find where DeviceLikeType is defined and check its definition
rg -n "DeviceLikeType" src/compressed_tensors/ --type py -B2 -A2 | head -80
```

Repository: vllm-project/compressed-tensors
Length of output: 5353
🏁 Script executed:
```shell
# Check the send_tensors function signature
rg -n "def send_tensors" src/compressed_tensors/ --type py -A5
```

Repository: vllm-project/compressed-tensors
Length of output: 543
🏁 Script executed:
```shell
# Check the original type annotations in base.py before change
git show HEAD:src/compressed_tensors/offload/cache/base.py | head -60
```

Repository: vllm-project/compressed-tensors
Length of output: 2316
🏁 Script executed:
```shell
# Search for the get_execution_device function mentioned in review comment
rg -n "get_execution_device" src/compressed_tensors/ --type py -B2 -A5
```

Repository: vllm-project/compressed-tensors
Length of output: 7067
🏁 Script executed:
```shell
# Check what __init__.py line 149 actually contains (referenced in review)
rg -n "^" src/compressed_tensors/offload/__init__.py | head -200
```

Repository: vllm-project/compressed-tensors
Length of output: 7278
🏁 Script executed:
```shell
# Check all usages of onload_device to see if any depend on it being torch.device
rg -n "\.onload_device" src/compressed_tensors/ --type py -B1 -A1
```

Repository: vllm-project/compressed-tensors
Length of output: 2514
🏁 Script executed:
```shell
# Check torch.device constructor to confirm it accepts strings
python3 << 'EOF'
import torch
# Verify that torch.device can be called with string
try:
    d = torch.device("cuda:0")
    print(f"torch.device('cuda:0') works: {d}")
except Exception as e:
    print(f"Error: {e}")
EOF
```

Repository: vllm-project/compressed-tensors
Length of output: 192
🏁 Script executed:
```shell
# Check what AlignDevicesHook expects for execution_device
rg -n "AlignDevicesHook" src/compressed_tensors/ --type py -B2 -A10
```

Repository: vllm-project/compressed-tensors
Length of output: 6674
Type annotation mismatch: get_execution_device() return type is incorrect
The function get_execution_device() declares a return type of torch.device (line 141 in __init__.py), but it directly returns module._parameters.onload_device which is typed as DeviceLikeType and can be a string. In dispatch.py (line 56), strings are assigned directly: module._parameters.onload_device = onload_device where the parameter accepts torch.device | str.
While the code works in practice because callers accept DeviceLikeType, the type contract is violated. Normalizing to torch.device in the constructor resolves this:
🔧 Suggested fix
```diff
 def __init__(
     self,
     onload_device: torch.device | str,
     offload_device: torch.device | str | Literal["disk"] | None = None,
 ):
     super().__init__()
-    self.onload_device = onload_device
+    self.onload_device = torch.device(onload_device)
     self.offloaded_values = dict()
```
self.offloaded_values = dict()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/compressed_tensors/offload/cache/base.py` around lines 35 - 36,
get_execution_device's annotated return type (torch.device) doesn't match the
actual value returned (module._parameters.onload_device which can be a
string/DeviceLikeType); to fix, normalize and store a proper torch.device in the
object constructor so get_execution_device can safely return torch.device:
during initialization (where onload_device is accepted/assigned, e.g., the
constructor that sets module._parameters.onload_device and where dispatch.py
assigns module._parameters.onload_device = onload_device), convert the incoming
onload_device (string or torch.device) to torch.device (using
torch.device(onload_device) or equivalent) and assign that normalized
torch.device back to module._parameters.onload_device so get_execution_device
can keep its torch.device return type without type mismatch.
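The normalize-at-the-boundary fix above can be shown without torch; `Device` below is a hypothetical stand-in for `torch.device`, which likewise accepts either a spec string or another device:

```python
class Device:
    """Minimal stand-in for torch.device: accepts a spec string or a Device."""

    def __init__(self, spec):
        # Idempotent: wrapping an existing Device yields an equal Device,
        # just as torch.device(torch.device("cuda:0")) does.
        self.spec = spec.spec if isinstance(spec, Device) else spec

    def __eq__(self, other):
        return isinstance(other, Device) and self.spec == other.spec

    def __repr__(self):
        return f"Device({self.spec!r})"


class Cache:
    def __init__(self, onload_device):
        # Normalize once at construction so every later reader
        # (e.g. get_execution_device) sees a Device, never a raw str.
        self.onload_device = Device(onload_device)


assert Cache("cuda:0").onload_device == Cache(Device("cuda:0")).onload_device
print(Cache("cuda:0").onload_device)  # Device('cuda:0')
```

Because normalization happens in one constructor, the declared `torch.device` return type holds everywhere downstream without per-call-site conversions.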
```diff
-__all__ = ["is_distributed", "is_rank0"]
+__all__ = ["is_distributed", "is_rank0", "init_dist"]
```
Sort __all__ to satisfy Ruff RUF022.
Line 10 is unsorted and will keep triggering lint warnings.
Suggested fix
```diff
-__all__ = ["is_distributed", "is_rank0", "init_dist"]
+__all__ = ["init_dist", "is_distributed", "is_rank0"]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
__all__ = ["init_dist", "is_distributed", "is_rank0"]
```
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 10-10: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/compressed_tensors/offload/dist_utils.py` at line 10, The __all__ export
list is unsorted and triggers Ruff RUF022; update the __all__ declaration in
dist_utils.py to contain the public symbols sorted alphabetically (e.g.,
["init_dist", "is_distributed", "is_rank0"]) so that the names exported by the
module match Ruff's expected sorted order and stop the lint warnings for the
__all__ variable.
This pull request has merge conflicts that must be resolved before it can be merged.

@coderabbitai resolve merge conflict
Resolved conflicts in:
- src/compressed_tensors/offload/cache/base.py (content)
- src/compressed_tensors/offload/cache/disk.py (content)
- src/compressed_tensors/offload/dispatch.py (content)
- src/compressed_tensors/offload/dist_utils.py (content)
- src/compressed_tensors/offload/utils.py (content)

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
The quality checks have failed. Please run
Purpose
`"cuda"`, `"cpu"`, and `0`

Prerequisites
Summary by CodeRabbit
Release Notes
New Features
Refactor
Documentation