Commit 5b04938

Logos planner: drop base_residency_mb floor on sleeping-lane freed_total — empirically verified on vLLM 0.20.0 that --query-compute-apps reports physical residual within ~30 MB of device-level after sleep_l1, so the 10× over-estimate that made stop attractive for tiny deficits is wrong
1 parent 4085509 commit 5b04938
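
The agreement check described in the commit message could be reproduced roughly as follows. This is a minimal sketch, not the procedure the author used: the helper names are hypothetical, the inputs are assumed to be the text printed by `nvidia-smi --query-compute-apps=gpu_uuid,used_memory --format=csv,noheader,nounits` and `nvidia-smi --query-gpu=uuid,memory.used --format=csv,noheader,nounits`, and the sleeping lane's processes are assumed to be the only compute apps on each GPU.

```python
from collections import defaultdict


def _parse_uuid_mb(csv_text: str):
    """Yield (gpu_uuid, used_mb) pairs from 'uuid, value' CSV rows (noheader, nounits)."""
    for line in csv_text.strip().splitlines():
        uuid, used = (field.strip() for field in line.split(","))
        yield uuid, float(used)


def per_gpu_process_mb(compute_apps_csv: str) -> dict:
    """Sum per-process memory by GPU UUID (several processes may share one GPU)."""
    totals = defaultdict(float)
    for uuid, used_mb in _parse_uuid_mb(compute_apps_csv):
        totals[uuid] += used_mb
    return dict(totals)


def per_gpu_device_mb(gpu_csv: str) -> dict:
    """Device-level used memory by GPU UUID."""
    return dict(_parse_uuid_mb(gpu_csv))


def disagreement_mb(compute_apps_csv: str, gpu_csv: str) -> dict:
    """Absolute gap, per GPU, between device-level and summed per-process usage."""
    procs = per_gpu_process_mb(compute_apps_csv)
    device = per_gpu_device_mb(gpu_csv)
    return {uuid: abs(mb - procs.get(uuid, 0.0)) for uuid, mb in device.items()}
```

A gap at or below about 30 MB on every GPU after `sleep_l1` would match what the commit message reports for vLLM 0.20.0. Rows where nvidia-smi prints a non-numeric value are not handled in this sketch.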

1 file changed

Lines changed: 13 additions & 7 deletions

logos/src/logos/capacity/capacity_planner.py

@@ -1471,16 +1471,22 @@ def __init__(self, lane, action, eff_demand, freed_per_gpu):
         if in_cooldown:
             continue
         action = "stop"
+        # Stopping a sleeping vLLM lane frees only the small residual
+        # that CuMemAllocator + sleep_l1 leaves on the GPU — typically
+        # ~750 MB per GPU for a 7-14B AWQ model. Earlier versions of
+        # this function floored freed_total at base_residency_mb (full
+        # awake VRAM) on the assumption that PyTorch's caching pool
+        # retained model weights invisible to --query-compute-apps.
+        # Verified empirically on vLLM 0.20.0: --query-compute-apps and
+        # --query-gpu=memory.used agree within ~30 MB per GPU after
+        # sleep_l1, so the per-process measurement is trustworthy and
+        # the floor was a 10× over-estimate that made `stop` look
+        # attractive for tiny deficits. Trust effective_vram_mb (or
+        # the calibrated sleeping_residual_mb when no measurement is
+        # available yet).
         residual_mb = float(lane.effective_vram_mb or 0.0)
         if residual_mb <= 0 and profile:
             residual_mb = float(profile.sleeping_residual_mb or 0.0)
-        # Sleeping vLLM lanes underreport GPU usage via --query-compute-apps:
-        # the CUDA allocator keeps model weights in its pool, invisible to
-        # per-process queries. Use profile base_residency as a floor.
-        if lane.is_vllm and profile is not None:
-            base_residency = float(getattr(profile, "base_residency_mb", 0) or 0)
-            if base_residency > residual_mb:
-                residual_mb = base_residency
         freed_total = residual_mb
     else:
         continue # busy, cold, stopped, or starting — not evictable
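
The post-change estimate for a sleeping lane can be isolated as a small function. This is a sketch under stated assumptions: `Lane`, `Profile`, and `sleeping_lane_freed_mb` are hypothetical stand-ins for the planner's real types, and only the attribute names `effective_vram_mb`, `sleeping_residual_mb`, and `base_residency_mb` come from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lane:
    """Hypothetical stand-in for the planner's lane object."""
    effective_vram_mb: Optional[float] = None  # measured per-process usage, if any


@dataclass
class Profile:
    """Hypothetical stand-in for the calibrated lane profile."""
    sleeping_residual_mb: Optional[float] = None
    base_residency_mb: Optional[float] = None  # full awake VRAM; no longer used as a floor


def sleeping_lane_freed_mb(lane: Lane, profile: Optional[Profile]) -> float:
    """VRAM expected to be freed by stopping a sleeping lane, per the new logic."""
    # Prefer the live measurement of the sleeping lane.
    residual_mb = float(lane.effective_vram_mb or 0.0)
    # Fall back to the calibrated residual only when nothing has been measured.
    if residual_mb <= 0 and profile:
        residual_mb = float(profile.sleeping_residual_mb or 0.0)
    # The removed code raised this value to base_residency_mb for vLLM lanes.
    return residual_mb
```

With a measured residual of 750 MB and an illustrative `base_residency_mb` of 7500 MB, this returns 750 MB, where the removed floor would have reported 7500 MB. That gap is the 10× over-estimate named in the commit message.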
