
Address possible memory leak during model sleep/unload #2030

Open
glaziermag wants to merge 1 commit into EricLBuehler:master from glaziermag:fix-memory-leak-unload-v4


@glaziermag
Contributor

Tentatively addressing issue #545.

It appears that the VRAM leak during /v1/sleep and model unload may be related to CUDA context bindings: the destructors for the device buffers seem to run after the OS thread that held the context has shut down or detached from the device pool, so the frees do not take effect. By synchronizing with the engine_handler thread and explicitly binding the candle_core context to the local HTTP handler thread before those Rust-side objects run their destructors, the driver appears to release the PagedAttention memory pools back to the OS.
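To make the proposed ordering concrete, here is a minimal, std-only Rust sketch of it, assuming the fix amounts to: stop the engine thread, wait for it to exit, bind the context on the handler thread, and only then drop the model state. Every name here (`ModelState`, `EngineCmd`, `spawn_engine`, `sleep_engine`, `bind_context_to_current_thread`) is hypothetical and not taken from the PR; the thread-local flag only simulates "a CUDA context is current on this thread" and does not call candle_core or the CUDA driver.

```rust
use std::cell::Cell;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

thread_local! {
    // Hypothetical stand-in for "a device context is current on this OS thread".
    static CONTEXT_BOUND: Cell<bool> = Cell::new(false);
}

/// Hypothetical: marks the context as current on the calling thread.
/// A real implementation would go through the CUDA/candle context APIs.
fn bind_context_to_current_thread() {
    CONTEXT_BOUND.with(|c| c.set(true));
}

/// Log of (bytes released, was a context bound on the dropping thread?).
type DropLog = Arc<Mutex<Vec<(usize, bool)>>>;

/// Hypothetical model state owning device memory (e.g. a KV cache).
struct ModelState {
    kv_cache_bytes: usize,
    freed_log: DropLog,
}

impl Drop for ModelState {
    fn drop(&mut self) {
        // In this sketch, a drop with no bound context models the
        // "memory stays allocated" symptom from the traces.
        let bound = CONTEXT_BOUND.with(|c| c.get());
        self.freed_log.lock().unwrap().push((self.kv_cache_bytes, bound));
    }
}

enum EngineCmd {
    Shutdown,
}

/// Runs a stand-in engine loop; on shutdown it hands the model back
/// instead of dropping it on the engine thread.
fn spawn_engine(model: ModelState, rx: mpsc::Receiver<EngineCmd>) -> thread::JoinHandle<ModelState> {
    thread::spawn(move || {
        for cmd in rx {
            match cmd {
                EngineCmd::Shutdown => break,
            }
        }
        model
    })
}

/// Sleep path: stop the engine, join it, bind the context here, then drop.
fn sleep_engine(tx: mpsc::Sender<EngineCmd>, handle: thread::JoinHandle<ModelState>) {
    tx.send(EngineCmd::Shutdown).expect("engine thread already exited");
    // 1. Synchronize: block until the engine thread has fully finished.
    let model = handle.join().expect("engine thread panicked");
    // 2. Bind the context on this (handler) thread before any destructor runs.
    bind_context_to_current_thread();
    // 3. Drop explicitly so `Drop` executes here, with the context bound.
    drop(model);
}

fn main() {
    let log: DropLog = Arc::new(Mutex::new(Vec::new()));
    let (tx, rx) = mpsc::channel();
    let handle = spawn_engine(
        ModelState { kv_cache_bytes: 1 << 20, freed_log: Arc::clone(&log) },
        rx,
    );
    sleep_engine(tx, handle);
    // Prints [(1048576, true)]: the drop ran with the context bound.
    println!("{:?}", log.lock().unwrap());
}
```

Returning the model through the `JoinHandle`, rather than letting it fall out of scope at the end of the engine thread, is one way to guarantee that the destructor runs strictly after the join and on a thread whose context state the caller controls.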

Still testing the broader implications of this driver context synchronization, so I'm open to feedback on whether this is the optimal approach for the architecture.

Diagnostic Traces (NVIDIA L4 Instance)

Before synchronization:

Sending POST to /v1/sleep...
{"model_id":"HuggingFaceTB/SmolLM-135M","status":"unloaded"}
20239 MiB

(The pointer drops execute synchronously, but they appear to run without a mapped CUDA context, leaving the memory allocated.)

After synchronization:

Sending POST to /v1/sleep...
{"model_id":"HuggingFaceTB/SmolLM-135M","status":"unloaded"}
271 MiB

(Memory usage returns to the baseline reserved by the CUDA context itself; the dynamically allocated KV cache is freed.)

@glaziermag marked this pull request as ready for review March 25, 2026 23:00
@glaziermag changed the title from "Draft: Address possible memory leak during model sleep/unload" to "Address possible memory leak during model sleep/unload" Mar 26, 2026
