Efficient LLM inference on Slurm clusters.
Updated Apr 10, 2026 · Python
eLLM: real-time LLM inference on CPUs.
Layered prefill changes the scheduling axis from tokens to layers and removes redundant MoE weight reloads while keeping decode stall-free. The result is lower time-to-first-token (TTFT), lower end-to-end latency, and lower energy per token without hurting time-between-tokens (TBT) stability.
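The weight-reload saving can be illustrated with a toy counter. This is a hedged sketch, not the project's implementation: the function names and the assumption that one "load" happens whenever a layer's expert weights are fetched are illustrative only.

```python
# Illustrative sketch: count MoE expert-weight fetches under two prefill
# scheduling axes. All names are hypothetical, not any project's API.

def token_major_loads(num_requests: int, num_layers: int) -> int:
    """Token-axis scheduling: each request's prefill walks all layers,
    so every layer's expert weights are fetched once per request."""
    loads = 0
    for _ in range(num_requests):    # schedule one request at a time
        for _ in range(num_layers):  # ...through every MoE layer
            loads += 1               # expert weights fetched again
    return loads

def layer_major_loads(num_requests: int, num_layers: int) -> int:
    """Layer-axis scheduling (layered prefill): fetch each layer's expert
    weights once, then push every pending prefill through that layer."""
    loads = 0
    for _ in range(num_layers):          # schedule one layer at a time
        loads += 1                       # expert weights fetched once
        for _ in range(num_requests):    # all requests reuse the loaded layer
            pass                         # (compute would happen here)
    return loads

if __name__ == "__main__":
    reqs, layers = 8, 32
    print(token_major_loads(reqs, layers))  # 256 fetches
    print(layer_major_loads(reqs, layers))  # 32 fetches
```

Under these assumptions, weight fetches drop from requests × layers to just layers, which is where the claimed TTFT and energy-per-token improvements would come from.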