Efficient LLM inference on Slurm clusters.
Updated Apr 10, 2026 · Python
eLLM: real-time LLM inference on CPUs.
Layered prefill changes the scheduling axis from tokens to layers and removes redundant MoE weight reloads while keeping decode stall-free. The result is lower time-to-first-token (TTFT), lower end-to-end latency, and lower energy per token without hurting time-between-tokens (TBT) stability.
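The weight-reload saving can be illustrated with a toy counter. This is a hedged sketch, not the project's implementation: the function names and the assumption that one "load" happens whenever a layer's expert weights are fetched are illustrative only.

```python
# Illustrative sketch: count MoE expert-weight fetches under two prefill
# scheduling axes. All names are hypothetical, not any project's API.

def token_major_loads(num_requests: int, num_layers: int) -> int:
    """Token-axis scheduling: each request's prefill walks all layers,
    so every layer's expert weights are fetched once per request."""
    loads = 0
    for _ in range(num_requests):    # schedule one request at a time
        for _ in range(num_layers):  # ...through every MoE layer
            loads += 1               # expert weights fetched again
    return loads

def layer_major_loads(num_requests: int, num_layers: int) -> int:
    """Layer-axis scheduling (layered prefill): fetch each layer's expert
    weights once, then push every pending prefill through that layer."""
    loads = 0
    for _ in range(num_layers):          # schedule one layer at a time
        loads += 1                       # expert weights fetched once
        for _ in range(num_requests):    # all requests reuse the loaded layer
            pass                         # (compute would happen here)
    return loads

if __name__ == "__main__":
    reqs, layers = 8, 32
    print(token_major_loads(reqs, layers))  # 256 fetches
    print(layer_major_loads(reqs, layers))  # 32 fetches
```

Under these assumptions, weight fetches drop from requests × layers to just layers, which is where the claimed TTFT and energy-per-token improvements would come from.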