sgl-docs/index.mdx at main · sgl-project/sgl-docs

title

Welcome to SGLang

description

High-performance serving framework for large language and multimodal models.

keywords

sglang

llm serving

multimodal

inference runtime

mode

wide

<a class="github-button" href="https://github.com/sgl-project/sglang" data-size="large" data-show-count="true" aria-label="Star sgl-project/sglang on GitHub"

Star <a class="github-button" href="https://github.com/sgl-project/sglang/fork" data-icon="octicon-repo-forked" data-size="large" data-show-count="true" aria-label="Fork sgl-project/sglang on GitHub"

Fork

Designed for low-latency, high-throughput inference with RadixAttention, prefix caching, and multi-GPU parallelism. Broad support for Llama, Qwen, DeepSeek, and more. Compatible with Hugging Face and OpenAI APIs. Native support across Hardware Platforms including NVIDIA, AMD, Intel Xeon, Google TPU, and Ascend NPU accelerators. Open-source with widespread adoption, powering 400k+ GPUs and integrated with major RL frameworks.

SGLang powers large-scale production deployments, generating trillions of tokens each day across more than 400,000 GPUs worldwide. It is hosted under the non-profit open-source organization LMSYS.

Get Started

SGLang is an inference framework meant for production level serving. It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters.

Install SGLang with pip, from source, or via Docker on your preferred hardware platform. Launch your first model server and send requests in minutes with OpenAI-compatible APIs.

News and latest blogs

{/* BEGIN_LMSYS_SGLANG_BLOG_CARDS */}

{"DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles"}

{"April 25, 2026"}

{"HiSparse: Turbocharging Sparse Attention with Hierarchical Memory"}

{"April 10, 2026"}

{"Highlights of SGLang at NVIDIA GTC 2026"}

{"March 31, 2026"}

{"Elastic EP in SGLang: Achieving Partial Failure Tolerance for DeepSeek MoE Deployments"}

{"March 25, 2026"}

$ROCm Support for Miles: Large-Scale RL Post-Training on AMD Instinct\u2122 GPUs$

{"ROCm Support for Miles: Large-Scale RL Post-Training on AMD Instinct\u2122 GPUs"}

{"March 17, 2026"}

{"SGLang Adds Day-0 Support for NVIDIA Nemotron 3 Super for building High-Efficiency Multi-Agent Systems"}

{"March 11, 2026"}

{/* END_LMSYS_SGLANG_BLOG_CARDS */}

Learn more and join the community

Stay connected

{" "} Development roadmap to follow current priorities and upcoming work.

{" "} Weekly public development meeting to hear updates and join open discussions.

{" "} Slack for questions, feedback, and community support.

X Twitter and {" "} LinkedIn for project updates.

{" "} LMSYS blog for release notes, benchmarks, and technical deep dives.

{" "} Learning materials for blogs, slides, and videos.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get Started

News and latest blogs

Learn more and join the community

FilesExpand file tree

index.mdx

Latest commit

History

index.mdx

File metadata and controls

Get Started

News and latest blogs

Learn more and join the community