diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index c4828141270..65cf1486ab1 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -1,28 +1,26 @@
-.github @merrymercy @Fridge003 @ispobock
-/docker @Fridge003 @ispobock @HaiShaw @ByronHsu
-/docker/npu.Dockerfile @ping1jing2
+.github @merrymercy @Fridge003 @ispobock @Kangyan-Zhou
+/docker @Fridge003 @ispobock @HaiShaw @ishandhanani
+/docker/npu.Dockerfile @ping1jing2 @iforgetmyname
/python/pyproject.toml @merrymercy @Fridge003 @ispobock
-/python/sglang/* @merrymercy @Ying1123 @Fridge003 @ispobock @hnyls2002
/python/sglang/multimodal_gen @mickqian
-/python/sglang/srt/constrained @hnyls2002
-/python/sglang/srt/disaggregation @ByronHsu @hnyls2002
-/python/sglang/srt/disaggregation/mooncake @ShangmingCai
-/python/sglang/srt/disaggregation/ascend @ping1jing2
-/python/sglang/srt/distributed @yizhang2077 @merrymercy
+/python/sglang/srt/constrained @hnyls2002 @DarkSharpness
+/python/sglang/srt/disaggregation @ByronHsu @hnyls2002 @ShangmingCai
+/python/sglang/srt/disaggregation/ascend @ping1jing2 @iforgetmyname
+/python/sglang/srt/distributed @yizhang2077 @merrymercy @ch-wan
/python/sglang/srt/entrypoints @ispobock @CatherineSue @slin1237 @merrymercy @JustinTong0323
-/python/sglang/srt/eplb @fzyzcjy
+/python/sglang/srt/eplb @fzyzcjy @ch-wan
/python/sglang/srt/function_call @CatherineSue @JustinTong0323
/python/sglang/srt/layers @merrymercy @Ying1123 @Fridge003 @ispobock @HaiShaw @ch-wan @BBuf @kushanam @Edwardf0t1
-/python/sglang/srt/layers/quantization @ch-wan @BBuf @Edwardf0t1 @FlamingoPg
-/python/sglang/srt/layers/attention/ascend_backend.py @ping1jing2
+/python/sglang/srt/layers/quantization @ch-wan @BBuf @Edwardf0t1 @FlamingoPg @AniZpZ
+/python/sglang/srt/layers/attention/ascend_backend.py @ping1jing2 @iforgetmyname
/python/sglang/srt/lora @Ying1123 @Fridge003 @lifuhuang
-/python/sglang/srt/managers @merrymercy @Ying1123 @zhyncs @hnyls2002 @xiezhq-hermann
+/python/sglang/srt/managers @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann @zhyncs
/python/sglang/srt/mem_cache @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann
-/python/sglang/srt/mem_cache/allocator_ascend.py @ping1jing2
+/python/sglang/srt/mem_cache/allocator_ascend.py @ping1jing2 @iforgetmyname
/python/sglang/srt/model_executor @merrymercy @Ying1123 @hnyls2002 @Fridge003 @ispobock
-/python/sglang/srt/model_executor/npu_graph_runner.py @ping1jing2
-/python/sglang/srt/multimodal @mickqian @JustinTong0323
-/python/sglang/srt/speculative @Ying1123 @merrymercy @kssteven418
+/python/sglang/srt/model_executor/npu_graph_runner.py @ping1jing2 @iforgetmyname
+/python/sglang/srt/multimodal @mickqian @JustinTong0323 @yhyang201
+/python/sglang/srt/speculative @Ying1123 @merrymercy @hnyls2002
/sgl-kernel @zhyncs @ispobock @BBuf @yizhang2077 @merrymercy @FlamingoPg @HaiShaw
/sgl-router @slin1237 @ByronHsu @CatherineSue
/sgl-router/benches @slin1237
@@ -40,5 +38,5 @@
/sgl-router/src/routers @CatherineSue @key4ng @slin1237
/sgl-router/src/tokenizer @slin1237 @CatherineSue
/sgl-router/src/tool_parser @slin1237 @CatherineSue
+/test/srt/ascend @ping1jing2 @iforgetmyname
/test/srt/test_modelopt* @Edwardf0t1
-/test/srt/ascend @ping1jing2
diff --git a/.github/ISSUE_TEMPLATE/1-bug-report.yml b/.github/ISSUE_TEMPLATE/1-bug-report.yml
index 5f6734867ca..6e3d9a83b47 100644
--- a/.github/ISSUE_TEMPLATE/1-bug-report.yml
+++ b/.github/ISSUE_TEMPLATE/1-bug-report.yml
@@ -1,5 +1,5 @@
name: 🐞 Bug report
-description: Create a report to help us reproduce and fix the bug
+description: Report a bug to help us reproduce and fix it.
title: "[Bug] "
labels: ['Bug']
@@ -8,31 +8,28 @@ body:
attributes:
label: Checklist
options:
- - label: 1. I have searched related issues but cannot get the expected help.
- - label: 2. The bug has not been fixed in the latest version.
- - label: 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- - label: 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- - label: 5. Please use English, otherwise it will be closed.
+ - label: I searched related issues but found no solution.
+ - label: The bug persists in the latest version.
+ - label: Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
+ - label: If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
+ - label: Please use English. Otherwise, it will be closed.
- type: textarea
attributes:
label: Describe the bug
- description: A clear and concise description of what the bug is.
+ description: A clear, concise description of the bug.
validations:
required: true
- type: textarea
attributes:
label: Reproduction
- description: |
- What command or script did you run? Which **model** are you using?
- placeholder: |
- A placeholder for the command.
+      description: The command or script you ran, and the model you used.
+ placeholder: Paste the command here.
validations:
required: true
- type: textarea
attributes:
label: Environment
- description: |
- Please provide necessary environment information here with `python3 -m sglang.check_env`. Otherwise the issue will be closed.
- placeholder: Environment here.
+      description: Run `python3 -m sglang.check_env` and paste the output here. Issues without this will be closed.
+ placeholder: Paste environment output here.
validations:
required: true
diff --git a/.github/ISSUE_TEMPLATE/2-feature-request.yml b/.github/ISSUE_TEMPLATE/2-feature-request.yml
index 31bc4a127e6..99f1f4d5ed1 100644
--- a/.github/ISSUE_TEMPLATE/2-feature-request.yml
+++ b/.github/ISSUE_TEMPLATE/2-feature-request.yml
@@ -7,17 +7,17 @@ body:
attributes:
label: Checklist
options:
- - label: 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- - label: 2. Please use English, otherwise it will be closed.
+ - label: If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
+ - label: Please use English. Otherwise, it will be closed.
- type: textarea
attributes:
label: Motivation
description: |
- A clear and concise description of the motivation of the feature.
+ Clearly and concisely describe the feature's motivation.
validations:
required: true
- type: textarea
attributes:
label: Related resources
description: |
- If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
+      Provide links to official code releases or third-party implementations, if available.
diff --git a/.github/MAINTAINER.md b/.github/MAINTAINER.md
new file mode 100644
index 00000000000..09fca2edf1d
--- /dev/null
+++ b/.github/MAINTAINER.md
@@ -0,0 +1,40 @@
+# SGLang Code Maintenance Model
+This document describes the code maintenance model for the SGLang project.
+Since SGLang is a large project involving multiple organizations and hardware platforms, we designed this model with the following goals:
+- Ensure a responsive and smooth review process.
+- Allow fast iteration, so maintainers can sometimes bypass flaky CI tests for important PRs.
+
+## Role Descriptions
+There are four important roles in this maintenance model. Some are custom roles, while others are predefined by GitHub.
+
+- *Area Maintainer*: The person who drives the PR merge process. They have strong area expertise and hold a high bar for code quality.
+ - Permission: Merge PRs. Bypass branch protection rules if needed.
+  - Responsibility: Shepherd the merge of PRs assigned to their area. Revert or hotfix any issues related to their merges (especially after bypassing).
+- *Codeowner*: The person who protects critical code. Without a bypass, each PR needs at least one Codeowner approval for each modified file. Note that this role is not an honor but a serious responsibility: PRs cannot be merged without Codeowner approval (except when an Area Maintainer bypasses).
+ - Permission: Approve PRs, allowing them to be merged without bypass.
+ - Responsibility: Review PRs in a timely manner.
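+  - For illustration, each line in a CODEOWNERS file maps a path pattern to the handles whose approval protects it (hypothetical entries):
+    ```
+    /docs @alice
+    /python/sglang/srt/layers @alice @bob
+    ```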
+- *Write*: The person with write permission to the sglang repo.
+ - Permission: Merge PRs if they have passed required tests and been approved by Codeowners. This role cannot bypass branch protection rules.
+ - Responsibility: Review and merge PRs in a timely manner.
+- *CI Maintainer*: The person who manages CI runners for specific hardware platforms.
+ - Permission: Add CI runners.
+ - Responsibility: Keep the CI runners up and running.
+
+Note: Difference between Area Maintainer and Codeowner
+- Area Maintainer is an active role: they proactively help merge PRs and can bypass CI when urgent.
+- Codeowner is a passive protection rule provided by GitHub; it prevents accidental changes to critical code.
+- The list of Area Maintainers is attached below. The list of Codeowners is in the [CODEOWNERS](./CODEOWNERS) file.
+
+## Pull Request Merge Process
+1. The author submits a pull request (PR) and fills out the PR checklist.
+2. A bot assigns the PR to an Area Maintainer and @-mentions them. At the same time, GitHub auto-requests reviews from Codeowners.
+3. The Area Maintainer coordinates the review (asking people to review) and approves the PR; Codeowners also approve it.
+4. We can now merge the code:
+   - Ideal case: at least one Codeowner approves the PR for each modified file, and the required CI passes. Anyone with write permission can then merge it.
+   - When flaky CI or slow responses make these requirements hard to meet, an Area Maintainer can bypass branch protection to merge the PR.
+
+## The List of Area Maintainers and Reviewers
+TODO
+
+## The List of CI Maintainers
+TODO
diff --git a/.github/REVIEWERS.md b/.github/REVIEWERS.md
deleted file mode 100644
index 8cfec67cee1..00000000000
--- a/.github/REVIEWERS.md
+++ /dev/null
@@ -1,54 +0,0 @@
-# Area Reviewer
-
-Here are some reviewers for common areas. You can ping them to review your code if you touch related parts.
-
-## Hardware platforms
-- general @Alcanderian
-- AMD GPU @HaiShaw
-- Blackwell GPU @kushanam @trevor-m @zhyncs
-- CPU @mingfeima
-
-## Kernel
-- general @zhyncs @ispobock @HandH1998 @BBuf @yizhang2077 @HaiShaw
-- triton attention backend @ispobock
-- aiter attention backend @HaiShaw @kkHuang-amd @valarLip
-- flash attention backend @hebiao064
-- flashinfer attention backend @Fridge003
-- moe kernel @BBuf @fzyzcjy @ch-wan @Alcanderian
-- quantization @FlamingoPg @HandH1998
-
-## Scheduler and memory pool
-- general @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann
-- constrained decoding @hnyls2002
-- hierarchical cache @xiezhq-hermann @DarkSharpness
-- lora @Fridge003 @Ying1123 @lifuhuang
-- speculative decoding @merrymercy @Ying1123 @kssteven418 @Qiaolin-Yu
-- sliding window attention @hanming-lu
-
-## Parallelism
-- expert parallelism @fzyzcjy @ch-wan
-- data parallelism attention @ch-wan
-- pipeline parallelism @Ying1123
-- tensor parallelism @merrymercy
-
-## PD disaggregation
-- general @ByronHsu @ShangmingCai @hnyls2002
-- Mooncake backend @ShangmingCai
-
-## Build and release
-- general @zhyncs @merrymercy
-
-## API Server
-- general @CatherineSue @slin1237 @ispobock
-- function calling and reasoning parsing @CatherineSue
-- OpenAI API @CatherineSue @slin1237
-
-## SGL-Router
-- general @slin1237 @ByronHsu
-
-## Model
-- multimodal models @mickqian @JustinTong0323
-- other new models @zhaochenyang20
-
-## Reinforcment learning
-- general @zhaochenyang20 @hebiao064 @fzyzcjy @zhuzilin
diff --git a/.github/labeler.yml b/.github/labeler.yml
index e9da82eb50f..b4ec2a42429 100644
--- a/.github/labeler.yml
+++ b/.github/labeler.yml
@@ -26,7 +26,7 @@ dependencies:
- '**/requirements*.txt'
- '**/Cargo.toml'
- '**/Cargo.lock'
- - '**/pyproject.toml'
+ - '**/pyproject*.toml'
- '**/setup.py'
- '**/poetry.lock'
- '**/package.json'
@@ -36,31 +36,27 @@ dependencies:
Multi-modal:
- changed-files:
- any-glob-to-any-file:
- - '**/vision/**/*'
- - '**/multimodal/**/*'
- - '**/vlm/**/*'
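+      # Keyword globs: label any changed file whose name contains one of these substrings.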
+ - '**/*multimodal*'
+ - '**/*vision*'
+ - '**/*vlm*'
# LoRA
lora:
- changed-files:
- any-glob-to-any-file:
- - '**/lora/**/*'
- '**/*lora*'
# Quantization
quant:
- changed-files:
- any-glob-to-any-file:
- - '**/quant/**/*'
- '**/*quant*'
- - '**/awq/**/*'
- - '**/gptq/**/*'
+ - '**/*quantization*'
# Speculative decoding
speculative-decoding:
- changed-files:
- any-glob-to-any-file:
- - '**/speculative/**/*'
- '**/*speculative*'
# AMD specific
@@ -80,5 +76,4 @@ deepseek:
hicache:
- changed-files:
- any-glob-to-any-file:
- - '**/hicache/**/*'
- '**/*hicache*'
diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml
index 565984700c1..51ca181e4e0 100644
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -9,6 +9,7 @@ on:
jobs:
lint:
runs-on: ubuntu-latest
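+    # Run lint only on PRs that carry the 'run-ci' label.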
+ if: contains(github.event.pull_request.labels.*.name, 'run-ci')
steps:
- uses: actions/checkout@v4
diff --git a/README.md b/README.md
index 327336de0cb..e8a41d1377f 100644
--- a/README.md
+++ b/README.md
@@ -55,10 +55,10 @@ It is designed to deliver low-latency and high-throughput inference across a wid
Its core features include:
- **Fast Backend Runtime**: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ), and multi-LoRA batching.
-- **Extensive Model Support**: Supports a wide range of generative models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), and reward models (Skywork), with easy extensibility for integrating new models. Compatible with most Hugging Face models and OpenAI APIs.
+- **Extensive Model Support**: Supports a wide range of generative models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), reward models (Skywork), and diffusion models (WAN, Qwen-Image), with easy extensibility for integrating new models. Compatible with most Hugging Face models and OpenAI APIs.
- **Extensive Hardware Support**: Runs on NVIDIA GPUs (GB200/B300/H100/A100/Spark), AMD GPUs (MI355/MI300), Intel Xeon CPUs, Google TPUs, Ascend NPUs, and more.
- **Flexible Frontend Language**: Offers an intuitive interface for programming LLM applications, supporting chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
-- **Active Community**: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 300,000 GPUs worldwide.
+- **Active Community**: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 400,000 GPUs worldwide.
## Getting Started
- [Install SGLang](https://docs.sglang.ai/get_started/install.html)
@@ -68,13 +68,11 @@ Its core features include:
- [Contribution Guide](https://docs.sglang.ai/developer_guide/contribution_guide.html)
## Benchmark and Performance
-Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/), [Large-scale expert parallelism](https://lmsys.org/blog/2025-05-05-large-scale-ep/).
-
-## Roadmap
-[Development Roadmap (2025 Q4)](https://github.com/sgl-project/sglang/issues/12780)
+Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/), [Large-scale expert parallelism](https://lmsys.org/blog/2025-05-05-large-scale-ep/), [GB200 rack-scale parallelism](https://lmsys.org/blog/2025-09-25-gb200-part-2/).
## Adoption and Sponsorship
-SGLang has been deployed at large scale, generating trillions of tokens in production each day. It is trusted and adopted by a wide range of leading enterprises and institutions, including xAI, AMD, NVIDIA, Intel, LinkedIn, Cursor, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, Atlas Cloud, Voltage Park, Nebius, DataCrunch, Novita, InnoMatrix, MIT, UCLA, the University of Washington, Stanford, UC Berkeley, Tsinghua University, Jam & Tea Studios, Baseten, and other major technology organizations across North America and Asia. As an open-source LLM inference engine, SGLang has become the de facto industry standard, with deployments running on over 300,000 GPUs worldwide.
+SGLang has been deployed at large scale, generating trillions of tokens in production each day. It is trusted and adopted by a wide range of leading enterprises and institutions, including xAI, AMD, NVIDIA, Intel, LinkedIn, Cursor, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, Atlas Cloud, Voltage Park, Nebius, DataCrunch, Novita, InnoMatrix, MIT, UCLA, the University of Washington, Stanford, UC Berkeley, Tsinghua University, Jam & Tea Studios, Baseten, and other major technology organizations across North America and Asia.
+As an open-source LLM inference engine, SGLang has become the de facto industry standard, with deployments running on over 400,000 GPUs worldwide.
SGLang is currently hosted under the non-profit open-source organization [LMSYS](https://lmsys.org/about/).
diff --git a/docs/advanced_features/hicache.rst b/docs/advanced_features/hicache.rst
index 2a9f28e210b..b2bd08b79e7 100644
--- a/docs/advanced_features/hicache.rst
+++ b/docs/advanced_features/hicache.rst
@@ -1,5 +1,5 @@
Hierarchical KV Caching (HiCache)
-======================
+=================================
.. toctree::
:maxdepth: 1
diff --git a/docs/advanced_features/observability.md b/docs/advanced_features/observability.md
index f03fb3772a7..9c5d2e17534 100644
--- a/docs/advanced_features/observability.md
+++ b/docs/advanced_features/observability.md
@@ -7,7 +7,7 @@ You can query them by:
curl http://localhost:30000/metrics
```
-See [Production Metrics](../references/production_metrics.md) for more details.
+See [Production Metrics](../references/production_metrics.md) and [Production Request Tracing](../references/production_request_trace.md) for more details.
## Logging
diff --git a/docs/advanced_features/pd_multiplexing.md b/docs/advanced_features/pd_multiplexing.md
deleted file mode 100644
index 9aecd70cdb8..00000000000
--- a/docs/advanced_features/pd_multiplexing.md
+++ /dev/null
@@ -1,56 +0,0 @@
-
-# PD Multiplexing
-
-
-## Server Arguments
-
-| Argument | Type/Default | Description |
-|-----------------------------|-------------------------|----------------------------------------------------------|
-| `--enable-pdmux` | flag; default: disabled | Enable PD-Multiplexing (PD running on greenctx stream). |
-| `--pdmux-config-path `| string path; none | Path to the PD-Multiplexing YAML config file. |
-
-### YAML Configuration
-
-Example configuration for an H200 (132 SMs)
-
-```yaml
-# Number of SM groups to divide the GPU into.
-# Includes two default groups:
-# - Group 0: all SMs for prefill
-# - Last group: all SMs for decode
-# The number of manual divisions must be (sm_group_num - 2).
-sm_group_num: 8
-
-# Optional manual divisions of SMs.
-# Each entry contains:
-# - prefill_sm: number of SMs allocated for prefill
-# - decode_sm: number of SMs allocated for decode
-# - decode_bs_threshold: minimum decode batch size to select this group
-#
-# The sum of `prefill_sm` and `decode_sm` must equal the total number of SMs.
-# If provided, the number of entries must equal (sm_group_num - 2).
-manual_divisions:
- - [112, 20, 1]
- - [104, 28, 5]
- - [96, 36, 10]
- - [80, 52, 15]
- - [64, 68, 20]
- - [56, 76, 25]
-
-# Divisor for default stream index calculation.
-# Used when manual_divisions are not provided.
-# Formula:
-# stream_idx = max(
-# 1,
-# min(sm_group_num - 2,
-# decode_bs * (sm_group_num - 2) // decode_bs_divisor
-# )
-# )
-decode_bs_divisor: 36
-
-# Maximum token budget for split_forward in the prefill stage.
-# Determines how many layers are executed per split_forward.
-# Formula:
-# forward_count = max(1, split_forward_token_budget // extend_num_tokens)
-split_forward_token_budget: 65536
-```
diff --git a/docs/basic_usage/deepseek.md b/docs/basic_usage/deepseek_v3.md
similarity index 99%
rename from docs/basic_usage/deepseek.md
rename to docs/basic_usage/deepseek_v3.md
index 128897a7e13..b364c733fce 100644
--- a/docs/basic_usage/deepseek.md
+++ b/docs/basic_usage/deepseek_v3.md
@@ -1,4 +1,4 @@
-# DeepSeek Usage
+# DeepSeek V3/V3.1/R1 Usage
SGLang provides many optimizations specifically designed for the DeepSeek models, making it the inference engine recommended by the official [DeepSeek team](https://github.com/deepseek-ai/DeepSeek-V3/tree/main?tab=readme-ov-file#62-inference-with-sglang-recommended) from Day 0.
diff --git a/docs/basic_usage/popular_model_usage.rst b/docs/basic_usage/popular_model_usage.rst
new file mode 100644
index 00000000000..2f8753fda50
--- /dev/null
+++ b/docs/basic_usage/popular_model_usage.rst
@@ -0,0 +1,12 @@
+Popular Model Usage (DeepSeek, GPT-OSS, Llama, Qwen, and more)
+===============================================================
+
+.. toctree::
+ :maxdepth: 1
+
+ deepseek_v3.md
+ deepseek_v32.md
+ gpt_oss.md
+ llama4.md
+ qwen3.md
+ qwen3_vl.md
diff --git a/docs/developer_guide/bench_serving.md b/docs/developer_guide/bench_serving.md
index 28b7a93cd5d..b2f8568e260 100644
--- a/docs/developer_guide/bench_serving.md
+++ b/docs/developer_guide/bench_serving.md
@@ -1,4 +1,4 @@
-## Bench Serving Guide
+# Bench Serving Guide
This guide explains how to benchmark online serving throughput and latency using `python -m sglang.bench_serving`. It supports multiple inference backends via OpenAI-compatible and native endpoints, and produces both console metrics and optional JSONL outputs.
diff --git a/docs/get_started/install.md b/docs/get_started/install.md
index f580f31ef40..f7021e2a394 100644
--- a/docs/get_started/install.md
+++ b/docs/get_started/install.md
@@ -3,7 +3,7 @@
You can install SGLang using one of the methods below.
This page primarily applies to common NVIDIA GPU platforms.
-For other or newer platforms, please refer to the dedicated pages for [AMD GPUs](../platforms/amd_gpu.md), [Intel Xeon CPUs](../platforms/cpu_server.md), [TPU](../platforms/tpu.md), [NVIDIA DGX Spark](https://lmsys.org/blog/2025-10-13-nvidia-dgx-spark/), [NVIDIA Jetson](../platforms/nvidia_jetson.md), [Ascend NPUs](../platforms/ascend_npu.md).
+For other or newer platforms, please refer to the dedicated pages for [AMD GPUs](../platforms/amd_gpu.md), [Intel Xeon CPUs](../platforms/cpu_server.md), [TPU](../platforms/tpu.md), [NVIDIA DGX Spark](https://lmsys.org/blog/2025-11-03-gpt-oss-on-nvidia-dgx-spark/), [NVIDIA Jetson](../platforms/nvidia_jetson.md), [Ascend NPUs](../platforms/ascend_npu.md), and [Intel XPU](../platforms/xpu.md).
## Method 1: With pip or uv
@@ -35,7 +35,7 @@ pip install -e "python"
**Quick fixes to common problems**
-- If you want to develop SGLang, it is recommended to use docker. Please refer to [setup docker container](../developer_guide/development_guide_using_docker.md#setup-docker-container). The docker image is `lmsysorg/sglang:dev`.
+- If you want to develop SGLang, you can try the dev docker image. Please refer to [setup docker container](../developer_guide/development_guide_using_docker.md#setup-docker-container). The docker image is `lmsysorg/sglang:dev`.
## Method 3: Using docker
diff --git a/docs/index.rst b/docs/index.rst
index 07e12815587..bf457abe966 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -6,10 +6,10 @@ It is designed to deliver low-latency and high-throughput inference across a wid
Its core features include:
- **Fast Backend Runtime**: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ), and multi-LoRA batching.
-- **Extensive Model Support**: Supports a wide range of generative models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), and reward models (Skywork), with easy extensibility for integrating new models. Compatible with most Hugging Face models and OpenAI APIs.
+- **Extensive Model Support**: Supports a wide range of generative models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), reward models (Skywork), and diffusion models (WAN, Qwen-Image), with easy extensibility for integrating new models. Compatible with most Hugging Face models and OpenAI APIs.
- **Extensive Hardware Support**: Runs on NVIDIA GPUs (GB200/B300/H100/A100/Spark), AMD GPUs (MI355/MI300), Intel Xeon CPUs, Google TPUs, Ascend NPUs, and more.
- **Flexible Frontend Language**: Offers an intuitive interface for programming LLM applications, supporting chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
-- **Active Community**: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 300,000 GPUs worldwide.
+- **Active Community**: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 400,000 GPUs worldwide.
.. toctree::
:maxdepth: 1
@@ -26,12 +26,7 @@ Its core features include:
basic_usage/offline_engine_api.ipynb
basic_usage/native_api.ipynb
basic_usage/sampling_params.md
- basic_usage/deepseek.md
- basic_usage/deepseek_v32.md
- basic_usage/gpt_oss.md
- basic_usage/llama4.md
- basic_usage/qwen3.md
- basic_usage/qwen3_vl.md
+ basic_usage/popular_model_usage.rst
.. toctree::
:maxdepth: 1
@@ -74,7 +69,6 @@ Its core features include:
:caption: Hardware Platforms
platforms/amd_gpu.md
- platforms/blackwell_gpu.md
platforms/cpu_server.md
platforms/tpu.md
platforms/nvidia_jetson.md
diff --git a/docs/references/faq.md b/docs/references/faq.md
index fae54e1b5ef..ffa1a7c54fd 100644
--- a/docs/references/faq.md
+++ b/docs/references/faq.md
@@ -21,7 +21,7 @@ This error may result from kernel errors or out-of-memory issues:
- If the server hangs during initialization or running, it can be memory issues (out of memory), network issues (nccl errors), or other bugs in sglang.
- If it is out of memory, you might see that `avail mem` is very low during the initialization or right after initialization. In this case,
you can try to decrease `--mem-fraction-static`, decrease `--cuda-graph-max-bs`, or decrease `--chunked-prefill-size`.
-- Other bugs, please raise a Github issue to us.
+For other bugs, please file an issue on GitHub.
## Frequently Asked Questions
@@ -34,6 +34,6 @@ From our initial investigation, this indeterminism arises from two factors: dyna
To achieve more deterministic outputs in the current code, you can add `--disable-radix-cache` and send only one request at a time. The results will be mostly deterministic under this setting.
-**Note**:
-Recently, we also introduced a deterministic mode, you can enable it with `--enable-deterministic-inference`. It might not work for all cases.
+**Update**:
+Recently, we also introduced a deterministic mode; you can enable it with `--enable-deterministic-inference`.
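+
+A minimal launch sketch (hypothetical model path):
+
+```bash
+python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --enable-deterministic-inference
+```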
Please find more details in this blog post: https://lmsys.org/blog/2025-09-22-sglang-deterministic/
diff --git a/docs/references/learn_more.md b/docs/references/learn_more.md
index b1a8a17da62..c04bea5d4c8 100644
--- a/docs/references/learn_more.md
+++ b/docs/references/learn_more.md
@@ -1,7 +1,8 @@
-# Learn more
+# Learn More and Join the Community
-You can find more blogs, slides, and videos about SGLang at [https://github.com/sgl-project/sgl-learning-materials](https://github.com/sgl-project/sgl-learning-materials).
-
-The latest SGLang features and updates are shared through the [LMSYS blog](https://lmsys.org/blog/).
-
-The 2025 H2 roadmap can be found at this [issue](https://github.com/sgl-project/sglang/issues/7736).
+- The development roadmap: [2025 Q4](https://github.com/sgl-project/sglang/issues/12780)
+- The latest SGLang features and updates are shared through the [LMSYS blog](https://lmsys.org/blog/)
+- X (formerly Twitter): https://x.com/lmsysorg
+- LinkedIn: https://www.linkedin.com/company/sgl-project/
+- Join Slack: https://slack.sglang.ai/
+- More blogs, slides, and videos about SGLang: [https://github.com/sgl-project/sgl-learning-materials](https://github.com/sgl-project/sgl-learning-materials)
diff --git a/docs/references/production_request_trace.md b/docs/references/production_request_trace.md
index 93950503e1c..2d19570c215 100644
--- a/docs/references/production_request_trace.md
+++ b/docs/references/production_request_trace.md
@@ -1,3 +1,5 @@
+# Production Request Tracing
+
SGlang exports request trace data based on the OpenTelemetry Collector. You can enable tracing by adding the `--enable-trace` and configure the OpenTelemetry Collector endpoint using `--otlp-traces-endpoint` when launching the server.
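+
+A minimal launch sketch (hypothetical model path and collector endpoint):
+
+```bash
+python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct \
+  --enable-trace --otlp-traces-endpoint http://localhost:4317
+```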
You can find example screenshots of the visualization in https://github.com/sgl-project/sglang/issues/8965.
diff --git a/docs/security/acknowledgements.md b/docs/security/acknowledgements.md
deleted file mode 100644
index 9873da9b581..00000000000
--- a/docs/security/acknowledgements.md
+++ /dev/null
@@ -1,3 +0,0 @@
-| Time | CVE ID | Credit to | Affected Versions | Severity | Impact | Description |
-|------------|--------------|------------------|---------------------------|------------|----------------------|-------------|
-| 2025-09-09 | CVE-2025-10164 | Simon Huang, pjf | ≥ 0.4.6 & ≤ 0.5.3 | Critical | Remote Code Execution | A security flaw exists in lmsys sglang versions ≥ 0.4.6 and ≤ 0.5.3. The vulnerability arises from the use of unsafe pickle deserialization of the `serialized_named_tensors` parameter in the `/update_weights_from_tensor` API endpoint, which could allow a remote attacker to execute arbitrary code on the server by sending a specially crafted payload. |