34 changes: 16 additions & 18 deletions .github/CODEOWNERS
@@ -1,28 +1,26 @@
.github @merrymercy @Fridge003 @ispobock
/docker @Fridge003 @ispobock @HaiShaw @ByronHsu
/docker/npu.Dockerfile @ping1jing2
.github @merrymercy @Fridge003 @ispobock @Kangyan-Zhou
/docker @Fridge003 @ispobock @HaiShaw @ishandhanani
/docker/npu.Dockerfile @ping1jing2 @iforgetmyname
/python/pyproject.toml @merrymercy @Fridge003 @ispobock
/python/sglang/* @merrymercy @Ying1123 @Fridge003 @ispobock @hnyls2002
Review comment from a Contributor (severity: medium):
Removing the wildcard rule /python/sglang/* could be risky. While there are many specific rules for subdirectories, any new file or directory created directly under /python/sglang/ that doesn't match an existing pattern will now have no code owner. This could lead to unreviewed changes slipping through. Consider re-adding a general rule for /python/sglang/* with a default set of owners to act as a fallback.
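For concreteness, a minimal sketch of such a fallback rule is shown below. The owner handles simply mirror the wildcard entry this diff removes and are illustrative only, not a proposal for who should own the path.

```
# Hypothetical fallback rule, placed before the more specific entries so that
# later, more specific patterns still take precedence. It matches files directly
# under /python/sglang/ (not in deeper subdirectories); files that also match a
# later, more specific pattern keep that pattern's owners.
/python/sglang/* @merrymercy @Ying1123 @Fridge003 @ispobock @hnyls2002
```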

/python/sglang/multimodal_gen @mickqian
/python/sglang/srt/constrained @hnyls2002
/python/sglang/srt/disaggregation @ByronHsu @hnyls2002
/python/sglang/srt/disaggregation/mooncake @ShangmingCai
/python/sglang/srt/disaggregation/ascend @ping1jing2
/python/sglang/srt/distributed @yizhang2077 @merrymercy
/python/sglang/srt/constrained @hnyls2002 @DarkSharpness
/python/sglang/srt/disaggregation @ByronHsu @hnyls2002 @ShangmingCai
/python/sglang/srt/disaggregation/ascend @ping1jing2 @iforgetmyname
/python/sglang/srt/distributed @yizhang2077 @merrymercy @ch-wan
/python/sglang/srt/entrypoints @ispobock @CatherineSue @slin1237 @merrymercy @JustinTong0323
/python/sglang/srt/eplb @fzyzcjy
/python/sglang/srt/eplb @fzyzcjy @ch-wan
/python/sglang/srt/function_call @CatherineSue @JustinTong0323
/python/sglang/srt/layers @merrymercy @Ying1123 @Fridge003 @ispobock @HaiShaw @ch-wan @BBuf @kushanam @Edwardf0t1
/python/sglang/srt/layers/quantization @ch-wan @BBuf @Edwardf0t1 @FlamingoPg
/python/sglang/srt/layers/attention/ascend_backend.py @ping1jing2
/python/sglang/srt/layers/quantization @ch-wan @BBuf @Edwardf0t1 @FlamingoPg @AniZpZ
/python/sglang/srt/layers/attention/ascend_backend.py @ping1jing2 @iforgetmyname
/python/sglang/srt/lora @Ying1123 @Fridge003 @lifuhuang
/python/sglang/srt/managers @merrymercy @Ying1123 @zhyncs @hnyls2002 @xiezhq-hermann
/python/sglang/srt/managers @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann @zhyncs
/python/sglang/srt/mem_cache @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann
/python/sglang/srt/mem_cache/allocator_ascend.py @ping1jing2
/python/sglang/srt/mem_cache/allocator_ascend.py @ping1jing2 @iforgetmyname
/python/sglang/srt/model_executor @merrymercy @Ying1123 @hnyls2002 @Fridge003 @ispobock
/python/sglang/srt/model_executor/npu_graph_runner.py @ping1jing2
/python/sglang/srt/multimodal @mickqian @JustinTong0323
/python/sglang/srt/speculative @Ying1123 @merrymercy @kssteven418
/python/sglang/srt/model_executor/npu_graph_runner.py @ping1jing2 @iforgetmyname
/python/sglang/srt/multimodal @mickqian @JustinTong0323 @yhyang201
/python/sglang/srt/speculative @Ying1123 @merrymercy @hnyls2002
/sgl-kernel @zhyncs @ispobock @BBuf @yizhang2077 @merrymercy @FlamingoPg @HaiShaw
/sgl-router @slin1237 @ByronHsu @CatherineSue
/sgl-router/benches @slin1237
@@ -40,5 +38,5 @@
/sgl-router/src/routers @CatherineSue @key4ng @slin1237
/sgl-router/src/tokenizer @slin1237 @CatherineSue
/sgl-router/src/tool_parser @slin1237 @CatherineSue
/test/srt/ascend @ping1jing2 @iforgetmyname
/test/srt/test_modelopt* @Edwardf0t1
/test/srt/ascend @ping1jing2
25 changes: 11 additions & 14 deletions .github/ISSUE_TEMPLATE/1-bug-report.yml
@@ -1,5 +1,5 @@
name: 🐞 Bug report
description: Create a report to help us reproduce and fix the bug
description: Report a bug to help us reproduce and fix it.
title: "[Bug] "
labels: ['Bug']

@@ -8,31 +8,28 @@ body:
attributes:
label: Checklist
options:
- label: 1. I have searched related issues but cannot get the expected help.
- label: 2. The bug has not been fixed in the latest version.
- label: 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- label: 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- label: 5. Please use English, otherwise it will be closed.
- label: I searched related issues but found no solution.
- label: The bug persists in the latest version.
- label: Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- label: If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- label: Please use English. Otherwise, it will be closed.
- type: textarea
attributes:
label: Describe the bug
description: A clear and concise description of what the bug is.
description: A clear, concise description of the bug.
validations:
required: true
- type: textarea
attributes:
label: Reproduction
description: |
What command or script did you run? Which **model** are you using?
placeholder: |
A placeholder for the command.
description: Command/script run and model used.
placeholder: Paste the command here.
validations:
required: true
- type: textarea
attributes:
label: Environment
description: |
Please provide necessary environment information here with `python3 -m sglang.check_env`. Otherwise the issue will be closed.
placeholder: Environment here.
description: Run `python3 -m sglang.check_env` and paste output here. Issues without this will be closed.
placeholder: Paste environment output here.
validations:
required: true
8 changes: 4 additions & 4 deletions .github/ISSUE_TEMPLATE/2-feature-request.yml
@@ -7,17 +7,17 @@ body:
attributes:
label: Checklist
options:
- label: 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- label: 2. Please use English, otherwise it will be closed.
- label: If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- label: Please use English. Otherwise, it will be closed.
- type: textarea
attributes:
label: Motivation
description: |
A clear and concise description of the motivation of the feature.
Clearly and concisely describe the feature's motivation.
validations:
required: true
- type: textarea
attributes:
label: Related resources
description: |
If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
Provide official releases or third-party implementations if available.
40 changes: 40 additions & 0 deletions .github/MAINTAINER.md
@@ -0,0 +1,40 @@
# SGLang Code Maintenance Model
This document describes the code maintenance model for the SGLang project.
Since SGLang is a large project involving multiple organizations and hardware platforms, we designed this model with the following goals:
- Ensure a responsive and smooth review process.
- Allow fast iteration, so maintainers can sometimes bypass flaky CI tests for important PRs.

## Role Descriptions
There are four important roles in this maintenance model. Some are custom roles, while others are predefined by GitHub.

- *Area Maintainer*: The person who drives the PR merge process. They have strong area expertise and hold a high code quality bar.
- Permission: Merge PRs. Bypass branch protection rules if needed.
- Responsibility: Shepherd the merge of PRs assigned to your area. Revert or hotfix any issues related to your merge (especially if you bypass).
- *Codeowner*: The person who protects critical code. Without a bypass, each PR needs at least one Codeowner approval for each modified file. Please note that this role is not an honor but a great responsibility, because PRs cannot be merged without their approval (except when bypassed by an Area Maintainer).
- Permission: Approve PRs, allowing them to be merged without bypass.
- Responsibility: Review PRs in a timely manner.
- *Write*: The person with write permission to the sglang repo.
- Permission: Merge PRs if they have passed required tests and been approved by Codeowners. This role cannot bypass branch protection rules.
- Responsibility: Review and merge PRs in a timely manner.
- *CI Maintainer*: The person who manages CI runners for specific hardware platforms.
- Permission: Add CI runners.
- Responsibility: Keep the CI runners up and running.

Note: Difference between Area Maintainer and Codeowner
- Area Maintainer is an active role: they actively help merge PRs and can bypass CI when urgent.
- Codeowner is a passive role backed by GitHub's protection rule; it prevents accidental changes to critical code.
- The list of Area Maintainers is attached below. The list of Codeowners is in the [CODEOWNERS](./CODEOWNERS) file.

## Pull Request Merge Process
1. Author submits a pull request (PR) and fills out the PR checklist.
2. A bot assigns the PR to an Area Maintainer and @-mentions them (see the sketch after this list). At the same time, GitHub auto-requests reviews from Codeowners.
3. The Area Maintainer coordinates the review (asking people to review); both the Area Maintainer and the Codeowners approve the PR.
4. We can now merge the code:
- Ideal case: for each modified file, at least one Codeowner approves the PR, and the required CI passes. Then anyone with write permission can merge.
- In cases where it is difficult to meet all requirements due to flaky CI or slow responses, an Area Maintainer can bypass branch protection to merge the PR.
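The assignment bot in step 2 is not part of this PR. Purely as an illustration, a minimal GitHub Actions sketch of that step could look like the following; the workflow name, trigger, and the placeholder area-to-maintainer routing are assumptions, not the project's actual automation.

```yaml
# Hypothetical sketch only -- the real assignment bot and its routing are not shown in this PR.
name: assign-area-maintainer
on:
  pull_request_target:
    types: [opened]
jobs:
  assign:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            // Placeholder routing; the actual area -> maintainer mapping lives elsewhere.
            const maintainer = "some-area-maintainer";
            const issue_number = context.payload.pull_request.number;
            // Assign the PR to the Area Maintainer ...
            await github.rest.issues.addAssignees({
              ...context.repo,
              issue_number,
              assignees: [maintainer],
            });
            // ... and @-mention them so they are notified to shepherd the review.
            await github.rest.issues.createComment({
              ...context.repo,
              issue_number,
              body: `@${maintainer} this PR falls in your area; please shepherd the review.`,
            });
```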

## The List of Area Maintainers and Reviewers
TODO

## The List of CI Maintainers
TODO
54 changes: 0 additions & 54 deletions .github/REVIEWERS.md

This file was deleted.

15 changes: 5 additions & 10 deletions .github/labeler.yml
@@ -26,7 +26,7 @@ dependencies:
- '**/requirements*.txt'
- '**/Cargo.toml'
- '**/Cargo.lock'
- '**/pyproject.toml'
- '**/pyproject*.toml'
- '**/setup.py'
- '**/poetry.lock'
- '**/package.json'
@@ -36,31 +36,27 @@
Multi-modal:
- changed-files:
- any-glob-to-any-file:
- '**/vision/**/*'
- '**/multimodal/**/*'
- '**/vlm/**/*'
- '**/*multimodal*'
- '**/*vision*'
- '**/*vlm*'

# LoRA
lora:
- changed-files:
- any-glob-to-any-file:
- '**/lora/**/*'
- '**/*lora*'

# Quantization
quant:
- changed-files:
- any-glob-to-any-file:
- '**/quant/**/*'
- '**/*quant*'
- '**/awq/**/*'
- '**/gptq/**/*'
- '**/*quantization*'

# Speculative decoding
speculative-decoding:
- changed-files:
- any-glob-to-any-file:
- '**/speculative/**/*'
- '**/*speculative*'

# AMD specific
@@ -80,5 +76,4 @@
hicache:
- changed-files:
- any-glob-to-any-file:
- '**/hicache/**/*'
- '**/*hicache*'
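These globs are only half of the setup: a separate labeler workflow evaluates them against each PR's changed files. That workflow is not touched by this PR; the sketch below is a generic example of what such a consumer of `.github/labeler.yml` typically looks like, with the name and trigger chosen for illustration.

```yaml
# Generic sketch of a workflow that applies .github/labeler.yml (not part of this PR).
name: labeler
on:
  pull_request_target:
jobs:
  label:
    runs-on: ubuntu-latest
    permissions:
      contents: read        # read the labeler configuration
      pull-requests: write  # add labels to the pull request
    steps:
      - uses: actions/labeler@v5
        with:
          sync-labels: false  # keep previously applied labels even if globs no longer match
```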
1 change: 1 addition & 0 deletions .github/workflows/lint.yml
@@ -9,6 +9,7 @@ on:
jobs:
lint:
runs-on: ubuntu-latest
if: contains(github.event.pull_request.labels.*.name, 'run-ci')
steps:
- uses: actions/checkout@v4

12 changes: 5 additions & 7 deletions README.md
@@ -55,10 +55,10 @@ It is designed to deliver low-latency and high-throughput inference across a wid
Its core features include:

- **Fast Backend Runtime**: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ), and multi-LoRA batching.
- **Extensive Model Support**: Supports a wide range of generative models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), and reward models (Skywork), with easy extensibility for integrating new models. Compatible with most Hugging Face models and OpenAI APIs.
- **Extensive Model Support**: Supports a wide range of generative models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), reward models (Skywork), and diffusion models (WAN, Qwen-Image), with easy extensibility for integrating new models. Compatible with most Hugging Face models and OpenAI APIs.
- **Extensive Hardware Support**: Runs on NVIDIA GPUs (GB200/B300/H100/A100/Spark), AMD GPUs (MI355/MI300), Intel Xeon CPUs, Google TPUs, Ascend NPUs, and more.
- **Flexible Frontend Language**: Offers an intuitive interface for programming LLM applications, supporting chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
- **Active Community**: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 300,000 GPUs worldwide.
- **Active Community**: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 400,000 GPUs worldwide.

## Getting Started
- [Install SGLang](https://docs.sglang.ai/get_started/install.html)
@@ -68,13 +68,11 @@ Its core features include:
- [Contribution Guide](https://docs.sglang.ai/developer_guide/contribution_guide.html)

## Benchmark and Performance
Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/), [Large-scale expert parallelism](https://lmsys.org/blog/2025-05-05-large-scale-ep/).

## Roadmap
[Development Roadmap (2025 Q4)](https://github.com/sgl-project/sglang/issues/12780)
Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/), [Large-scale expert parallelism](https://lmsys.org/blog/2025-05-05-large-scale-ep/), [GB200 rack-scale parallelism](https://lmsys.org/blog/2025-09-25-gb200-part-2/).

## Adoption and Sponsorship
SGLang has been deployed at large scale, generating trillions of tokens in production each day. It is trusted and adopted by a wide range of leading enterprises and institutions, including xAI, AMD, NVIDIA, Intel, LinkedIn, Cursor, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, Atlas Cloud, Voltage Park, Nebius, DataCrunch, Novita, InnoMatrix, MIT, UCLA, the University of Washington, Stanford, UC Berkeley, Tsinghua University, Jam & Tea Studios, Baseten, and other major technology organizations across North America and Asia. As an open-source LLM inference engine, SGLang has become the de facto industry standard, with deployments running on over 300,000 GPUs worldwide.
SGLang has been deployed at large scale, generating trillions of tokens in production each day. It is trusted and adopted by a wide range of leading enterprises and institutions, including xAI, AMD, NVIDIA, Intel, LinkedIn, Cursor, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, Atlas Cloud, Voltage Park, Nebius, DataCrunch, Novita, InnoMatrix, MIT, UCLA, the University of Washington, Stanford, UC Berkeley, Tsinghua University, Jam & Tea Studios, Baseten, and other major technology organizations across North America and Asia.
As an open-source LLM inference engine, SGLang has become the de facto industry standard, with deployments running on over 400,000 GPUs worldwide.
SGLang is currently hosted under the non-profit open-source organization [LMSYS](https://lmsys.org/about/).

<img src="https://raw.githubusercontent.com/sgl-project/sgl-learning-materials/refs/heads/main/slides/adoption.png" alt="logo" width="800" margin="10px"></img>
2 changes: 1 addition & 1 deletion docs/advanced_features/hicache.rst
@@ -1,5 +1,5 @@
Hierarchical KV Caching (HiCache)
======================
=================================

.. toctree::
:maxdepth: 1
2 changes: 1 addition & 1 deletion docs/advanced_features/observability.md
@@ -7,7 +7,7 @@ You can query them by:
curl http://localhost:30000/metrics
```

See [Production Metrics](../references/production_metrics.md) for more details.
See [Production Metrics](../references/production_metrics.md) and [Production Request Tracing](../references/production_request_trace.md) for more details.

## Logging

56 changes: 0 additions & 56 deletions docs/advanced_features/pd_multiplexing.md

This file was deleted.

@@ -1,4 +1,4 @@
# DeepSeek Usage
# DeepSeek V3/V3.1/R1 Usage

SGLang provides many optimizations specifically designed for the DeepSeek models, making it the inference engine recommended by the official [DeepSeek team](https://github.com/deepseek-ai/DeepSeek-V3/tree/main?tab=readme-ov-file#62-inference-with-sglang-recommended) from Day 0.

12 changes: 12 additions & 0 deletions docs/basic_usage/popular_model_usage.rst
@@ -0,0 +1,12 @@
Popular Model Usage (DeepSeek, GPT-OSS, Llama, Qwen, and more)
===============================================================

.. toctree::
:maxdepth: 1

deepseek_v3.md
deepseek_v32.md
gpt_oss.md
llama4.md
qwen3.md
qwen3_vl.md