34 changes: 16 additions & 18 deletions .github/CODEOWNERS
@@ -1,28 +1,26 @@
.github @merrymercy @Fridge003 @ispobock
/docker @Fridge003 @ispobock @HaiShaw @ByronHsu
/docker/npu.Dockerfile @ping1jing2
.github @merrymercy @Fridge003 @ispobock @Kangyan-Zhou
/docker @Fridge003 @ispobock @HaiShaw @ishandhanani
/docker/npu.Dockerfile @ping1jing2 @iforgetmyname
/python/pyproject.toml @merrymercy @Fridge003 @ispobock
/python/sglang/* @merrymercy @Ying1123 @Fridge003 @ispobock @hnyls2002
Review comment from a Contributor (severity: medium):
Removing the wildcard rule /python/sglang/* could be risky. While there are many specific rules for subdirectories, any new file or directory created directly under /python/sglang/ that doesn't match an existing pattern will now have no code owner. This could lead to unreviewed changes slipping through. Consider re-adding a general rule for /python/sglang/* with a default set of owners to act as a fallback.
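For concreteness, a minimal sketch of such a fallback rule is shown below. The owner handles simply mirror the wildcard entry this diff removes and are illustrative only, not a proposal for who should own the path.

```
# Hypothetical fallback rule, placed before the more specific entries so that
# later, more specific patterns still take precedence. It matches files directly
# under /python/sglang/ (not in deeper subdirectories); files that also match a
# later, more specific pattern keep that pattern's owners.
/python/sglang/* @merrymercy @Ying1123 @Fridge003 @ispobock @hnyls2002
```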

/python/sglang/multimodal_gen @mickqian
/python/sglang/srt/constrained @hnyls2002
/python/sglang/srt/disaggregation @ByronHsu @hnyls2002
/python/sglang/srt/disaggregation/mooncake @ShangmingCai
/python/sglang/srt/disaggregation/ascend @ping1jing2
/python/sglang/srt/distributed @yizhang2077 @merrymercy
/python/sglang/srt/constrained @hnyls2002 @DarkSharpness
/python/sglang/srt/disaggregation @ByronHsu @hnyls2002 @ShangmingCai
/python/sglang/srt/disaggregation/ascend @ping1jing2 @iforgetmyname
/python/sglang/srt/distributed @yizhang2077 @merrymercy @ch-wan
/python/sglang/srt/entrypoints @ispobock @CatherineSue @slin1237 @merrymercy @JustinTong0323
/python/sglang/srt/eplb @fzyzcjy
/python/sglang/srt/eplb @fzyzcjy @ch-wan
/python/sglang/srt/function_call @CatherineSue @JustinTong0323
/python/sglang/srt/layers @merrymercy @Ying1123 @Fridge003 @ispobock @HaiShaw @ch-wan @BBuf @kushanam @Edwardf0t1
/python/sglang/srt/layers/quantization @ch-wan @BBuf @Edwardf0t1 @FlamingoPg
/python/sglang/srt/layers/attention/ascend_backend.py @ping1jing2
/python/sglang/srt/layers/quantization @ch-wan @BBuf @Edwardf0t1 @FlamingoPg @AniZpZ
/python/sglang/srt/layers/attention/ascend_backend.py @ping1jing2 @iforgetmyname
/python/sglang/srt/lora @Ying1123 @Fridge003 @lifuhuang
/python/sglang/srt/managers @merrymercy @Ying1123 @zhyncs @hnyls2002 @xiezhq-hermann
/python/sglang/srt/managers @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann @zhyncs
/python/sglang/srt/mem_cache @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann
/python/sglang/srt/mem_cache/allocator_ascend.py @ping1jing2
/python/sglang/srt/mem_cache/allocator_ascend.py @ping1jing2 @iforgetmyname
/python/sglang/srt/model_executor @merrymercy @Ying1123 @hnyls2002 @Fridge003 @ispobock
/python/sglang/srt/model_executor/npu_graph_runner.py @ping1jing2
/python/sglang/srt/multimodal @mickqian @JustinTong0323
/python/sglang/srt/speculative @Ying1123 @merrymercy @kssteven418
/python/sglang/srt/model_executor/npu_graph_runner.py @ping1jing2 @iforgetmyname
/python/sglang/srt/multimodal @mickqian @JustinTong0323 @yhyang201
/python/sglang/srt/speculative @Ying1123 @merrymercy @hnyls2002
/sgl-kernel @zhyncs @ispobock @BBuf @yizhang2077 @merrymercy @FlamingoPg @HaiShaw
/sgl-router @slin1237 @ByronHsu @CatherineSue
/sgl-router/benches @slin1237
@@ -40,5 +38,5 @@
/sgl-router/src/routers @CatherineSue @key4ng @slin1237
/sgl-router/src/tokenizer @slin1237 @CatherineSue
/sgl-router/src/tool_parser @slin1237 @CatherineSue
/test/srt/ascend @ping1jing2 @iforgetmyname
/test/srt/test_modelopt* @Edwardf0t1
/test/srt/ascend @ping1jing2
25 changes: 11 additions & 14 deletions .github/ISSUE_TEMPLATE/1-bug-report.yml
@@ -1,5 +1,5 @@
name: 🐞 Bug report
description: Create a report to help us reproduce and fix the bug
description: Report a bug to help us reproduce and fix it.
title: "[Bug] "
labels: ['Bug']

@@ -8,31 +8,28 @@ body:
attributes:
label: Checklist
options:
- label: 1. I have searched related issues but cannot get the expected help.
- label: 2. The bug has not been fixed in the latest version.
- label: 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- label: 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- label: 5. Please use English, otherwise it will be closed.
- label: I searched related issues but found no solution.
- label: The bug persists in the latest version.
- label: Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- label: If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- label: Please use English. Otherwise, it will be closed.
- type: textarea
attributes:
label: Describe the bug
description: A clear and concise description of what the bug is.
description: A clear, concise description of the bug.
validations:
required: true
- type: textarea
attributes:
label: Reproduction
description: |
What command or script did you run? Which **model** are you using?
placeholder: |
A placeholder for the command.
description: Command/script run and model used.
placeholder: Paste the command here.
validations:
required: true
- type: textarea
attributes:
label: Environment
description: |
Please provide necessary environment information here with `python3 -m sglang.check_env`. Otherwise the issue will be closed.
placeholder: Environment here.
description: Run `python3 -m sglang.check_env` and paste output here. Issues without this will be closed.
placeholder: Paste environment output here.
validations:
required: true
8 changes: 4 additions & 4 deletions .github/ISSUE_TEMPLATE/2-feature-request.yml
@@ -7,17 +7,17 @@ body:
attributes:
label: Checklist
options:
- label: 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- label: 2. Please use English, otherwise it will be closed.
- label: If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- label: Please use English. Otherwise, it will be closed.
- type: textarea
attributes:
label: Motivation
description: |
A clear and concise description of the motivation of the feature.
Clearly and concisely describe the feature's motivation.
validations:
required: true
- type: textarea
attributes:
label: Related resources
description: |
If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
Provide official releases or third-party implementations if available.
40 changes: 40 additions & 0 deletions .github/MAINTAINER.md
@@ -0,0 +1,40 @@
# SGLang Code Maintenance Model
This document describes the code maintenance model for the SGLang project.
Since SGLang is a large project involving multiple organizations and hardware platforms, we designed this model with the following goals:
- Ensure a responsive and smooth review process.
- Allow fast iteration, so maintainers can sometimes bypass flaky CI tests for important PRs.

## Role Descriptions
There are four important roles in this maintenance model. Some are custom roles, while others are predefined by GitHub.

- *Area Maintainer*: The person who drives the PR merge process. They have strong area expertise and hold a high code quality bar.
- Permission: Merge PRs. Bypass branch protection rules if needed.
- Responsibility: Shepherd the merge of PRs assigned to your area. Revert or hotfix any issues related to your merge (especially if you bypass).
- *Codeowner*: The person who protects critical code. Without a bypass, each PR needs at least one Codeowner approval for each modified file. Please note that this role is not an honor but a great responsibility, because PRs cannot be merged without their approval (except when bypassed by an Area Maintainer).
- Permission: Approve PRs, allowing them to be merged without bypass.
- Responsibility: Review PRs in a timely manner.
- *Write*: The person with write permission to the sglang repo.
- Permission: Merge PRs if they have passed required tests and been approved by Codeowners. This role cannot bypass branch protection rules.
- Responsibility: Review and merge PRs in a timely manner.
- *CI Maintainer*: The person who manages CI runners for specific hardware platforms.
- Permission: Add CI runners.
- Responsibility: Keep the CI runners up and running.

Note: Difference between Area Maintainer and Codeowner
- Area Maintainer is an active role: they actively help merge PRs and can bypass CI when urgent.
- Codeowner is a passive role backed by GitHub's protection rule; it prevents accidental changes to critical code.
- The list of Area Maintainers is attached below. The list of Codeowners is in the [CODEOWNERS](./CODEOWNERS) file.

## Pull Request Merge Process
1. Author submits a pull request (PR) and fills out the PR checklist.
2. A bot assigns the PR to an Area Maintainer and @-mentions them (see the sketch after this list). At the same time, GitHub auto-requests reviews from Codeowners.
3. The Area Maintainer coordinates the review (asking people to review); both the Area Maintainer and the Codeowners approve the PR.
4. We can now merge the code:
- Ideal case: for each modified file, at least one Codeowner approves the PR, and the required CI passes. Then anyone with write permission can merge.
- In cases where it is difficult to meet all requirements due to flaky CI or slow responses, an Area Maintainer can bypass branch protection to merge the PR.
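The assignment bot in step 2 is not part of this PR. Purely as an illustration, a minimal GitHub Actions sketch of that step could look like the following; the workflow name, trigger, and the placeholder area-to-maintainer routing are assumptions, not the project's actual automation.

```yaml
# Hypothetical sketch only -- the real assignment bot and its routing are not shown in this PR.
name: assign-area-maintainer
on:
  pull_request_target:
    types: [opened]
jobs:
  assign:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            // Placeholder routing; the actual area -> maintainer mapping lives elsewhere.
            const maintainer = "some-area-maintainer";
            const issue_number = context.payload.pull_request.number;
            // Assign the PR to the Area Maintainer ...
            await github.rest.issues.addAssignees({
              ...context.repo,
              issue_number,
              assignees: [maintainer],
            });
            // ... and @-mention them so they are notified to shepherd the review.
            await github.rest.issues.createComment({
              ...context.repo,
              issue_number,
              body: `@${maintainer} this PR falls in your area; please shepherd the review.`,
            });
```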

## The List of Area Maintainers and Reviewers
TODO

## The List of CI Maintainers
TODO
54 changes: 0 additions & 54 deletions .github/REVIEWERS.md

This file was deleted.

15 changes: 5 additions & 10 deletions .github/labeler.yml
@@ -26,7 +26,7 @@ dependencies:
- '**/requirements*.txt'
- '**/Cargo.toml'
- '**/Cargo.lock'
- '**/pyproject.toml'
- '**/pyproject*.toml'
- '**/setup.py'
- '**/poetry.lock'
- '**/package.json'
@@ -36,31 +36,27 @@
Multi-modal:
- changed-files:
- any-glob-to-any-file:
- '**/vision/**/*'
- '**/multimodal/**/*'
- '**/vlm/**/*'
- '**/*multimodal*'
- '**/*vision*'
- '**/*vlm*'

# LoRA
lora:
- changed-files:
- any-glob-to-any-file:
- '**/lora/**/*'
- '**/*lora*'

# Quantization
quant:
- changed-files:
- any-glob-to-any-file:
- '**/quant/**/*'
- '**/*quant*'
- '**/awq/**/*'
- '**/gptq/**/*'
- '**/*quantization*'

# Speculative decoding
speculative-decoding:
- changed-files:
- any-glob-to-any-file:
- '**/speculative/**/*'
- '**/*speculative*'

# AMD specific
@@ -80,5 +76,4 @@
hicache:
- changed-files:
- any-glob-to-any-file:
- '**/hicache/**/*'
- '**/*hicache*'
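These globs are only half of the setup: a separate labeler workflow evaluates them against each PR's changed files. That workflow is not touched by this PR; the sketch below is a generic example of what such a consumer of `.github/labeler.yml` typically looks like, with the name and trigger chosen for illustration.

```yaml
# Generic sketch of a workflow that applies .github/labeler.yml (not part of this PR).
name: labeler
on:
  pull_request_target:
jobs:
  label:
    runs-on: ubuntu-latest
    permissions:
      contents: read        # read the labeler configuration
      pull-requests: write  # add labels to the pull request
    steps:
      - uses: actions/labeler@v5
        with:
          sync-labels: false  # keep previously applied labels even if globs no longer match
```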
1 change: 1 addition & 0 deletions .github/workflows/lint.yml
@@ -9,6 +9,7 @@ on:
jobs:
lint:
runs-on: ubuntu-latest
if: contains(github.event.pull_request.labels.*.name, 'run-ci')
steps:
- uses: actions/checkout@v4

12 changes: 5 additions & 7 deletions README.md
@@ -55,10 +55,10 @@ It is designed to deliver low-latency and high-throughput inference across a wid
Its core features include:

- **Fast Backend Runtime**: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ), and multi-LoRA batching.
- **Extensive Model Support**: Supports a wide range of generative models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), and reward models (Skywork), with easy extensibility for integrating new models. Compatible with most Hugging Face models and OpenAI APIs.
- **Extensive Model Support**: Supports a wide range of generative models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), reward models (Skywork), and diffusion models (WAN, Qwen-Image), with easy extensibility for integrating new models. Compatible with most Hugging Face models and OpenAI APIs.
- **Extensive Hardware Support**: Runs on NVIDIA GPUs (GB200/B300/H100/A100/Spark), AMD GPUs (MI355/MI300), Intel Xeon CPUs, Google TPUs, Ascend NPUs, and more.
- **Flexible Frontend Language**: Offers an intuitive interface for programming LLM applications, supporting chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
- **Active Community**: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 300,000 GPUs worldwide.
- **Active Community**: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 400,000 GPUs worldwide.

## Getting Started
- [Install SGLang](https://docs.sglang.ai/get_started/install.html)
@@ -68,13 +68,11 @@ Its core features include:
- [Contribution Guide](https://docs.sglang.ai/developer_guide/contribution_guide.html)

## Benchmark and Performance
Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/), [Large-scale expert parallelism](https://lmsys.org/blog/2025-05-05-large-scale-ep/).

## Roadmap
[Development Roadmap (2025 Q4)](https://github.com/sgl-project/sglang/issues/12780)
Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/), [Large-scale expert parallelism](https://lmsys.org/blog/2025-05-05-large-scale-ep/), [GB200 rack-scale parallelism](https://lmsys.org/blog/2025-09-25-gb200-part-2/).

## Adoption and Sponsorship
SGLang has been deployed at large scale, generating trillions of tokens in production each day. It is trusted and adopted by a wide range of leading enterprises and institutions, including xAI, AMD, NVIDIA, Intel, LinkedIn, Cursor, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, Atlas Cloud, Voltage Park, Nebius, DataCrunch, Novita, InnoMatrix, MIT, UCLA, the University of Washington, Stanford, UC Berkeley, Tsinghua University, Jam & Tea Studios, Baseten, and other major technology organizations across North America and Asia. As an open-source LLM inference engine, SGLang has become the de facto industry standard, with deployments running on over 300,000 GPUs worldwide.
SGLang has been deployed at large scale, generating trillions of tokens in production each day. It is trusted and adopted by a wide range of leading enterprises and institutions, including xAI, AMD, NVIDIA, Intel, LinkedIn, Cursor, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, Atlas Cloud, Voltage Park, Nebius, DataCrunch, Novita, InnoMatrix, MIT, UCLA, the University of Washington, Stanford, UC Berkeley, Tsinghua University, Jam & Tea Studios, Baseten, and other major technology organizations across North America and Asia.
As an open-source LLM inference engine, SGLang has become the de facto industry standard, with deployments running on over 400,000 GPUs worldwide.
SGLang is currently hosted under the non-profit open-source organization [LMSYS](https://lmsys.org/about/).

<img src="https://raw.githubusercontent.com/sgl-project/sgl-learning-materials/refs/heads/main/slides/adoption.png" alt="logo" width="800" margin="10px"></img>
2 changes: 1 addition & 1 deletion docs/advanced_features/hicache.rst
@@ -1,5 +1,5 @@
Hierarchical KV Caching (HiCache)
======================
=================================

.. toctree::
:maxdepth: 1
2 changes: 1 addition & 1 deletion docs/advanced_features/observability.md
@@ -7,7 +7,7 @@ You can query them by:
curl http://localhost:30000/metrics
```

See [Production Metrics](../references/production_metrics.md) for more details.
See [Production Metrics](../references/production_metrics.md) and [Production Request Tracing](../references/production_request_trace.md) for more details.

## Logging

56 changes: 0 additions & 56 deletions docs/advanced_features/pd_multiplexing.md

This file was deleted.

@@ -1,4 +1,4 @@
# DeepSeek Usage
# DeepSeek V3/V3.1/R1 Usage

SGLang provides many optimizations specifically designed for the DeepSeek models, making it the inference engine recommended by the official [DeepSeek team](https://github.com/deepseek-ai/DeepSeek-V3/tree/main?tab=readme-ov-file#62-inference-with-sglang-recommended) from Day 0.

12 changes: 12 additions & 0 deletions docs/basic_usage/popular_model_usage.rst
@@ -0,0 +1,12 @@
Popular Model Usage (DeepSeek, GPT-OSS, Llama, Qwen, and more)
===============================================================

.. toctree::
:maxdepth: 1

deepseek_v3.md
deepseek_v32.md
gpt_oss.md
llama4.md
qwen3.md
qwen3_vl.md