Add Qwen3.5 FP8 B200 SGLang MTP config by ankursingh-nv · Pull Request #898 · SemiAnalysisAI/InferenceX

ankursingh-nv · 2026-03-09T22:06:18Z

Summary

Adds a new benchmark configuration for Qwen3.5-397B-A17B FP8 on B200 using SGLang with MTP (Multi-Token Prediction) via EAGLE speculative decoding.

Changes

New benchmark script: `benchmarks/single_node/qwen3.5_fp8_b200_mtp.sh`

- SGLang launch with FP8 quantization and FP8 E4M3 KV cache
- EAGLE speculative decoding config: num-steps=3, draft-tokens=4, topk=1
- FlashInfer + TRT-LLM backends (fp8-gemm-backend=flashinfer_trtllm, attention-backend=trtllm_mha, moe-runner-backend=flashinfer_trtllm)
- FlashInfer allreduce fusion enabled
- Adaptive scheduler recv interval (10 for CONC < 16, 30 for CONC >= 16)
- Radix cache disabled; context length set dynamically from ISL + OSL
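To make the settings above concrete, here is a minimal Python sketch of how the launch arguments might be assembled. It is not the contents of `qwen3.5_fp8_b200_mtp.sh`: the function name `build_launch_args` is hypothetical, the flag spellings are a best reading of the names quoted in this description and should be checked against the script and the SGLang version in use, and the context length is assumed to be exactly ISL + OSL with no extra headroom.

```python
def build_launch_args(isl: int, osl: int, conc: int) -> list[str]:
    """Assemble SGLang server arguments from the settings listed in this PR.

    Hypothetical helper: flag spellings are assumptions, not copied from the script.
    """
    # Per the PR description: 10 at low concurrency, 30 once CONC >= 16.
    recv_interval = 30 if conc >= 16 else 10
    # Assumption: context length is exactly ISL + OSL; the real script may add a margin.
    context_length = isl + osl
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", "Qwen/Qwen3.5-397B-A17B-FP8",
        "--tp-size", "4",
        "--ep-size", "1",
        # FP8 weights and FP8 E4M3 KV cache.
        "--quantization", "fp8",
        "--kv-cache-dtype", "fp8_e4m3",
        # EAGLE-style speculative decoding used for MTP.
        "--speculative-algorithm", "EAGLE",
        "--speculative-num-steps", "3",
        "--speculative-num-draft-tokens", "4",
        "--speculative-eagle-topk", "1",
        # FlashInfer + TRT-LLM backends named in the description.
        "--fp8-gemm-backend", "flashinfer_trtllm",
        "--attention-backend", "trtllm_mha",
        "--moe-runner-backend", "flashinfer_trtllm",
        "--enable-flashinfer-allreduce-fusion",
        "--scheduler-recv-interval", str(recv_interval),
        "--disable-radix-cache",
        "--context-length", str(context_length),
    ]
```

Calling `build_launch_args(1024, 1024, 4)` yields a recv interval of 10 and a context length of 2048, while any concurrency of 16 or more switches the interval to 30.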

Config entry: `qwen3.5-fp8-b200-sglang-mtp` in `nvidia-master.yaml`

- Image: lmsysorg/sglang:v0.5.9-cu130
- Model: Qwen/Qwen3.5-397B-A17B-FP8
- Single-node, TP=4, EP=1
- Concurrency sweep: 4–256 across all three sequence-length configs (1k/1k, 1k/8k, 8k/1k)
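The sweep described above can be enumerated with a short Python sketch. Two details are assumptions rather than facts from `nvidia-master.yaml`: that concurrency doubles at each step between 4 and 256, and that "1k" and "8k" mean 1024 and 8192 tokens. The names `SEQ_CONFIGS`, `concurrency_points`, and `sweep_matrix` are hypothetical.

```python
from itertools import product

# (ISL, OSL) pairs for the 1k/1k, 1k/8k, and 8k/1k configs.
# Assumption: 1k = 1024 tokens and 8k = 8192 tokens.
SEQ_CONFIGS = [(1024, 1024), (1024, 8192), (8192, 1024)]


def concurrency_points(start: int = 4, end: int = 256) -> list[int]:
    """Return concurrency values from start to end, doubling each step.

    The doubling pattern is an assumption; the PR only states the 4-256 range.
    """
    points = []
    conc = start
    while conc <= end:
        points.append(conc)
        conc *= 2
    return points


def sweep_matrix() -> list[dict[str, int]]:
    """Cross every sequence-length config with every concurrency point."""
    return [
        {"isl": isl, "osl": osl, "conc": conc, "tp": 4, "ep": 1}
        for (isl, osl), conc in product(SEQ_CONFIGS, concurrency_points())
    ]
```

Under these assumptions the sweep has seven concurrency values per sequence-length config, so 21 benchmark points in total.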

Changelog entry in `perf-changelog.yaml`

- Documents the new config addition with spec-decoding details

Test Plan

- Run e2e benchmarks for qwen3.5-fp8-b200-sglang-mtp to validate server startup and benchmark completion
- Verify MTP speculative decoding produces correct results across sequence lengths
- Compare throughput/latency against the non-MTP qwen3.5-fp8-b200-sglang baseline
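For the baseline comparison in the test plan, one simple approach is to compute the MTP-to-baseline ratio at each concurrency level that both runs share. The sketch below is illustrative only: the function name `relative_change` and the input shape (a mapping from concurrency to a throughput or latency number) are assumptions, not the format the benchmark harness actually emits.

```python
def relative_change(mtp: dict[int, float], baseline: dict[int, float]) -> dict[int, float]:
    """Return mtp / baseline for every concurrency present in both runs.

    Keys are concurrency levels; values are one metric (for example tokens/s).
    Points missing from either run, or with a non-positive baseline, are skipped.
    """
    shared = sorted(mtp.keys() & baseline.keys())
    return {conc: mtp[conc] / baseline[conc] for conc in shared if baseline[conc] > 0}
```

For a throughput metric a ratio above 1.0 favors the MTP config; for a latency metric the reading is reversed.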

github-actions · 2026-03-09T22:06:27Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

functionstackx

.

benchmarks/single_node/qwen3.5_fp8_b200_mtp.sh

perf-changelog.yaml

cquil11 · 2026-03-18T18:26:09Z

@claude add PR description

Klaud-Cold · 2026-03-18T18:27:15Z

Claude finished @cquil11's task in 49s

Adding PR Description

Read changed files to understand the PR
Write and update PR description

PR description has been added. It covers the new benchmark script, the nvidia-master.yaml config entry, and the changelog update.

ankursingh-nv · 2026-03-19T16:07:17Z

@jgangani @functionstackx can you guys please review and approve the PR?

jgangani · 2026-03-19T18:20:12Z

> @jgangani @functionstackx can you guys please review and approve the PR?

@Ankur-singh Looks good to me. It would be good to revisit later whether EAGLE draft tokens < 4 works better at higher concurrencies.
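As background for the draft-token suggestion, a common back-of-the-envelope model estimates how many tokens one verification step emits. The sketch below assumes a single draft chain (topk=1) and an independent, constant per-token acceptance probability; both are simplifications, and the function name `expected_tokens_per_step` is hypothetical. It does not model verification cost, which grows with batch size and is the reason shorter drafts may pay off at high concurrency.

```python
def expected_tokens_per_step(accept_rate: float, num_drafted: int) -> float:
    """Expected tokens emitted per verification step for a single draft chain.

    Assumes each drafted token is accepted independently with probability
    accept_rate, and that the target model always contributes one token.
    The result is 1 + p + p^2 + ... + p^num_drafted.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    return sum(accept_rate ** i for i in range(num_drafted + 1))
```

Under this model the last drafted token contributes only `accept_rate ** num_drafted` expected tokens, so the marginal benefit of each extra draft token shrinks while its verification cost does not.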

ankursingh-nv requested a review from a team March 9, 2026 22:06

ankursingh-nv requested review from jgangani and kedarpotdar-nv as code owners March 9, 2026 22:06

github-project-automation bot added this to InferenceMAX Board Mar 9, 2026

ankursingh-nv added the sweep-enabled label Mar 9, 2026

functionstackx requested changes Mar 9, 2026

claude bot reviewed Mar 9, 2026

benchmarks/single_node/qwen3.5_fp8_b200_mtp.sh (resolved)

perf-changelog.yaml (outdated, resolved)

ankursingh-nv force-pushed the qwen-sglang-b200-mtp-fp8 branch 2 times, most recently from 5de6e8c to aeecb6b March 10, 2026 20:06

Add Qwen3.5 FP8 B200 SGLang MTP config

956abf4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

update pr number

fix max running request

ankursingh-nv removed the sweep-enabled label Mar 11, 2026

update config

83fafdd

ankursingh-nv force-pushed the qwen-sglang-b200-mtp-fp8 branch from aeecb6b to 83fafdd March 11, 2026 19:26

functionstackx and others added 3 commits March 11, 2026 16:40

Add GPU monitoring to qwen3.5_fp8_b200_mtp benchmark script

4373c3b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

updated deprecated env and mem fraction

72032e0

Merge branch 'main' into qwen-sglang-b200-mtp-fp8

1c3aa8b

ankursingh-nv added the sweep-enabled label Mar 12, 2026

Ankur-singh added 2 commits March 17, 2026 09:36

Merge branch 'main' into qwen-sglang-b200-mtp-fp8

c444170

Update perf-changelog.yaml

bf6e24f

ankursingh-nv requested a review from functionstackx March 17, 2026 16:39

cquil11 approved these changes Mar 18, 2026

Merge branch 'main' into qwen-sglang-b200-mtp-fp8

148120d

ankursingh-nv enabled auto-merge (squash) March 18, 2026 23:59

ankursingh-nv changed the title ~~[WIP] Add Qwen3.5 FP8 B200 SGLang MTP config~~ Add Qwen3.5 FP8 B200 SGLang MTP config Mar 19, 2026

kedarpotdar-nv approved these changes Mar 19, 2026

Merge branch 'main' into qwen-sglang-b200-mtp-fp8

bc288cf

Update perf-changelog.yaml

b284f6a

ankursingh-nv disabled auto-merge March 19, 2026 16:07

jgangani approved these changes Mar 19, 2026

ankursingh-nv wants to merge 10 commits into main from qwen-sglang-b200-mtp-fp8

7 participants
