Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.
5de6e8c to aeecb6b
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
update pr number
fix max running request
aeecb6b to 83fafdd
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude add PR description
@jgangani @functionstackx can you guys please review and approve the PR? |
@Ankur-singh Looks good to me. It would be good to revisit later whether EAGLE draft tokens < 4 work better at higher concurrencies.
Summary
Adds a new benchmark configuration for Qwen3.5-397B-A17B FP8 on B200 using SGLang with MTP (Multi-Token Prediction) via EAGLE speculative decoding.
Changes
New benchmark script: benchmarks/single_node/qwen3.5_fp8_b200_mtp.sh
- num-steps=3, draft-tokens=4, topk=1
- fp8-gemm-backend=flashinfer_trtllm, attention-backend=trtllm_mha, moe-runner-backend=flashinfer_trtllm

Config entry: qwen3.5-fp8-b200-sglang-mtp in nvidia-master.yaml
- Image: lmsysorg/sglang:v0.5.9-cu130
- Model: Qwen/Qwen3.5-397B-A17B-FP8

Changelog entry in perf-changelog.yaml

Test Plan
- Run qwen3.5-fp8-b200-sglang-mtp to validate server startup and benchmark completion
- Compare against the qwen3.5-fp8-b200-sglang baseline
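
For reviewers, here is a rough sketch of how the settings listed under Changes might map onto SGLang server flags. This is not the PR's script. The `--speculative-*` flag spellings, the `EAGLE` algorithm value, and the helper names `build_server_flags` and `is_linear_chain` are assumptions made for illustration. The chain check also assumes that with topk=1 the draft is a single chain, so draft-tokens is expected to equal num-steps + 1.

```python
# Values copied from the PR description; everything else is hypothetical.
SPEC_DECODE = {"num-steps": 3, "draft-tokens": 4, "topk": 1}
BACKENDS = {
    "fp8-gemm-backend": "flashinfer_trtllm",
    "attention-backend": "trtllm_mha",
    "moe-runner-backend": "flashinfer_trtllm",
}


def build_server_flags(spec, backends):
    """Return a flat list of CLI flags (assumed spellings, not from the PR)."""
    flags = [
        "--speculative-algorithm", "EAGLE",
        "--speculative-num-steps", str(spec["num-steps"]),
        "--speculative-num-draft-tokens", str(spec["draft-tokens"]),
        "--speculative-eagle-topk", str(spec["topk"]),
    ]
    # Backend names are taken verbatim from the PR and prefixed with "--".
    for name, value in backends.items():
        flags += [f"--{name}", value]
    return flags


def is_linear_chain(spec):
    """Assumed consistency check: topk=1 and draft-tokens == num-steps + 1."""
    return spec["topk"] == 1 and spec["draft-tokens"] == spec["num-steps"] + 1


if __name__ == "__main__":
    print(" ".join(build_server_flags(SPEC_DECODE, BACKENDS)))
    print("linear chain:", is_linear_chain(SPEC_DECODE))
```

If draft-tokens is lowered for higher concurrencies as suggested in review, num-steps would need to change with it for the chain check to keep passing.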