Skip to content

test#917

Open
functionstackx wants to merge 2 commits intomainfrom
functionstackx-patch-1
Open

test#917
functionstackx wants to merge 2 commits intomainfrom
functionstackx-patch-1

Conversation

@functionstackx
Copy link
Contributor

No description provided.

functionstackx and others added 2 commits March 18, 2026 04:20
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

Comment on lines +964 to +970

- config-keys:
- dsr1-fp4-mi355x-sglang-disagg-mtp
- dsr1-fp8-mi355x-sglang-disagg-mtp
description:
- "test"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new perf-changelog entry has placeholder values: pr-link is pull/XXX (should be pull/917), description is just "test" instead of describing the actual change (adding --speculative-num-draft-tokens 3 to MTP flags), and the entry overall is not ready for merge. Please update the PR link, write a meaningful description, and verify the config-keys cover all affected benchmarks.

Extended reasoning...

What the bug is

The perf-changelog entry added at lines 964-970 contains three placeholder values that are clearly not production-ready:

  1. pr-link: pull/XXX — This is a placeholder URL. Every changelog entry needs a working link to its PR. The correct value should be pull/917.
  2. description: "test" — Every other entry in perf-changelog.yaml has a meaningful description of what changed. This should describe the actual change: adding --speculative-num-draft-tokens 3 to MTP flags for DeepSeek models in the AMD multi-node disagg config.
  3. config-keys — Lists only dsr1-fp4-mi355x-sglang-disagg-mtp and dsr1-fp8-mi355x-sglang-disagg-mtp. The models.yaml diff modifies mtp_flags for all 6 DeepSeek model entries (V3, V3-0324, R1, R1-0528, R1-0528-MXFP4-Preview, R1-0528-MXFP4).

Config-keys scope analysis

One counterargument is that mtp_flags are only applied when DECODE_MTP_SIZE > 0 (i.e., only MTP-enabled benchmark configs consume them), so non-MTP disagg configs are unaffected. This is a valid point — the two listed config-keys may indeed be the only currently existing benchmark configurations that use these flags. If no other disagg-mtp benchmark configs exist for DeepSeek V3/V3-0324, then the config-keys list could be correct. The author should verify whether any other MTP-enabled benchmarks reference these model entries.

Step-by-step proof

  1. Open perf-changelog.yaml and look at line 970: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX.
  2. This is PR test #917, so the link should end in /pull/917.
  3. Look at line 968: description: - "test". Compare with any other entry, e.g., the entry for PR H200 MINIMAX vLLM extend configs  #869 which reads "Extend MiniMax M2.5 FP8 single-node config for H200 with vLLM v0.16.0 (TP8)". The description "test" provides zero information.
  4. The models.yaml diff shows --speculative-num-draft-tokens 3 added to all 6 DeepSeek model mtp_flags entries, but the changelog only references 2 disagg-mtp config-keys.

Impact

If merged as-is, the changelog would contain a broken PR link (pull/XXX leads to a 404), a meaningless description ("test"), and potentially incomplete config-key coverage. Anyone reviewing the performance changelog would not understand what changed or be able to trace it back to the PR.

How to fix

  • Change pr-link to https://github.com/SemiAnalysisAI/InferenceX/pull/917.
  • Replace the description with something like: "Add --speculative-num-draft-tokens 3 to MTP flags for all DeepSeek models in AMD multi-node disagg config".
  • Verify config-keys: confirm whether the two listed disagg-mtp configs are the only MTP-enabled benchmarks consuming these flags, and add any missing ones.

Note: there are 5 pre-existing pull/XXX entries in the file (e.g., H200 SGLang v0.5.9 entry, GLM-5, Qwen3.5 entries), so this is an established pattern for WIP entries. The PR title being "test" suggests this may be a draft, but these placeholders should be resolved before merge.

DeepSeek-V3:
base_flags: "--decode-log-interval 1000 --log-level warning --watchdog-timeout 3600 --ep-dispatch-algorithm fake --load-balance-method round_robin --kv-cache-dtype fp8_e4m3 --attention-backend aiter --disaggregation-transfer-backend mori"
mtp_flags: "--speculative-algorithm NEXTN --speculative-eagle-topk 1"
mtp_flags: "--speculative-algorithm NEXTN --speculative-eagle-topk 1 --speculative-num-draft-tokens 3"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Adding --speculative-num-draft-tokens 3 to mtp_flags here is a no-op because server.sh:289 already appends --speculative-num-draft-tokens $((decode_mtp_size + 1)) after expanding MODEL_MTP_FLAGS. Since SGLang uses argparse (last value wins), this YAML value is always silently overridden. Remove the flag from either the YAML or the dynamic computation in server.sh to avoid the duplication.

Extended reasoning...

What the bug is

This PR adds --speculative-num-draft-tokens 3 to the mtp_flags field for all 6 DeepSeek model configs in models.yaml. However, the consumer of this field — build_server_config() in server.sh — already appends its own --speculative-num-draft-tokens dynamically, causing the flag to appear twice on the final command line.

The specific code path

At server.sh:289, when MTP is enabled (decode_mtp_size > 0), the function constructs:

mtp_config="${MODEL_MTP_FLAGS} --speculative-num-steps ${decode_mtp_size} --speculative-num-draft-tokens $((decode_mtp_size + 1))"

MODEL_MTP_FLAGS is populated from the YAML mtp_flags field, which after this PR expands to:
--speculative-algorithm NEXTN --speculative-eagle-topk 1 --speculative-num-draft-tokens 3

So the resulting mtp_config string becomes:
--speculative-algorithm NEXTN --speculative-eagle-topk 1 --speculative-num-draft-tokens 3 --speculative-num-steps 2 --speculative-num-draft-tokens 3

The flag --speculative-num-draft-tokens appears at positions from both YAML and the dynamic computation.

Step-by-step proof with a concrete example

  1. User sets DECODE_MTP_SIZE=2 (the common case).
  2. MODEL_MTP_FLAGS = "--speculative-algorithm NEXTN --speculative-eagle-topk 1 --speculative-num-draft-tokens 3" (from YAML).
  3. server.sh:289 computes $((decode_mtp_size + 1)) = 3.
  4. Final mtp_config = "--speculative-algorithm NEXTN --speculative-eagle-topk 1 --speculative-num-draft-tokens 3 --speculative-num-steps 2 --speculative-num-draft-tokens 3".
  5. Both values happen to be 3, so no functional difference — but the duplication is still present.

Now consider DECODE_MTP_SIZE=1:

  1. MODEL_MTP_FLAGS still contains --speculative-num-draft-tokens 3 (hardcoded in YAML).
  2. server.sh:289 computes $((1 + 1)) = 2.
  3. Final command line: --speculative-num-draft-tokens 3 ... --speculative-num-draft-tokens 2.
  4. SGLang’s argparse sees both, last value wins → uses 2, silently ignoring the YAML’s 3.

Why existing code doesn’t prevent it

There is no deduplication logic in server.sh. The script simply concatenates MODEL_MTP_FLAGS with the dynamically computed flags. The YAML schema documentation does not warn against including flags that server.sh already manages.

Impact

The YAML change is effectively dead code for the server.sh code path (which is the only consumer of mtp_flags — confirmed via grep). For DECODE_MTP_SIZE=2 there is no behavioral difference, but the duplication is misleading: it gives the false impression that the YAML controls this parameter, when in reality server.sh always overrides it. For any other MTP size, the values would silently conflict.

How to fix

Remove --speculative-num-draft-tokens 3 from the mtp_flags in models.yaml (preferred, since the dynamic computation in server.sh correctly derives it from decode_mtp_size). Alternatively, remove the dynamic --speculative-num-draft-tokens computation from server.sh:289 if the intent is to let each model config control the value.

@functionstackx
Copy link
Contributor Author

@billishyahao counterintuitively, although theoretical modelling shows that AMD's MTP=2/3 should performance between than AMD's MTP=1, in the current stack, AMD's MTP=3 performs worse than MTP=1. But since ur boss is saying AMD should go back to MTP=2/3, we can change it to that...

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant