
Conversation

@JadoTu (Collaborator) commented Nov 20, 2025

Summary by CodeRabbit

  • Tests
    • Added comprehensive test coverage for the new Qwen3-Next-80B-A3B-Instruct model variant across diverse execution configurations
    • Validates both BF16 and NVFP4-quantized execution modes against established accuracy baselines
    • Enables benchmarking on standard evaluation tasks (GSM8K, MMLU) with support for distributed inference via tensor, pipeline, and expert parallelism


Description

Add Qwen3-Next CI accuracy tests on BF16 and NVFP4.

@coderabbitai (Contributor) bot commented Nov 20, 2025

📝 Walkthrough

This PR adds test coverage and reference accuracy data for a new Qwen3-Next-80B-A3B-Instruct model variant, including benchmark entries for GSM8K and MMLU evaluation tasks, new test harness configurations, and updated integration test lists.

Changes

Cohort / File(s) — Summary

  • Reference accuracy data — tests/integration/defs/accuracy/references/gsm8k.yaml, tests/integration/defs/accuracy/references/mmlu.yaml
    Added Qwen3/Qwen3-Next-80B-A3B-Instruct model entries with accuracy metrics (GSM8K: 84.42/84.32; MMLU: 85.58/85) and quantization configurations (NVFP4 with FP8 kv_cache).
  • Test harness — tests/integration/defs/accuracy/test_llm_api_pytorch.py
    Added a TestQwen3NextInstruct test class with BF16 and NVFP4 test methods, including parameterized configurations for tensor parallel, pipeline parallel, MoE expert parallel, CUDA graph, and overlap scheduler settings (see the sketch below). Note: a duplicate class definition is present in the diff.
  • Integration test configuration — tests/integration/test_lists/test-db/l0_dgx_b200.yml
    Replaced the single GPTOSS test entry with six Qwen3NextInstruct test variants covering BF16 and multiple NVFP4 configurations.
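For context, here is a minimal sketch of how such a parameterized accuracy test is typically laid out, assembled from the harness names in the code-graph section and the config calls quoted in the review below; the parameter grid, import paths, and exact signatures are assumptions, not the PR's literal code.

    # Hypothetical sketch only: the parameter values mirror the walkthrough;
    # import paths follow the code-graph hints further down this page.
    import pytest
    from tensorrt_llm.llmapi import LLM, CudaGraphConfig, KvCacheConfig

    from ..conftest import llm_models_root
    from .accuracy_core import GSM8K, MMLU, LlmapiAccuracyTestHarness


    class TestQwen3NextInstruct(LlmapiAccuracyTestHarness):
        MODEL_NAME = "Qwen3/Qwen3-Next-80B-A3B-Instruct"
        MODEL_PATH = f"{llm_models_root()}/Qwen3"

        @pytest.mark.parametrize(
            "tp_size,pp_size,ep_size,cuda_graph,overlap_scheduler",
            [(4, 1, 1, True, True), (4, 1, 4, True, True)])
        def test_bf16_4gpu(self, tp_size, pp_size, ep_size, cuda_graph,
                           overlap_scheduler):
            kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6,
                                            enable_block_reuse=False)
            # Overlap scheduler and CUDA-graph capture are toggled per variant.
            pytorch_config = dict(
                disable_overlap_scheduler=not overlap_scheduler,
                cuda_graph_config=CudaGraphConfig(max_batch_size=512)
                if cuda_graph else None)
            with LLM(f"{self.MODEL_PATH}/Qwen3-Next-80B-A3B-Instruct",
                     tensor_parallel_size=tp_size,
                     pipeline_parallel_size=pp_size,
                     moe_expert_parallel_size=ep_size,
                     kv_cache_config=kv_cache_config,
                     **pytorch_config) as llm:
                # Scores are compared against the YAML reference baselines.
                MMLU(self.MODEL_NAME).evaluate(llm)
                GSM8K(self.MODEL_NAME).evaluate(llm)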

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Duplicate class definition in test_llm_api_pytorch.py: the TestQwen3NextInstruct class appears twice with identical signatures and methods; verify this is intentional or a merge artifact.
  • Test parameterization logic: review the parameter combinations in the test_bf16_4gpu and test_nvfp4 methods for correctness and coverage alignment.
  • Reference accuracy values: validate that the accuracy metrics in the YAML files match the expected benchmarks for the quantized and non-quantized variants (illustrated below).
  • Test list migration: confirm that replacing the GPTOSS test with the Qwen3NextInstruct variants maintains the desired test coverage for the integration suite.
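On the reference-accuracy point, the new entries presumably follow the key layout used elsewhere in these reference files; the sketch below plugs in the metrics quoted in the walkthrough and is an illustration, not the PR's verbatim diff.

    # gsm8k.yaml (illustrative; mmlu.yaml would carry 85.58/85 analogously)
    Qwen3/Qwen3-Next-80B-A3B-Instruct:
      - accuracy: 84.42                 # BF16 baseline
      - quant_algo: NVFP4
        kv_cache_quant_algo: FP8
        accuracy: 84.32                 # NVFP4 + FP8 kv_cache variant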

Pre-merge checks

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning — The PR description is incomplete and does not follow the provided template: it lacks required sections such as test coverage details, PR checklist confirmation, and a thorough explanation of what and why. Resolution: add a complete description explaining the changes and rationale, list the specific test coverage (e.g., the test_bf16_4gpu and test_nvfp4 methods), and confirm all PR checklist items have been reviewed.
  • Docstring coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title check ✅ Passed — The title clearly summarizes the main change: adding CI tests for the Qwen3-Next model with BF16 and NVFP4 accuracy testing, which aligns with all file changes in the changeset.

@coderabbitai (Contributor) bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bdd3ba and 3743cf4.

📒 Files selected for processing (4)
  • tests/integration/defs/accuracy/references/gsm8k.yaml (1 hunks)
  • tests/integration/defs/accuracy/references/mmlu.yaml (1 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-08-26T09:49:04.956Z
Learnt from: pengbowang-nv
Repo: NVIDIA/TensorRT-LLM PR: 7192
File: tests/integration/test_lists/test-db/l0_dgx_b200.yml:56-72
Timestamp: 2025-08-26T09:49:04.956Z
Learning: In TensorRT-LLM test configuration files, the test scheduling system handles wildcard matching with special rules that prevent duplicate test execution even when the same tests appear in multiple yaml files with overlapping GPU wildcards (e.g., "*b200*" and "*gb200*").

Applied to files:

  • tests/integration/test_lists/test-db/l0_dgx_b200.yml
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
📚 Learning: 2025-09-17T02:48:52.732Z
Learnt from: tongyuantongyu
Repo: NVIDIA/TensorRT-LLM PR: 7781
File: tests/integration/test_lists/waives.txt:313-313
Timestamp: 2025-09-17T02:48:52.732Z
Learning: In TensorRT-LLM, `tests/integration/test_lists/waives.txt` is specifically for waiving/skipping tests, while other test list files like those in `test-db/` and `qa/` directories are for different test execution contexts (pre-merge, post-merge, QA tests). The same test appearing in both waives.txt and execution list files is intentional - the test is part of test suites but will be skipped due to the waiver.

Applied to files:

  • tests/integration/test_lists/test-db/l0_dgx_b200.yml
📚 Learning: 2025-09-09T09:40:45.658Z
Learnt from: fredricz-20070104
Repo: NVIDIA/TensorRT-LLM PR: 7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.

Applied to files:

  • tests/integration/test_lists/test-db/l0_dgx_b200.yml
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/test_lists/test-db/l0_dgx_b200.yml
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
📚 Learning: 2025-08-29T14:07:45.863Z
Learnt from: EmmaQiaoCh
Repo: NVIDIA/TensorRT-LLM PR: 7370
File: tests/unittest/trt/model_api/test_model_quantization.py:24-27
Timestamp: 2025-08-29T14:07:45.863Z
Learning: In TensorRT-LLM's CI infrastructure, pytest skip markers (pytest.mark.skip) are properly honored even when test files have __main__ blocks that call test functions directly. The testing system correctly skips tests without requiring modifications to the __main__ block execution pattern.

Applied to files:

  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
🧬 Code graph analysis (1)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (4)
tests/integration/defs/accuracy/accuracy_core.py (5)
  • LlmapiAccuracyTestHarness (949-960)
  • MMLU (317-331)
  • evaluate (184-247)
  • evaluate (868-878)
  • GSM8K (334-349)
tests/integration/defs/conftest.py (1)
  • llm_models_root (80-94)
tensorrt_llm/llmapi/llm_args.py (3)
  • KvCacheConfig (1426-1570)
  • CudaGraphConfig (102-159)
  • MoeConfig (373-407)
tensorrt_llm/llmapi/llm.py (1)
  • LLM (1101-1117)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

Comment on lines 4139 to 4166

        model_path = f"{self.MODEL_PATH}/Qwen3-Next-80B-A3B-Instruct"
        model_path = "Qwen/Qwen3-Next-80B-A3B-Instruct"
        kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6,
                                        enable_block_reuse=False)
        pytorch_config = dict(disable_overlap_scheduler=not overlap_scheduler,
                              cuda_graph_config=CudaGraphConfig(
                                  max_batch_size=512) if cuda_graph else None)


⚠️ Potential issue | 🔴 Critical

Don't overwrite the local model path with the HuggingFace repo.

You first resolve the pre-downloaded model under llm_models_root(), but then immediately replace it with "Qwen/Qwen3-Next-80B-A3B-Instruct". Our CI runs in an offline environment, so this reassignment forces a network fetch and the test will fail. Lean on the cached artifact and drop the second assignment.

Apply this diff:

-        model_path = f"{self.MODEL_PATH}/Qwen3-Next-80B-A3B-Instruct"
-        model_path = "Qwen/Qwen3-Next-80B-A3B-Instruct"
+        model_path = f"{self.MODEL_PATH}/Qwen3-Next-80B-A3B-Instruct"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-        model_path = f"{self.MODEL_PATH}/Qwen3-Next-80B-A3B-Instruct"
-        model_path = "Qwen/Qwen3-Next-80B-A3B-Instruct"
+        model_path = f"{self.MODEL_PATH}/Qwen3-Next-80B-A3B-Instruct"
         kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6,
                                         enable_block_reuse=False)
         pytorch_config = dict(disable_overlap_scheduler=not overlap_scheduler,
                               cuda_graph_config=CudaGraphConfig(
                                   max_batch_size=512) if cuda_graph else None)
🤖 Prompt for AI Agents
In tests/integration/defs/accuracy/test_llm_api_pytorch.py around lines 4139 to
4146, the code resolves a pre-downloaded local model path then immediately
overwrites it with the HuggingFace repo string, which forces a network fetch in
offline CI; remove the second assignment that sets model_path =
"Qwen/Qwen3-Next-80B-A3B-Instruct" so the test uses the cached local model path
returned by llm_models_root(), leaving the initial model_path resolution and
subsequent configs unchanged.

Comment on lines 4171 to 4204

        model_path = f"{self.MODEL_PATH}/qwen3-next-80b-instruct-nvfp4-ptq-fp8kv"
        model_path = "/home/scratch.didow_sw_1/models/qwen3-next-80b-instruct-nvfp4-ptq-fp8kv"

        kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6,
                                        enable_block_reuse=False)
        pytorch_config = dict(disable_overlap_scheduler=not overlap_scheduler,
                              cuda_graph_config=CudaGraphConfig(
                                  max_batch_size=512) if cuda_graph else None)
        moe_config = MoeConfig(backend=moe_backend)

        with LLM(model_path,
                 tensor_parallel_size=tp_size,

⚠️ Potential issue | 🔴 Critical

Remove the developer-specific absolute path.

Hard-coding /home/scratch.didow_sw_1/... makes this test unusable anywhere outside your workstation and will break CI immediately. Stick to the artifact rooted under llm_models_root() (and optionally fall back if it’s genuinely missing), but never point to a private filesystem path.

Apply this diff:

-        model_path = f"{self.MODEL_PATH}/qwen3-next-80b-instruct-nvfp4-ptq-fp8kv"
-        model_path = "/home/scratch.didow_sw_1/models/qwen3-next-80b-instruct-nvfp4-ptq-fp8kv"
+        model_path = f"{self.MODEL_PATH}/qwen3-next-80b-instruct-nvfp4-ptq-fp8kv"
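As a hedged illustration of what the reviewer suggests — resolve the artifact under llm_models_root() and skip rather than fetch when it is missing — something like the following could work; the helper name and skip message are illustrative, not part of this PR:

    # Illustrative helper, not the PR's code: resolve the cached artifact and
    # skip the test when it is absent, never falling back to a network fetch
    # or a private filesystem path.
    import os

    import pytest

    from ..conftest import llm_models_root


    def resolve_model_path(relative_path: str) -> str:
        model_path = os.path.join(llm_models_root(), relative_path)
        if not os.path.isdir(model_path):
            # Offline CI: a missing artifact should skip, not download.
            pytest.skip(f"Model artifact not found: {model_path}")
        return model_path

A test would then call, e.g., model_path = resolve_model_path("qwen3-next-80b-instruct-nvfp4-ptq-fp8kv") instead of hard-coding any absolute path.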

@JadoTu (Collaborator, Author) commented Nov 23, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25452 [ run ] triggered by Bot. Commit: ed8c171

@tensorrt-cicd (Collaborator)

PR_Github #25452 [ run ] completed with state FAILURE. Commit: ed8c171
/LLM/main/L0_MergeRequest_PR pipeline #19266 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 23, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25459 [ run ] triggered by Bot. Commit: 06f9bb3

@tensorrt-cicd (Collaborator)

PR_Github #25459 [ run ] completed with state SUCCESS. Commit: 06f9bb3
/LLM/main/L0_MergeRequest_PR pipeline #19273 completed with status: 'FAILURE'

@nv-guomingz (Collaborator)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25466 [ run ] triggered by Bot. Commit: 06f9bb3

@tensorrt-cicd (Collaborator)

PR_Github #25466 [ run ] completed with state SUCCESS. Commit: 06f9bb3
/LLM/main/L0_MergeRequest_PR pipeline #19280 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 24, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25480 [ run ] triggered by Bot. Commit: 8a69425

@tensorrt-cicd (Collaborator)

PR_Github #25480 [ run ] completed with state SUCCESS. Commit: 8a69425
/LLM/main/L0_MergeRequest_PR pipeline #19294 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 24, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25493 [ run ] triggered by Bot. Commit: 8a69425

@tensorrt-cicd (Collaborator)

PR_Github #25493 [ run ] completed with state SUCCESS. Commit: 8a69425
/LLM/main/L0_MergeRequest_PR pipeline #19304 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 24, 2025

/bot run
🙏

@tensorrt-cicd (Collaborator)

PR_Github #25511 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: 🙏

@JadoTu (Collaborator, Author) commented Nov 24, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25521 [ run ] triggered by Bot. Commit: d57c919

@tensorrt-cicd (Collaborator)

PR_Github #25521 [ run ] completed with state SUCCESS. Commit: d57c919
/LLM/main/L0_MergeRequest_PR pipeline #19328 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 24, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25542 [ run ] triggered by Bot. Commit: f8315ee

@tensorrt-cicd (Collaborator)

PR_Github #25542 [ run ] completed with state SUCCESS. Commit: f8315ee
/LLM/main/L0_MergeRequest_PR pipeline #19341 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 24, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25565 [ run ] triggered by Bot. Commit: f8315ee

@tensorrt-cicd (Collaborator)

PR_Github #25565 [ run ] completed with state SUCCESS. Commit: f8315ee
/LLM/main/L0_MergeRequest_PR pipeline #19362 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 24, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25593 [ run ] triggered by Bot. Commit: f8315ee

@tensorrt-cicd (Collaborator)

PR_Github #25593 [ run ] completed with state SUCCESS. Commit: f8315ee
/LLM/main/L0_MergeRequest_PR pipeline #19387 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 25, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25623 [ run ] triggered by Bot. Commit: 9e67329

@tensorrt-cicd (Collaborator)

PR_Github #25622 [ run ] triggered by Bot. Commit: 9e67329

@tensorrt-cicd (Collaborator)

PR_Github #25622 [ run ] completed with state ABORTED. Commit: 9e67329

@tensorrt-cicd (Collaborator)

PR_Github #25623 [ run ] completed with state SUCCESS. Commit: 9e67329
/LLM/main/L0_MergeRequest_PR pipeline #19412 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 25, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25657 [ run ] triggered by Bot. Commit: c467fe6

@tensorrt-cicd (Collaborator)

PR_Github #25657 [ run ] completed with state FAILURE. Commit: c467fe6
LLM/main/L0_MergeRequest_PR #19444 (Blue Ocean) completed with status: ABORTED

@JadoTu (Collaborator, Author) commented Nov 25, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25709 [ run ] triggered by Bot. Commit: fce137e

@tensorrt-cicd (Collaborator)

PR_Github #25709 [ run ] completed with state SUCCESS. Commit: fce137e
/LLM/main/L0_MergeRequest_PR pipeline #19491 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 25, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25749 [ run ] triggered by Bot. Commit: fce137e

@tensorrt-cicd (Collaborator)

PR_Github #25749 [ run ] completed with state FAILURE. Commit: fce137e
/LLM/main/L0_MergeRequest_PR pipeline #19525 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 26, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25838 [ run ] triggered by Bot. Commit: fce137e

@tensorrt-cicd (Collaborator)

PR_Github #25838 [ run ] completed with state FAILURE. Commit: fce137e
/LLM/main/L0_MergeRequest_PR pipeline #19596 completed with status: 'FAILURE'

@JadoTu (Collaborator, Author) commented Nov 26, 2025

/bot run

1 similar comment
@JadoTu (Collaborator, Author) commented Nov 26, 2025

/bot run
