[None][feat] add qwen3-next CI test of accuracy on BF16 and NVFP4 #9330
base: main
Conversation
📝 Walkthrough

This PR adds test coverage and reference accuracy data for the new Qwen3-Next-80B-A3B-Instruct model variant, including benchmark entries for the GSM8K and MMLU evaluation tasks, new test harness configurations, and updated integration test lists.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (2 warnings) | ✅ Passed checks (1 passed)
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tests/integration/defs/accuracy/references/gsm8k.yaml (1 hunk)
- tests/integration/defs/accuracy/references/mmlu.yaml (1 hunk)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (1 hunk)
- tests/integration/test_lists/test-db/l0_dgx_b200.yml (1 hunk)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-08-26T09:49:04.956Z
Learnt from: pengbowang-nv
Repo: NVIDIA/TensorRT-LLM PR: 7192
File: tests/integration/test_lists/test-db/l0_dgx_b200.yml:56-72
Timestamp: 2025-08-26T09:49:04.956Z
Learning: In TensorRT-LLM test configuration files, the test scheduling system handles wildcard matching with special rules that prevent duplicate test execution even when the same tests appear in multiple yaml files with overlapping GPU wildcards (e.g., "*b200*" and "*gb200*").
Applied to files:
- tests/integration/test_lists/test-db/l0_dgx_b200.yml
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
📚 Learning: 2025-09-17T02:48:52.732Z
Learnt from: tongyuantongyu
Repo: NVIDIA/TensorRT-LLM PR: 7781
File: tests/integration/test_lists/waives.txt:313-313
Timestamp: 2025-09-17T02:48:52.732Z
Learning: In TensorRT-LLM, `tests/integration/test_lists/waives.txt` is specifically for waiving/skipping tests, while other test list files like those in `test-db/` and `qa/` directories are for different test execution contexts (pre-merge, post-merge, QA tests). The same test appearing in both waives.txt and execution list files is intentional - the test is part of test suites but will be skipped due to the waiver.
Applied to files:
tests/integration/test_lists/test-db/l0_dgx_b200.yml
📚 Learning: 2025-09-09T09:40:45.658Z
Learnt from: fredricz-20070104
Repo: NVIDIA/TensorRT-LLM PR: 7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.
Applied to files:
- tests/integration/test_lists/test-db/l0_dgx_b200.yml
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
- tests/integration/test_lists/test-db/l0_dgx_b200.yml
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
📚 Learning: 2025-08-29T14:07:45.863Z
Learnt from: EmmaQiaoCh
Repo: NVIDIA/TensorRT-LLM PR: 7370
File: tests/unittest/trt/model_api/test_model_quantization.py:24-27
Timestamp: 2025-08-29T14:07:45.863Z
Learning: In TensorRT-LLM's CI infrastructure, pytest skip markers (pytest.mark.skip) are properly honored even when test files have __main__ blocks that call test functions directly. The testing system correctly skips tests without requiring modifications to the __main__ block execution pattern.
Applied to files:
tests/integration/defs/accuracy/test_llm_api_pytorch.py
🧬 Code graph analysis (1)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (4)
- tests/integration/defs/accuracy/accuracy_core.py (5)
  - LlmapiAccuracyTestHarness (949-960)
  - MMLU (317-331)
  - evaluate (184-247)
  - evaluate (868-878)
  - GSM8K (334-349)
- tests/integration/defs/conftest.py (1)
  - llm_models_root (80-94)
- tensorrt_llm/llmapi/llm_args.py (3)
  - KvCacheConfig (1426-1570)
  - CudaGraphConfig (102-159)
  - MoeConfig (373-407)
- tensorrt_llm/llmapi/llm.py (1)
  - LLM (1101-1117)
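To make the code-graph references above concrete, here is a minimal, hypothetical sketch of how these symbols typically combine in an LLM-API accuracy test. The class names and the llm_models_root() helper come from the listing; the test-class name, MODEL_NAME string, tensor-parallel degree, and MoE backend are illustrative assumptions, not the code added by this PR.

```python
# Hypothetical sketch only: symbol locations follow the code-graph listing
# above; the test-class name, model-name string, and parameter values are
# illustrative assumptions, not the code added by this PR.
from tensorrt_llm.llmapi.llm import LLM
from tensorrt_llm.llmapi.llm_args import (CudaGraphConfig, KvCacheConfig,
                                          MoeConfig)

# Relative imports as used inside tests/integration/defs/accuracy/.
from ..conftest import llm_models_root
from .accuracy_core import GSM8K, MMLU, LlmapiAccuracyTestHarness


class TestQwen3NextSketch(LlmapiAccuracyTestHarness):   # hypothetical name
    MODEL_NAME = "Qwen3/Qwen3-Next-80B-A3B-Instruct"    # assumed reference key
    MODEL_PATH = f"{llm_models_root()}/Qwen3"

    def test_bf16_sketch(self):
        # Mirror the reviewed configuration: capped KV-cache memory, block
        # reuse disabled, and CUDA graphs up to a batch size of 512.
        kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6,
                                        enable_block_reuse=False)
        with LLM(f"{self.MODEL_PATH}/Qwen3-Next-80B-A3B-Instruct",
                 tensor_parallel_size=4,  # illustrative TP degree
                 kv_cache_config=kv_cache_config,
                 cuda_graph_config=CudaGraphConfig(max_batch_size=512),
                 moe_config=MoeConfig(backend="CUTLASS")) as llm:
            # Each task compares measured accuracy against reference entries.
            for task_cls in (GSM8K, MMLU):
                task = task_cls(self.MODEL_NAME)
                task.evaluate(llm)
```

Under this pattern, evaluate() would presumably check the measured scores against the reference entries added to gsm8k.yaml and mmlu.yaml in this PR.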
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
```python
model_path = f"{self.MODEL_PATH}/Qwen3-Next-80B-A3B-Instruct"
model_path = "Qwen/Qwen3-Next-80B-A3B-Instruct"
kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6,
                                enable_block_reuse=False)
pytorch_config = dict(disable_overlap_scheduler=not overlap_scheduler,
                      cuda_graph_config=CudaGraphConfig(
                          max_batch_size=512) if cuda_graph else None)
```
Don't overwrite the local model path with the HuggingFace repo.
You first resolve the pre-downloaded model under llm_models_root(), but then immediately replace it with "Qwen/Qwen3-Next-80B-A3B-Instruct". Our CI runs in an offline environment, so this reassignment forces a network fetch and the test will fail. Lean on the cached artifact and drop the second assignment.
Apply this diff:

```diff
- model_path = f"{self.MODEL_PATH}/Qwen3-Next-80B-A3B-Instruct"
- model_path = "Qwen/Qwen3-Next-80B-A3B-Instruct"
+ model_path = f"{self.MODEL_PATH}/Qwen3-Next-80B-A3B-Instruct"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
model_path = f"{self.MODEL_PATH}/Qwen3-Next-80B-A3B-Instruct"
kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6,
                                enable_block_reuse=False)
pytorch_config = dict(disable_overlap_scheduler=not overlap_scheduler,
                      cuda_graph_config=CudaGraphConfig(
                          max_batch_size=512) if cuda_graph else None)
```
🤖 Prompt for AI Agents
In tests/integration/defs/accuracy/test_llm_api_pytorch.py around lines 4139 to 4146, the code resolves a pre-downloaded local model path then immediately overwrites it with the HuggingFace repo string, which forces a network fetch in offline CI; remove the second assignment that sets model_path = "Qwen/Qwen3-Next-80B-A3B-Instruct" so the test uses the cached local model path returned by llm_models_root(), leaving the initial model_path resolution and subsequent configs unchanged.
```python
model_path = f"{self.MODEL_PATH}/qwen3-next-80b-instruct-nvfp4-ptq-fp8kv"
model_path = "/home/scratch.didow_sw_1/models/qwen3-next-80b-instruct-nvfp4-ptq-fp8kv"

kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6,
                                enable_block_reuse=False)
pytorch_config = dict(disable_overlap_scheduler=not overlap_scheduler,
                      cuda_graph_config=CudaGraphConfig(
                          max_batch_size=512) if cuda_graph else None)
moe_config = MoeConfig(backend=moe_backend)

with LLM(model_path,
         tensor_parallel_size=tp_size,
```
Remove the developer-specific absolute path.
Hard-coding /home/scratch.didow_sw_1/... makes this test unusable anywhere outside your workstation and will break CI immediately. Stick to the artifact rooted under llm_models_root() (and optionally fall back if it’s genuinely missing), but never point to a private filesystem path.
Apply this diff:

```diff
- model_path = f"{self.MODEL_PATH}/qwen3-next-80b-instruct-nvfp4-ptq-fp8kv"
- model_path = "/home/scratch.didow_sw_1/models/qwen3-next-80b-instruct-nvfp4-ptq-fp8kv"
+ model_path = f"{self.MODEL_PATH}/qwen3-next-80b-instruct-nvfp4-ptq-fp8kv"
```
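For the optional fallback mentioned above, one possible shape is sketched below; the helper name and the skip-on-missing behavior are assumptions for illustration, not part of this PR.

```python
import os

import pytest


def resolve_nvfp4_checkpoint(models_root: str) -> str:
    """Sketch: locate the cached NVFP4 checkpoint or skip the test.

    The directory name comes from the diff above; skipping (rather than
    falling back to a private filesystem path) is an assumed policy.
    """
    model_path = os.path.join(models_root,
                              "qwen3-next-80b-instruct-nvfp4-ptq-fp8kv")
    if not os.path.isdir(model_path):
        pytest.skip(f"NVFP4 checkpoint not found: {model_path}")
    return model_path
```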
/bot run
PR_Github #25452 [ run ] triggered by Bot. Commit:
PR_Github #25452 [ run ] completed with state
/bot run
PR_Github #25459 [ run ] triggered by Bot. Commit:
PR_Github #25459 [ run ] completed with state
/bot run
PR_Github #25466 [ run ] triggered by Bot. Commit:
PR_Github #25466 [ run ] completed with state
/bot run
PR_Github #25480 [ run ] triggered by Bot. Commit:
PR_Github #25480 [ run ] completed with state
/bot run
PR_Github #25493 [ run ] triggered by Bot. Commit:
PR_Github #25493 [ run ] completed with state
/bot run
PR_Github #25511 Bot args parsing error: usage: /bot [-h]
/bot run
PR_Github #25521 [ run ] triggered by Bot. Commit:
PR_Github #25521 [ run ] completed with state
/bot run
PR_Github #25542 [ run ] triggered by Bot. Commit:
PR_Github #25542 [ run ] completed with state
/bot run
PR_Github #25565 [ run ] triggered by Bot. Commit:
PR_Github #25565 [ run ] completed with state
/bot run
PR_Github #25593 [ run ] triggered by Bot. Commit:
PR_Github #25593 [ run ] completed with state
/bot run
PR_Github #25623 [ run ] triggered by Bot. Commit:
PR_Github #25622 [ run ] triggered by Bot. Commit:
PR_Github #25622 [ run ] completed with state
PR_Github #25623 [ run ] completed with state
/bot run
PR_Github #25657 [ run ] triggered by Bot. Commit:
PR_Github #25657 [ run ] completed with state
Signed-off-by: jiant <[email protected]>
/bot run
PR_Github #25709 [ run ] triggered by Bot. Commit:
PR_Github #25709 [ run ] completed with state
/bot run
PR_Github #25749 [ run ] triggered by Bot. Commit:
PR_Github #25749 [ run ] completed with state
/bot run
PR_Github #25838 [ run ] triggered by Bot. Commit:
PR_Github #25838 [ run ] completed with state
/bot run
1 similar comment
/bot run
Description
add qwen3-next CI test of accuracy on BF16 and NVFP4