
Conversation

@byshiue (Collaborator) commented Dec 1, 2025

This PR adds support for Mistral-Large-3.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
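
For example, the options above can be combined in a single invocation; the stage and GPU names below are the illustrative values from the help text:

    /bot run
    /bot run --disable-fail-fast --gpu-type "A30, H100_PCIe"
    /bot run --stage-list "A10-PyTorch-1" --detailed-log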

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.
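
For example (the skip comment text here is illustrative):

    /bot skip --comment "CI was green on the previous commit; this push only rewords the README"
    /bot reuse-pipeline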

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Mistral Large V3 models with specialized checkpoint loading and weight mapping
    • Introduced Pixtral vision-language model support with image processing capabilities
    • Added checkpoint format parameter for flexible model loading
    • Enhanced speculative decoding with Eagle3 support for Mistral Large 3
    • Extended FP8 quantization options for improved model compression
  • Bug Fixes

    • Improved tokenization robustness and speculative decoding token management
  • Documentation

    • Added comprehensive Mistral Large V3 setup and deployment guide


@coderabbitai bot (Contributor) commented Dec 1, 2025

📝 Walkthrough

This pull request adds comprehensive support for Mistral Large V3 models through new config loaders, weight mappers, and checkpoint handlers. It integrates MistralTokenizer with specialized processors, enhances DeepSeekV3 with Mistral weight mapping, introduces FP8 quantization config handling, refactors MoE activation type management across kernels and modules, updates speculative decoding metadata handling, and extends serving/executor infrastructure with checkpoint format awareness.

Changes

Each entry below lists a cohort, its file(s), and a summary.
Mistral Model Checkpoint & Config Loading
tensorrt_llm/_torch/models/checkpoints/mistral/checkpoint_loader.py, tensorrt_llm/_torch/models/checkpoints/mistral/config_loader.py
New modules introducing MistralCheckpointLoader, MistralLarge3CheckpointLoader, and MistralConfigLoader. Implements weight preprocessing, per-tensor scale broadcasting, and comprehensive config adaptation pipeline supporting quantization, MOE, YARN scaling, audio/vision modality, and special token handling.
Mistral Model Weight Mapping
tensorrt_llm/_torch/models/checkpoints/mistral/weight_mapper.py
New module adding MistralWeightMapper and MistralLarge3WeightMapper classes supporting weight name translation, Q/K projection permutation, and compatibility with Pixtral and Eagle3 naming conventions.
Checkpoint & Weight Loader Registration
tensorrt_llm/_torch/models/checkpoints/__init__.py, tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py
Exports new Mistral checkpoint/config/weight components. Registers HfWeightLoader for "mistral" and "mistral_large_3" formats.
Activation Type Management
tensorrt_llm/_torch/utils.py, tensorrt_llm/_torch/custom_ops/torch_custom_ops.py, tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/trtllm_moe.py, cpp/tensorrt_llm/kernels/cutlass_kernels/include/common.h, cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu
Introduces an ActivationType IntEnum in utils, relocated from torch_custom_ops; updates import paths. Reorders activation function options in the MoE kernel launcher. Adds documentation comments enforcing alignment with activation function lists. (A minimal sketch of this enum appears after this table.)
MoE Module Enhancements
tensorrt_llm/_torch/modules/fused_moe/interface.py, tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py, tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py, tensorrt_llm/_torch/modules/fused_moe/ops/moe_op_cutlass.py, tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py
Adds activation_type parameter to MoE initialization and threading through custom ops. Introduces use_workspace_output flag for workspace-based MoE outputs. Adds PixtralForConditionalGeneration to supported MoE architectures.
Model Configuration & FP8 Quantization
tensorrt_llm/_torch/model_config.py
Extends load_hf_quant_config with FP8 quantization path handling (block-scaled and static). Adds pretrained_hf_config parameter to from_pretrained classmethod. Implements DeepSeekSparseAttentionConfig application for DeepseekV32ForCausalLM architecture.
Mistral Model Integration
tensorrt_llm/_torch/models/modeling_mistral.py
Adds MistralTokenizer and PixtralProcessorAdapter imports. Extends tokenizer/processor initialization with fallback to MistralTokenizer. Implements weight mapping and draft weight loading. Adds spec_metadata propagation in forward methods. Registers PixtralForConditionalGeneration.
DeepSeekV3 & Mistral Large 3 Support
tensorrt_llm/_torch/models/modeling_deepseekv3.py, tensorrt_llm/_torch/models/modeling_speculative.py
Adds dequant_tensor_fp8 utility and topk_method parameter to MoE components. Introduces weight mapper support in loader paths. Registers MistralLarge3ForCausalLM. Adds MistralLarge3DraftModel and MistralLarge3EagleForCausalLM for Eagle3 speculative decoding.
Model Loading & Weight Mapping Infrastructure
tensorrt_llm/_torch/models/modeling_utils.py, tensorrt_llm/_torch/models/modeling_pixtral.py
Adds params_map parameter to load_weights in DecoderModel and DecoderModelForCausalLM. Updates PixtralVisionModel.load_weights to accept arbitrary kwargs with params_map support.
Executor & Runtime Configuration
tensorrt_llm/_torch/pyexecutor/config_utils.py, tensorrt_llm/_torch/pyexecutor/model_engine.py, tensorrt_llm/_torch/pyexecutor/model_loader.py
Adds checkpoint_format parameter to config loading. Extends model engine to pass checkpoint format to input processor. Introduces _load_model_with_moe_load_balancer and KV cache quantization "force_no_quant" option.
Speculative Decoding Updates
tensorrt_llm/_torch/speculative/eagle3.py
Refactors attention metadata token counting across draft steps. Moves from in-step mutation of spec_metadata to restoration of attn_metadata state.
LLM API & Argument Configuration
tensorrt_llm/llmapi/llm.py, tensorrt_llm/llmapi/llm_args.py
Adds debug logging for input processor. Updates executor creation call with additional arguments. Adds eagle3_mistral_large_3 configuration field to EagleDecodingConfig with runtime behavior enforcement.
Mistral Tokenizer & Processor
tensorrt_llm/llmapi/tokenizer.py
Introduces MistralTokenizer class wrapping Tekken and SentencePiece tokenizers with version handling, special token support, and incremental detokenization. Adds PixtralProcessorAdapter for image processing integration.
Serving Infrastructure
tensorrt_llm/commands/serve.py, tensorrt_llm/executor/postproc_worker.py, tensorrt_llm/inputs/registry.py, tensorrt_llm/inputs/utils.py, tensorrt_llm/serve/openai_server.py, tensorrt_llm/serve/scripts/benchmark_serving.py
Adds conditional MistralTokenizer loading for "mistral_large_3" format. Updates config loading and tokenizer initialization paths. Adds vocab_size robustness checks. Introduces --checkpoint-format and --disable-detokenize CLI options.
CLI & Requirements
examples/llm-api/quickstart_advanced.py, requirements.txt
Adds a --checkpoint_format CLI option to quickstart_advanced.py. Adds the mistral-common dependency. (A usage sketch of the new flag follows this table.)
Documentation & Testing
examples/models/core/mistral_large_3/README.md, jenkins/L0_MergeRequest.groovy, tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_trtllm_moe.py
Adds comprehensive Mistral Large V3 README covering deployment modes, LLM API usage, and serving. Adds custom_ops.py to multi-GPU file list. Updates test import path for ActivationType.
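
To make the "Activation Type Management" row concrete, here is a minimal Python sketch of the relocated enum and of threading an activation_type parameter into a MoE module. The member names, their values, and the FusedMoESketch class are illustrative assumptions, not the actual TensorRT-LLM definitions; the real constraint, per the row above, is that the enum ordering stays aligned with the activation-function list in the C++ MoE kernels.

    from enum import IntEnum

    class ActivationType(IntEnum):
        # Lives in tensorrt_llm/_torch/utils.py after the relocation; the
        # order must stay aligned with the kernel-side activation list in
        # cpp/tensorrt_llm/kernels/cutlass_kernels (values here are assumed).
        Gelu = 0
        Relu = 1
        Silu = 2
        Swiglu = 3

    class FusedMoESketch:
        # Hypothetical module mirroring the new activation_type parameter.
        def __init__(self, num_experts: int,
                     activation_type: ActivationType = ActivationType.Swiglu):
            self.num_experts = num_experts
            # Custom ops typically take a plain int, so downcast at the boundary.
            self.activation_type = int(activation_type)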
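As a usage sketch of the new checkpoint-format plumbing: only the flag names and the mistral_large_3 format value come from this changeset, while --model_dir and the positional model path are assumptions about the existing CLIs:

    python examples/llm-api/quickstart_advanced.py \
        --model_dir <path/to/Mistral-Large-3> \
        --checkpoint_format mistral_large_3

    trtllm-serve <path/to/Mistral-Large-3> --checkpoint-format mistral_large_3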

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • tensorrt_llm/_torch/models/checkpoints/mistral/config_loader.py: Extensive config adaptation pipeline with multiple quantization and modality handling paths; requires thorough validation of parameter mapping logic and edge cases.
  • tensorrt_llm/_torch/model_config.py: FP8 quantization branching logic and DeepSeekSparseAttentionConfig conditional application; verify correctness of pretrained_hf_config precedence handling.
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py: Weight mapper integration and conditional loading paths; ensure topk_method parameter propagates correctly through MoE hierarchy.
  • tensorrt_llm/llmapi/tokenizer.py: Large new class with multimodal and incremental detokenization logic; verify compatibility with downstream components and special token handling. (A generic sketch of the incremental-detokenization pattern follows this list.)
  • tensorrt_llm/_torch/pyexecutor/model_loader.py: MOE load balancer fallback logic and KV cache quantization option addition; ensure error handling and state consistency.
  • Weight mapper and checkpoint loader interactions: Verify that weight renaming and per-tensor scale broadcasting work correctly across different model variants.
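
To ground the incremental-detokenization concern above, here is a minimal sketch of the usual pattern, assuming only a tokenizer that exposes decode(list[int]) -> str. The class and method names are illustrative and are not the actual tensorrt_llm/llmapi/tokenizer.py API.

    class IncrementalDetokenizer:
        # Hypothetical helper; wraps any tokenizer with decode(list[int]) -> str.
        def __init__(self, tokenizer):
            self.tokenizer = tokenizer
            self.token_ids: list[int] = []
            self.prefix_len = 0  # characters already emitted

        def step(self, new_token_id: int) -> str:
            # Re-decode the full sequence and emit only the new stable suffix;
            # decoding from scratch avoids splitting characters that a single
            # token encodes only partially.
            self.token_ids.append(new_token_id)
            text = self.tokenizer.decode(self.token_ids)
            # Hold output back while the tail is an incomplete byte sequence,
            # which decoders commonly surface as U+FFFD.
            if text.endswith("\ufffd"):
                return ""
            delta = text[self.prefix_len:]
            self.prefix_len = len(text)
            return delta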

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning: The PR description is mostly empty template boilerplate, with only the one-line statement "This PR adds support to Mistral-Large-3." The required sections (Description, Test Coverage) lack substantive content and are unfilled placeholders. Resolution: fill in the Description section explaining what Mistral-Large-3 support includes and why, and complete the Test Coverage section listing the relevant tests that validate the changes.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 9.66%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title check ✅ Passed: The title "[None][feat] Support Mistral Large 3 model" clearly identifies the primary change as adding support for Mistral Large 3, which is reflected across numerous files in the changeset, including checkpoint loaders, config loaders, weight mappers, and model implementations.

@byshiue byshiue requested a review from a team as a code owner December 4, 2025 06:55
@byshiue byshiue force-pushed the feat/mistral_large_3 branch from dc2d4c3 to b0230c6 Compare December 4, 2025 06:57
Signed-off-by: Tracin <[email protected]>
Signed-off-by: bhsueh <[email protected]>
@byshiue byshiue force-pushed the feat/mistral_large_3 branch from b0230c6 to 9349e63 Compare December 4, 2025 07:09
@tensorrt-cicd (Collaborator):

PR_Github #26863 [ run ] completed with state SUCCESS. Commit: 7cc16f8
/LLM/main/L0_MergeRequest_PR pipeline #20462 completed with status: 'FAILURE'

@byshiue (Collaborator, Author) commented Dec 4, 2025

/bot run

@tensorrt-cicd (Collaborator):

PR_Github #26935 [ run ] triggered by Bot. Commit: 9349e63

@tensorrt-cicd (Collaborator):

PR_Github #26935 [ run ] completed with state SUCCESS. Commit: 9349e63
/LLM/main/L0_MergeRequest_PR pipeline #20524 completed with status: 'FAILURE'

@byshiue byshiue force-pushed the feat/mistral_large_3 branch from 9349e63 to 33dbcd7 Compare December 4, 2025 15:34
@byshiue byshiue force-pushed the feat/mistral_large_3 branch from 33dbcd7 to 9c60f9f Compare December 4, 2025 15:39
@byshiue (Collaborator, Author) commented Dec 4, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #26987 [ run ] triggered by Bot. Commit: 9c60f9f

@tensorrt-cicd (Collaborator):

PR_Github #26987 [ run ] completed with state SUCCESS. Commit: 9c60f9f
/LLM/main/L0_MergeRequest_PR pipeline #20573 completed with status: 'FAILURE'

@byshiue (Collaborator, Author) commented Dec 5, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #27037 [ run ] triggered by Bot. Commit: 9c60f9f

@yweng0828 yweng0828 requested a review from mikeiovine December 5, 2025 03:11
@chzblych (Collaborator) left a comment:

The license of mistral-common is Apache 2.0.

https://github.com/mistralai/mistral-common/blob/main/LICENCE

@byshiue byshiue changed the title [None][feat] Support Mistral Large 3 model Draft: [None][feat] Support Mistral Large 3 model Dec 5, 2025
@tensorrt-cicd (Collaborator):

PR_Github #27037 [ run ] completed with state SUCCESS. Commit: 9c60f9f
/LLM/main/L0_MergeRequest_PR pipeline #20619 completed with status: 'FAILURE'
