
Conversation

@byshiue (Collaborator) commented Dec 1, 2025

This PR adds support for Mistral-Large-3.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
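
For example, the options above can be combined in a single invocation; the stage and GPU names below are the illustrative values from the help text:

    /bot run
    /bot run --disable-fail-fast --gpu-type "A30, H100_PCIe"
    /bot run --stage-list "A10-PyTorch-1" --detailed-log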

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.
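
For example (the skip comment text here is illustrative):

    /bot skip --comment "CI was green on the previous commit; this push only rewords the README"
    /bot reuse-pipeline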

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Mistral Large V3 models with specialized checkpoint loading and weight mapping
    • Introduced Pixtral vision-language model support with image processing capabilities
    • Added checkpoint format parameter for flexible model loading
    • Enhanced speculative decoding with Eagle3 support for Mistral Large 3
    • Extended FP8 quantization options for improved model compression
  • Bug Fixes

    • Improved tokenization robustness and speculative decoding token management
  • Documentation

    • Added comprehensive Mistral Large V3 setup and deployment guide


@coderabbitai bot (Contributor) commented Dec 1, 2025

📝 Walkthrough

This pull request adds comprehensive support for Mistral Large V3 models through new config loaders, weight mappers, and checkpoint handlers. It integrates MistralTokenizer with specialized processors, enhances DeepSeekV3 with Mistral weight mapping, introduces FP8 quantization config handling, refactors MoE activation type management across kernels and modules, updates speculative decoding metadata handling, and extends serving/executor infrastructure with checkpoint format awareness.

Changes

Each entry below lists a cohort, its file(s), and a summary.
Mistral Model Checkpoint & Config Loading
tensorrt_llm/_torch/models/checkpoints/mistral/checkpoint_loader.py, tensorrt_llm/_torch/models/checkpoints/mistral/config_loader.py
New modules introducing MistralCheckpointLoader, MistralLarge3CheckpointLoader, and MistralConfigLoader. Implements weight preprocessing, per-tensor scale broadcasting, and comprehensive config adaptation pipeline supporting quantization, MOE, YARN scaling, audio/vision modality, and special token handling.
Mistral Model Weight Mapping
tensorrt_llm/_torch/models/checkpoints/mistral/weight_mapper.py
New module adding MistralWeightMapper and MistralLarge3WeightMapper classes supporting weight name translation, Q/K projection permutation, and compatibility with Pixtral and Eagle3 naming conventions.
Checkpoint & Weight Loader Registration
tensorrt_llm/_torch/models/checkpoints/__init__.py, tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py
Exports new Mistral checkpoint/config/weight components. Registers HfWeightLoader for "mistral" and "mistral_large_3" formats.
Activation Type Management
tensorrt_llm/_torch/utils.py, tensorrt_llm/_torch/custom_ops/torch_custom_ops.py, tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/trtllm_moe.py, cpp/tensorrt_llm/kernels/cutlass_kernels/include/common.h, cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu
Introduces an ActivationType IntEnum in utils, relocated from torch_custom_ops; updates import paths. Reorders activation function options in the MoE kernel launcher. Adds documentation comments enforcing alignment with activation function lists. (A minimal sketch of this enum appears after this table.)
MoE Module Enhancements
tensorrt_llm/_torch/modules/fused_moe/interface.py, tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py, tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py, tensorrt_llm/_torch/modules/fused_moe/ops/moe_op_cutlass.py, tensorrt_llm/_torch/modules/fused_moe/moe_load_balancer.py
Adds activation_type parameter to MoE initialization and threading through custom ops. Introduces use_workspace_output flag for workspace-based MoE outputs. Adds PixtralForConditionalGeneration to supported MoE architectures.
Model Configuration & FP8 Quantization
tensorrt_llm/_torch/model_config.py
Extends load_hf_quant_config with FP8 quantization path handling (block-scaled and static). Adds pretrained_hf_config parameter to from_pretrained classmethod. Implements DeepSeekSparseAttentionConfig application for DeepseekV32ForCausalLM architecture.
Mistral Model Integration
tensorrt_llm/_torch/models/modeling_mistral.py
Adds MistralTokenizer and PixtralProcessorAdapter imports. Extends tokenizer/processor initialization with fallback to MistralTokenizer. Implements weight mapping and draft weight loading. Adds spec_metadata propagation in forward methods. Registers PixtralForConditionalGeneration.
DeepSeekV3 & Mistral Large 3 Support
tensorrt_llm/_torch/models/modeling_deepseekv3.py, tensorrt_llm/_torch/models/modeling_speculative.py
Adds dequant_tensor_fp8 utility and topk_method parameter to MoE components. Introduces weight mapper support in loader paths. Registers MistralLarge3ForCausalLM. Adds MistralLarge3DraftModel and MistralLarge3EagleForCausalLM for Eagle3 speculative decoding.
Model Loading & Weight Mapping Infrastructure
tensorrt_llm/_torch/models/modeling_utils.py, tensorrt_llm/_torch/models/modeling_pixtral.py
Adds params_map parameter to load_weights in DecoderModel and DecoderModelForCausalLM. Updates PixtralVisionModel.load_weights to accept arbitrary kwargs with params_map support.
Executor & Runtime Configuration
tensorrt_llm/_torch/pyexecutor/config_utils.py, tensorrt_llm/_torch/pyexecutor/model_engine.py, tensorrt_llm/_torch/pyexecutor/model_loader.py
Adds checkpoint_format parameter to config loading. Extends model engine to pass checkpoint format to input processor. Introduces _load_model_with_moe_load_balancer and KV cache quantization "force_no_quant" option.
Speculative Decoding Updates
tensorrt_llm/_torch/speculative/eagle3.py
Refactors attention metadata token counting across draft steps. Moves from in-step mutation of spec_metadata to restoration of attn_metadata state.
LLM API & Argument Configuration
tensorrt_llm/llmapi/llm.py, tensorrt_llm/llmapi/llm_args.py
Adds debug logging for input processor. Updates executor creation call with additional arguments. Adds eagle3_mistral_large_3 configuration field to EagleDecodingConfig with runtime behavior enforcement.
Mistral Tokenizer & Processor
tensorrt_llm/llmapi/tokenizer.py
Introduces MistralTokenizer class wrapping Tekken and SentencePiece tokenizers with version handling, special token support, and incremental detokenization. Adds PixtralProcessorAdapter for image processing integration.
Serving Infrastructure
tensorrt_llm/commands/serve.py, tensorrt_llm/executor/postproc_worker.py, tensorrt_llm/inputs/registry.py, tensorrt_llm/inputs/utils.py, tensorrt_llm/serve/openai_server.py, tensorrt_llm/serve/scripts/benchmark_serving.py
Adds conditional MistralTokenizer loading for "mistral_large_3" format. Updates config loading and tokenizer initialization paths. Adds vocab_size robustness checks. Introduces --checkpoint-format and --disable-detokenize CLI options.
CLI & Requirements
examples/llm-api/quickstart_advanced.py, requirements.txt
Adds a --checkpoint_format CLI option to quickstart_advanced.py. Adds the mistral-common dependency. (A usage sketch of the new flag follows this table.)
Documentation & Testing
examples/models/core/mistral_large_3/README.md, jenkins/L0_MergeRequest.groovy, tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_trtllm_moe.py
Adds comprehensive Mistral Large V3 README covering deployment modes, LLM API usage, and serving. Adds custom_ops.py to multi-GPU file list. Updates test import path for ActivationType.
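
To make the "Activation Type Management" row concrete, here is a minimal Python sketch of the relocated enum and of threading an activation_type parameter into a MoE module. The member names, their values, and the FusedMoESketch class are illustrative assumptions, not the actual TensorRT-LLM definitions; the real constraint, per the row above, is that the enum ordering stays aligned with the activation-function list in the C++ MoE kernels.

    from enum import IntEnum

    class ActivationType(IntEnum):
        # Lives in tensorrt_llm/_torch/utils.py after the relocation; the
        # order must stay aligned with the kernel-side activation list in
        # cpp/tensorrt_llm/kernels/cutlass_kernels (values here are assumed).
        Gelu = 0
        Relu = 1
        Silu = 2
        Swiglu = 3

    class FusedMoESketch:
        # Hypothetical module mirroring the new activation_type parameter.
        def __init__(self, num_experts: int,
                     activation_type: ActivationType = ActivationType.Swiglu):
            self.num_experts = num_experts
            # Custom ops typically take a plain int, so downcast at the boundary.
            self.activation_type = int(activation_type)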
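As a usage sketch of the new checkpoint-format plumbing: only the flag names and the mistral_large_3 format value come from this changeset, while --model_dir and the positional model path are assumptions about the existing CLIs:

    python examples/llm-api/quickstart_advanced.py \
        --model_dir <path/to/Mistral-Large-3> \
        --checkpoint_format mistral_large_3

    trtllm-serve <path/to/Mistral-Large-3> --checkpoint-format mistral_large_3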

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • tensorrt_llm/_torch/models/checkpoints/mistral/config_loader.py: Extensive config adaptation pipeline with multiple quantization and modality handling paths; requires thorough validation of parameter mapping logic and edge cases.
  • tensorrt_llm/_torch/model_config.py: FP8 quantization branching logic and DeepSeekSparseAttentionConfig conditional application; verify correctness of pretrained_hf_config precedence handling.
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py: Weight mapper integration and conditional loading paths; ensure topk_method parameter propagates correctly through MoE hierarchy.
  • tensorrt_llm/llmapi/tokenizer.py: Large new class with multimodal and incremental detokenization logic; verify compatibility with downstream components and special token handling. (A generic sketch of the incremental-detokenization pattern follows this list.)
  • tensorrt_llm/_torch/pyexecutor/model_loader.py: MOE load balancer fallback logic and KV cache quantization option addition; ensure error handling and state consistency.
  • Weight mapper and checkpoint loader interactions: Verify that weight renaming and per-tensor scale broadcasting work correctly across different model variants.
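
To ground the incremental-detokenization concern above, here is a minimal sketch of the usual pattern, assuming only a tokenizer that exposes decode(list[int]) -> str. The class and method names are illustrative and are not the actual tensorrt_llm/llmapi/tokenizer.py API.

    class IncrementalDetokenizer:
        # Hypothetical helper; wraps any tokenizer with decode(list[int]) -> str.
        def __init__(self, tokenizer):
            self.tokenizer = tokenizer
            self.token_ids: list[int] = []
            self.prefix_len = 0  # characters already emitted

        def step(self, new_token_id: int) -> str:
            # Re-decode the full sequence and emit only the new stable suffix;
            # decoding from scratch avoids splitting characters that a single
            # token encodes only partially.
            self.token_ids.append(new_token_id)
            text = self.tokenizer.decode(self.token_ids)
            # Hold output back while the tail is an incomplete byte sequence,
            # which decoders commonly surface as U+FFFD.
            if text.endswith("\ufffd"):
                return ""
            delta = text[self.prefix_len:]
            self.prefix_len = len(text)
            return delta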

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning: The PR description is mostly empty template boilerplate, with only the one-line statement "This PR adds support to Mistral-Large-3." The required sections (Description, Test Coverage) lack substantive content and are unfilled placeholders. Resolution: fill in the Description section explaining what Mistral-Large-3 support includes and why, and complete the Test Coverage section listing the relevant tests that validate the changes.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 9.66%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title check ✅ Passed: The title "[None][feat] Support Mistral Large 3 model" clearly identifies the primary change as adding support for Mistral Large 3, which is reflected across numerous files in the changeset, including checkpoint loaders, config loaders, weight mappers, and model implementations.

@byshiue byshiue requested a review from a team as a code owner December 4, 2025 06:55
@byshiue byshiue force-pushed the feat/mistral_large_3 branch from dc2d4c3 to b0230c6 Compare December 4, 2025 06:57
Signed-off-by: Tracin <[email protected]>
Signed-off-by: bhsueh <[email protected]>
@byshiue byshiue force-pushed the feat/mistral_large_3 branch from b0230c6 to 9349e63 Compare December 4, 2025 07:09
@tensorrt-cicd (Collaborator):

PR_Github #26863 [ run ] completed with state SUCCESS. Commit: 7cc16f8
/LLM/main/L0_MergeRequest_PR pipeline #20462 completed with status: 'FAILURE'

@byshiue (Collaborator, Author) commented Dec 4, 2025

/bot run

@tensorrt-cicd (Collaborator):

PR_Github #26935 [ run ] triggered by Bot. Commit: 9349e63

@tensorrt-cicd (Collaborator):

PR_Github #26935 [ run ] completed with state SUCCESS. Commit: 9349e63
/LLM/main/L0_MergeRequest_PR pipeline #20524 completed with status: 'FAILURE'

@byshiue byshiue force-pushed the feat/mistral_large_3 branch from 9349e63 to 33dbcd7 Compare December 4, 2025 15:34
@byshiue byshiue force-pushed the feat/mistral_large_3 branch from 33dbcd7 to 9c60f9f Compare December 4, 2025 15:39
@byshiue (Collaborator, Author) commented Dec 4, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #26987 [ run ] triggered by Bot. Commit: 9c60f9f

@tensorrt-cicd (Collaborator):

PR_Github #26987 [ run ] completed with state SUCCESS. Commit: 9c60f9f
/LLM/main/L0_MergeRequest_PR pipeline #20573 completed with status: 'FAILURE'

@byshiue (Collaborator, Author) commented Dec 5, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #27037 [ run ] triggered by Bot. Commit: 9c60f9f

@yweng0828 yweng0828 requested a review from mikeiovine December 5, 2025 03:11
@chzblych (Collaborator) left a comment:

The license of mistral-common is Apache 2.0.

https://github.com/mistralai/mistral-common/blob/main/LICENCE

@byshiue byshiue changed the title [None][feat] Support Mistral Large 3 model Draft: [None][feat] Support Mistral Large 3 model Dec 5, 2025
@tensorrt-cicd (Collaborator):

PR_Github #27037 [ run ] completed with state SUCCESS. Commit: 9c60f9f
/LLM/main/L0_MergeRequest_PR pipeline #20619 completed with status: 'FAILURE'
