
Add DeepSeek MoE detection and export mapping in HF PTQ/export path#1125

Open
Charles-JCJ wants to merge 1 commit into NVIDIA:main from Charles-JCJ:deepseek-moe-export-support

Conversation


@Charles-JCJ Charles-JCJ commented Mar 26, 2026

What does this PR do?

Type of change: Bug fix

This PR adds missing DeepSeek MoE support in the Hugging Face PTQ/export path by:

  • mapping n_routed_experts to moe_num_experts
  • recognizing DeepSeek V2/V3 MoE modules in export logic
  • handling DeepSeek expert linear names as gate_proj/down_proj/up_proj
  • extending pre-quant fusion rules for DeepSeek V2/V3 attention and MLP
  • adding unit tests for the new DeepSeek mappings
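As a rough illustration of the first bullet, a config-alias table of this shape could map DeepSeek's n_routed_experts onto a common moe_num_experts field. This is a hypothetical sketch; the names `HF_CONFIG_ALIASES` and `resolve_config_value` are invented here, and the real entries in hf_config_map.py may be structured differently.

```python
# Hypothetical alias table: several HF config keys all mean "number of
# routed experts"; DeepSeek uses n_routed_experts for this.
HF_CONFIG_ALIASES = {
    "moe_num_experts": ["num_local_experts", "moe_num_experts", "n_routed_experts"],
}


def resolve_config_value(hf_config: dict, target_key: str):
    """Return the value of the first alias present in the HF config."""
    for alias in HF_CONFIG_ALIASES.get(target_key, [target_key]):
        if alias in hf_config:
            return hf_config[alias]
    return None


# A DeepSeek-style config resolves via n_routed_experts.
deepseek_cfg = {"n_routed_experts": 256, "hidden_size": 7168}
print(resolve_config_value(deepseek_cfg, "moe_num_experts"))  # 256
```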

Usage

This improves DeepSeek MoE compatibility in the existing Hugging Face PTQ/export flow.

Testing

  • Added unit test:
    • tests/unit/torch/export/test_deepseek_export_support.py
  • Performed syntax validation with:
    • python3 -m py_compile ...

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • New Features

    • Added recognition and export support for DeepSeek v2/v3 architectures, including Mixture-of-Experts naming and layer handling
    • Extended quantization pre-fuse mappings to cover DeepSeek attention and MLP components
    • Hugging Face config mappings now accept an additional routed-experts key for MoE configs
  • Tests

    • Added unit tests covering DeepSeek export, MoE detection, expert naming, grouping, config mapping, and quantization mappings

@Charles-JCJ Charles-JCJ requested a review from a team as a code owner March 26, 2026 07:02
@Charles-JCJ Charles-JCJ requested a review from sugunav14 March 26, 2026 07:02
@copy-pr-bot

copy-pr-bot bot commented Mar 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

Walkthrough

Adds DeepSeek (v2/v3) export support across the export pipeline: HF config mapping, MoE detection and expert extraction, Megatron/NEMO layer handling tweaks, PQS quantization pre-fuse mappings, and unit tests covering DeepSeek MoE behaviors and mappings.

Changes

  • HF config mapping (modelopt/torch/export/hf_config_map.py):
    Recognize n_routed_experts (in addition to num_local_experts/moe_num_experts) and map it to moe_num_experts; updated the inline comment to include DeepSeek.
  • Layer / MoE handling (modelopt/torch/export/layer_utils.py):
    Added DeepSeek MoE detection and mapping (treat DeepSeek as MoE with gate + experts, map expert linears to ["gate_proj", "down_proj", "up_proj"]), adjusted the explicit MoE name set ("gptossmoe" added, "deepseekmoe" removed), and added Megatron/NEMO-specific transformer-layer extraction logic with optional unwrapping and legacy checkpoint handling.
  • Quantization pre-fuse mapping (modelopt/torch/export/quant_utils.py):
    Extended PQS_FUSE_MODULE_MAPPING to include DeepseekV2Attention/DeepseekV3Attention fused as ("v_proj", "o_proj") and DeepseekV2MLP/DeepseekV3MLP fused as ("up_proj", "down_proj").
  • Tests (tests/unit/torch/export/test_deepseek_export_support.py):
    Added unit tests with fake DeepSeek v3 MoE components verifying: MoE detection, expert linear-name discovery, expert grouping via get_experts_list, the HF config mapping entry for n_routed_experts, and PQS fuse mapping entries for DeepSeek attention/MLP.
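The PQS_FUSE_MODULE_MAPPING extension in quant_utils.py might look roughly like this. The mapping shape and the helper `fuse_pair_for` are assumptions for illustration; only the class names and linear pairs come from the summary above.

```python
# Hypothetical shape of a pre-quant-scale fuse mapping: for each set of
# module class names, the (source, target) linear pair whose pre-quant
# scales are fused together.
PQS_FUSE_MODULE_MAPPING = [
    # Attention: fuse v_proj with o_proj (DeepSeek entries added alongside Llama).
    (("LlamaAttention", "DeepseekV2Attention", "DeepseekV3Attention"),
     ("v_proj", "o_proj")),
    # MLP: fuse up_proj with down_proj.
    (("DeepseekV2MLP", "DeepseekV3MLP"),
     ("up_proj", "down_proj")),
]


def fuse_pair_for(module_class_name: str):
    """Return the (source, target) linear pair registered for a module class."""
    for class_names, pair in PQS_FUSE_MODULE_MAPPING:
        if module_class_name in class_names:
            return pair
    return None


print(fuse_pair_for("DeepseekV3Attention"))  # ('v_proj', 'o_proj')
print(fuse_pair_for("DeepseekV2MLP"))        # ('up_proj', 'down_proj')
```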

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 44.44%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately describes the primary change: adding DeepSeek MoE detection and export mapping support in the HF PTQ/export path, which is directly reflected in all modified files.
  • Security Anti-Patterns ✅ Passed: No security anti-patterns detected in modified files; the code contains only configuration mappings and model-type detection logic without security-sensitive operations.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
modelopt/torch/export/layer_utils.py (1)

330-343: LGTM on DeepSeek detection, but verify gptossmoe coverage.

The _is_deepseek_moe_name helper and its usage in is_moe correctly identify DeepSeek MoE modules by requiring both "deepseek" and "moe" in the name plus the presence of gate and experts attributes.

However, gptossmoe was added to the explicit matches (line 343) but the test file only covers DeepSeek. Consider adding test coverage for GptOssMoE detection to prevent regressions.
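The suggested regression test could be sketched as below. Note `is_moe` here is a simplified stand-in for the real layer_utils.is_moe, kept only to show the test's shape; the actual function does far more.

```python
class GptOssMoE:
    """Minimal stand-in whose class name matches the 'gptossmoe' check."""


def is_moe(module) -> bool:
    # Simplified stand-in: explicit substring matches on the lowered
    # class name, as layer_utils.is_moe is described to do.
    name = type(module).__name__.lower()
    return any(m in name for m in ("gptossmoe", "mixtralsparsemoeblock"))


def test_gptoss_moe_detected():
    # Guards the explicit "gptossmoe" match against regressions.
    assert is_moe(GptOssMoE())
    assert not is_moe(object())


test_gptoss_moe_detected()
print("ok")
```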

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/export/layer_utils.py` around lines 330-343: add a unit test
to assert that is_moe correctly detects modules named/typed as GptOssMoE: create
a minimal nn.Module subclass whose class name includes "GptOssMoE" (or set
type(module).__name__ to that via a real class name) and verify is_moe(module)
returns True; place this alongside the existing DeepSeek test to ensure the
explicit match for "gptossmoe" in is_moe is covered and guarded against
regressions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 03ce4234-32ec-4dab-b878-35b0766d665d

📥 Commits

Reviewing files that changed from the base of the PR and between b1f9f01 and 0cae4cb.

📒 Files selected for processing (4)
  • modelopt/torch/export/hf_config_map.py
  • modelopt/torch/export/layer_utils.py
  • modelopt/torch/export/quant_utils.py
  • tests/unit/torch/export/test_deepseek_export_support.py

@Charles-JCJ Charles-JCJ changed the title Fix DeepSeek MoE detection and export mapping in HF PTQ/export path Add DeepSeek MoE detection and export mapping in HF PTQ/export path Mar 26, 2026
Author

@Charles-JCJ Charles-JCJ left a comment


Could a maintainer please approve the pending workflows for this PR? Thanks!

Signed-off-by: Charles.J <jiangchangjian247@gmail.com>
@Charles-JCJ Charles-JCJ force-pushed the deepseek-moe-export-support branch from 0cae4cb to de8820f on March 27, 2026 02:22
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
modelopt/torch/export/layer_utils.py (1)

155-180: Consider adding defensive attribute checks for legacy NEMO model handling.

The code assumes language_model.embedding and language_model.encoder exist after finding a TransformerLanguageModel. While this is likely always true for valid NEMO models, adding explicit checks would make the code more robust and provide clearer error messages if assumptions are violated.

💡 Suggested defensive checks
         if language_model:
             warn("Warning: this is an old NEMO checkpoint format and will be deprecated soon.")
+            if not hasattr(language_model, "embedding") or not hasattr(language_model, "encoder"):
+                raise ValueError(
+                    "TransformerLanguageModel missing expected 'embedding' or 'encoder' attributes"
+                )
             layers = list(language_model.embedding.children()) + list(
                 language_model.encoder.children()
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/export/layer_utils.py` around lines 155-180: the legacy NEMO
branch that locates a TransformerLanguageModel (variable language_model) should
defensively verify attributes before accessing them: check
hasattr(language_model, "embedding") and hasattr(language_model, "encoder")
before building layers, and only call list(language_model.embedding.children())
or list(language_model.encoder.children()) when present; if either is missing
raise or warn with a clear message mentioning TransformerLanguageModel so
callers know the checkpoint is unexpected; likewise guard access to
language_model.output_layer (already partially handled) and append it only if
hasattr(language_model, "output_layer"); update the code around the
language_model handling in layer_utils.py to perform these attribute checks and
produce explicit error/warning messages rather than assuming attributes exist.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d9b14db8-e4cc-4731-b455-7a74ba225052

📥 Commits

Reviewing files that changed from the base of the PR and between 0cae4cb and de8820f.

📒 Files selected for processing (4)
  • modelopt/torch/export/hf_config_map.py
  • modelopt/torch/export/layer_utils.py
  • modelopt/torch/export/quant_utils.py
  • tests/unit/torch/export/test_deepseek_export_support.py
✅ Files skipped from review due to trivial changes (1)
  • tests/unit/torch/export/test_deepseek_export_support.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/torch/export/hf_config_map.py

@cjluo-nv
Collaborator

@Charles-JCJ is this work intended to support HF quantized checkpoint export or TRTLLM (old format) checkpoint export?

@cjluo-nv cjluo-nv self-requested a review March 31, 2026 15:33
Collaborator

@cjluo-nv cjluo-nv left a comment


Summary: Adds DeepSeek V2/V3 MoE support to the HF PTQ/export pipeline — config mapping for n_routed_experts, MoE detection, expert linear name resolution, and pre-quant fusion rules. Also includes an unrelated Megatron/NEMO get_transformer_layers change and several comment fixups.

Issues Found:

  1. [Readability] Duplicated branch in get_experts_list (layer_utils.py, diff lines for the elif "deepseek" in model_type: block around line 101-102):
    The new elif "deepseek" in model_type: branch sets linear_names = ["gate_proj", "down_proj", "up_proj"] — identical to the Qwen branch immediately above it. Merge DeepSeek into the existing any(...) check:

    elif any(
        variant in model_type
        for variant in [
            "qwenmoeforcausallm",
            "qwen2moeforcausallm",
            "qwen3moeforcausallm",
            "qwen3nextforcausallm",
            "deepseek",
        ]
    ):

    Or alternatively, keep it separate but add a comment explaining the intentional separation — as-is it looks like an oversight.

  2. [Correctness] get_experts_list uses loose substring match (layer_utils.py, new line ~101): "deepseek" in model_type would match any model_type containing "deepseek" as a substring. Currently that's fine, but it's inconsistent with the more specific checks for other models (e.g., "mixtralforcausallm", "qwen2moeforcausallm"). If a future DeepSeek variant uses different expert linear names, this broad match would silently produce wrong results. Consider matching "deepseekv2" or "deepseekv3" explicitly, or at minimum "deepseek" with a comment noting the assumption.

  3. [Correctness] Scope creep — Megatron/NEMO get_transformer_layers changes (layer_utils.py, diff lines 153-181): The Megatron/NEMO handling block (BFS for TransformerLanguageModel, legacy checkpoint support) is unrelated to DeepSeek MoE detection. This is a significant chunk of new logic (~30 lines) with no tests and no mention in the PR description. It should either:

    • Be split into a separate PR with its own tests, or
    • Be explicitly documented in this PR's description with test coverage.
  4. [Tests] No tests for Megatron/NEMO get_transformer_layers additions: The new Megatron/NEMO branch in get_transformer_layers includes a BFS search, conditional unwrapping, and a deprecation warning — all untested. This is particularly risky because it modifies the model traversal entry point.

  5. [Tests] No negative test for is_moe structural check: The new is_moe logic requires both hasattr(module, "gate") and hasattr(module, "experts"). A test with a DeepSeek-named module missing one of these attributes would validate the guard. Currently only the happy path is tested.

  6. [Readability] Copyright year (test_deepseek_export_support.py, line 1): The copyright header says 2024, but this is a new file created in 2026.

  7. [Readability] Comment-only changes mixed in (layer_utils.py): The Megatron-core→NEMO comment rename (line 393 in the diff) and the MCore/NeMo comment fix (around line 1493) are fine individually but further blur the PR's scope.

Suggestions:

  • The _is_deepseek_moe_name helper is a nice pattern — consider also using it in get_experts_list instead of the raw "deepseek" in model_type check, for consistency.
  • The DeepseekV2Attention/DeepseekV3Attention entries in PQS_FUSE_MODULE_MAPPING are added to the same tuple as LlamaAttention. If DeepSeek attention has a fundamentally different structure (e.g., MLA with latent attention), verify the v_proj/o_proj fusion math still holds.
  • Consider adding "qwen3_5moeforcausallm" to get_experts_list since Qwen3_5MoeSparseMoeBlock is already in get_expert_linear_names.

Overall Assessment: The core DeepSeek MoE changes are straightforward and correct. However, the Megatron/NEMO changes are out of scope and untested, which is the primary blocking concern. The DeepSeek-specific changes themselves are low-risk additions to existing patterns.
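A minimal sketch of the merged, version-qualified variant check proposed in issues 1 and 2 above. The function name `expert_linear_names` and the variant list are illustrative only; the real get_experts_list handles many more architectures.

```python
# Variants that share the gate_proj/down_proj/up_proj expert layout.
# DeepSeek is folded into the existing any(...) branch, but with
# version-qualified names instead of a bare "deepseek" substring.
SHARED_GDU_VARIANTS = [
    "qwenmoeforcausallm",
    "qwen2moeforcausallm",
    "qwen3moeforcausallm",
    "qwen3nextforcausallm",
    "deepseekv2",
    "deepseekv3",
]


def expert_linear_names(model_type: str) -> list[str]:
    """Return the expert linear-layer names for a known MoE model type."""
    model_type = model_type.lower()
    if any(variant in model_type for variant in SHARED_GDU_VARIANTS):
        return ["gate_proj", "down_proj", "up_proj"]
    raise ValueError(f"unknown MoE model_type: {model_type}")


print(expert_linear_names("DeepseekV3ForCausalLM"))
```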
