
Conversation

@nvchenghaoz (Collaborator) commented Nov 25, 2025

Based on the previous PR #9344, the previous PR is split into two parts: the a_log fusion and the Mamba perf optimization.

This PR focuses on the a_log fusion.

Addressed reviewers' comments.

Summary by CodeRabbit

  • New Features

    • Added optimization transform for Mamba models that fuses A_log parameters, reducing memory usage during inference.
  • Tests

    • Added comprehensive test coverage for the new Mamba optimization transform.


@coderabbitai coderabbitai bot (Contributor) commented Nov 25, 2025

📝 Walkthrough

The PR introduces a new FuseMambaALog transform that optimizes Mamba models by fusing A_log parameter usage into a precomputed A_fused parameter, eliminating redundant neg-exp computations. Changes include the transform implementation, its configuration registration, and tests validating fusion and memory improvements.
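
As a rough illustration of what the fusion removes (a hedged sketch; the names and shapes are illustrative, not the transform's actual code):

```python
import torch
import torch.nn as nn

# Before fusion: every forward pass recomputes A from the stored parameter.
A_log = nn.Parameter(torch.randn(8))
A = -torch.exp(A_log.float())  # neg(exp(...)) nodes appear in the traced graph

# After fusion: the transform precomputes the same value once at load time
# into an A_fused parameter and rewrites the graph to read it directly, so
# the neg/exp nodes disappear from the inference graph.
A_fused = nn.Parameter(-torch.exp(A_log.float()), requires_grad=False)
```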

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Configuration**<br>`examples/auto_deploy/nano_v3.yaml` | Registers the new `fuse_mamba_a_log` transform entry with `stage: post_load_fusion` and `enabled: true` in the `transforms` section |
| **Core Transform**<br>`tensorrt_llm/_torch/auto_deploy/transform/library/fuse_mamba_a_log.py` | Implements the `FuseMambaALog` class that registers and matches `neg(exp(...))` graph patterns, traces `A_log` attributes, creates fused `A_fused` parameters, rewrites graph nodes, and returns `TransformInfo` with match statistics and shape metadata |
| **Tests**<br>`tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py` | Provides a `DummyMambaALogModule` fixture and two tests: one validating fused parameter creation and the absence of `exp` nodes, the other measuring and verifying the memory-usage delta from fusion |
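
To make the mechanics concrete, here is a simplified torch.fx sketch of the rewrite step (assumptions: a toy module, hand-rolled node matching, and the names `A_log`/`A_fused`; the real `FuseMambaALog` uses its own pattern registration and handles more variants):

```python
import operator

import torch
import torch.fx as fx


class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.A_log = torch.nn.Parameter(torch.randn(4))

    def forward(self, x):
        return x * -torch.exp(self.A_log.float())


gm = fx.symbolic_trace(Toy())

# Find neg(exp(A_log.float())) chains that terminate in a get_attr and
# replace them with a precomputed parameter.
for node in list(gm.graph.nodes):
    if not (node.op == "call_function" and node.target in (torch.neg, operator.neg)):
        continue
    exp_node = node.args[0]
    if not (isinstance(exp_node, fx.Node) and exp_node.target is torch.exp):
        continue
    src = exp_node.args[0]
    if isinstance(src, fx.Node) and src.op == "call_method" and src.target == "float":
        src = src.args[0]  # step through the optional .float() call
    if not (isinstance(src, fx.Node) and src.op == "get_attr"):
        continue
    a_log = gm.get_parameter(src.target)
    gm.register_parameter(
        "A_fused", torch.nn.Parameter(-torch.exp(a_log.float()), requires_grad=False)
    )
    with gm.graph.inserting_before(node):
        fused_node = gm.graph.get_attr("A_fused")
    node.replace_all_uses_with(fused_node)

gm.graph.eliminate_dead_code()  # drops the now-unused neg/exp/float/A_log nodes
gm.recompile()
```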

Sequence Diagram

sequenceDiagram
    participant User
    participant FuseMambaALog
    participant GraphModule
    participant PatternMatcher
    participant GraphRewriter

    User->>FuseMambaALog: _apply(GraphModule, ...)
    FuseMambaALog->>PatternMatcher: Register neg(exp(...)) patterns
    FuseMambaALog->>GraphModule: Scan for pattern matches
    GraphModule-->>PatternMatcher: Returns matched nodes
    alt Matches Found
        FuseMambaALog->>GraphModule: Extract A_log attribute
        FuseMambaALog->>GraphModule: Compute -exp(A_log) → A_fused
        FuseMambaALog->>GraphRewriter: Create/reuse A_fused parameter
        GraphRewriter->>GraphModule: Replace neg(exp) with A_fused reference
        FuseMambaALog->>GraphModule: Remove dead code
    else No Matches
        FuseMambaALog-->>User: Return unchanged GraphModule
    end
    FuseMambaALog-->>User: Return TransformInfo with statistics

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • The FuseMambaALog transform contains multi-step graph pattern matching, attribute tracing, parameter creation, and rewriting logic that requires careful verification
  • Pattern registration and graph mutation logic involving both functional and method-based neg-exp sequences need scrutiny for correctness
  • Safety checks and fallback handling for missing GraphModule references should be validated

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is below the required threshold of 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
| Description check | ❓ Inconclusive | The PR description is minimal and lacks required sections from the template (Description, Test Coverage, PR Checklist details). While it mentions context from prior PR #9344 and reviewer feedback, it does not provide a comprehensive explanation of what was changed or clear test coverage documentation. | Expand the description to include: a detailed explanation of the A_log fusion implementation, specific test coverage details (DummyMambaALogModule, test cases for fused parameter creation and memory), and confirmation of PR checklist items. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title '[None][feat] AutoDeploy: Add A_log fusion for Mamba layers' clearly and specifically describes the main change in the PR, following the required template format. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py (1)

69-72: Incomplete check for exp node removal.

The assertion only checks for torch.exp and torch.ops.aten.exp.default targets, but the transform also handles method-style .exp() calls. Consider extending the check to cover all patterns.

```diff
     assert not any(
-        node.target in {torch.exp, torch.ops.aten.exp.default}
+        node.target in {torch.exp, torch.ops.aten.exp.default}
+        or (node.op == "call_method" and node.target == "exp")
         for node in gm_transformed.graph.nodes
     ), "exp node should be removed after fusion."
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a02950 and f81b19d.

📒 Files selected for processing (3)
  • examples/auto_deploy/nano_v3.yaml (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/transform/library/fuse_mamba_a_log.py (1 hunks)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Always maintain the namespace when importing in Python, even if only one class or function from a module is used (e.g., use from package.subpackage import foo and then foo.SomeClass() instead of from package.subpackage.foo import SomeClass)
Python filenames should use snake_case (e.g., some_file.py)
Python class names should use PascalCase (e.g., class SomeClass)
Python function and method names should use snake_case (e.g., def my_awesome_function():)
Python local variable names should use snake_case, with prefix k for variable names that start with a number (e.g., k_99th_percentile = ...)
Python global variables should use upper snake_case with prefix G (e.g., G_MY_GLOBAL = ...)
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Python comments should be reserved for code within a function, or interfaces that are local to a file
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with type and description (e.g., self.x = 5 followed by """<type>: Description of 'x'""" )
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of specific errors possible instead of catching all exceptions
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block to implement the logic

Files:

  • tensorrt_llm/_torch/auto_deploy/transform/library/fuse_mamba_a_log.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py
**/*.{cpp,h,cu,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top

Files:

  • tensorrt_llm/_torch/auto_deploy/transform/library/fuse_mamba_a_log.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py
🧬 Code graph analysis (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py (2)
tensorrt_llm/_torch/auto_deploy/export/export.py (1)
  • torch_export_to_gm (276-344)
tensorrt_llm/_torch/auto_deploy/transform/optimizer.py (1)
  • InferenceOptimizer (23-78)
🪛 Ruff (0.14.5)
tensorrt_llm/_torch/auto_deploy/transform/library/fuse_mamba_a_log.py

64-64: Consider moving this statement to an else block

(TRY300)


169-169: Unused method argument: cm

(ARG002)


170-170: Unused method argument: factory

(ARG002)


171-171: Unused method argument: shared_config

(ARG002)
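
For context, TRY300 suggests moving a try block's success-path return into an else clause; a generic illustration of the suggested shape (unrelated to the transform's actual code):

```python
def read_text(path: str):
    try:
        with open(path) as f:
            data = f.read()
    except OSError:
        return None
    else:
        # success path lives in `else`, keeping the try body minimal
        return data
```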

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (8)
tensorrt_llm/_torch/auto_deploy/transform/library/fuse_mamba_a_log.py (3)

74-75: Verify float32 conversion is intentional for precision.

The .float() call converts a_log to float32 before computing -exp(), regardless of the original dtype. This appears intentional to match the original model's computation A = -torch.exp(self.A_log.float()), but confirm this is the desired behavior for all use cases (e.g., when models use bfloat16 or float16 for A_log).
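
To illustrate the behavior under discussion (a small standalone sketch, not the transform's code):

```python
import torch

a_log = torch.tensor([-0.5, 0.1], dtype=torch.bfloat16)
a_fused = -torch.exp(a_log.float())  # .float() upcasts before exp, as in the original model
print(a_fused.dtype)                 # torch.float32, regardless of A_log's storage dtype
```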


120-153: LGTM!

Comprehensive pattern registration covering all combinations of exp and neg operations (function calls, ATen ops, and methods). The approach cleanly handles the various ways these operations can appear in the traced graph.
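
For reference, the kinds of equivalent spellings such registration typically has to cover (illustrative examples, not the registered pattern list itself):

```python
import torch

x = torch.randn(4)

a = -torch.exp(x)                                              # operator neg + function exp
b = torch.neg(torch.exp(x))                                    # function calls
c = x.exp().neg()                                              # method calls
d = torch.ops.aten.neg.default(torch.ops.aten.exp.default(x))  # ATen ops
assert torch.allclose(a, b) and torch.allclose(a, c) and torch.allclose(a, d)
```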


156-184: LGTM!

Well-structured transform class with clear docstring explaining the optimization. The implementation correctly applies pattern matching, performs dead code elimination only when needed, and returns appropriate metadata. The unused method arguments (cm, factory, shared_config) are interface requirements from BaseTransform.

examples/auto_deploy/nano_v3.yaml (1)

24-26: LGTM!

Configuration entry correctly registers the new fuse_mamba_a_log transform for the Nano V3 deployment, following the established pattern of other transforms in this file.
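
Based on the description above, the entry plausibly looks like this (hypothetical rendering; consult the actual file):

```yaml
transforms:
  fuse_mamba_a_log:
    stage: post_load_fusion
    enabled: true
```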

tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py (4)

10-23: LGTM!

Clean test fixture that accurately simulates the A = -torch.exp(self.A_log.float()) pattern being fused. The module correctly exercises the transform logic.
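
A minimal sketch of what such a fixture could look like (hypothetical; the actual DummyMambaALogModule may differ in shape and naming):

```python
import torch
import torch.nn as nn


class DummyMambaALogModule(nn.Module):
    """Toy module exercising the A = -exp(A_log.float()) pattern."""

    def __init__(self, num_heads: int = 8):
        super().__init__()
        self.A_log = nn.Parameter(torch.randn(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        A = -torch.exp(self.A_log.float())
        return x + A.sum()
```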


75-110: LGTM!

Good memory verification test with appropriate tolerance for allocator variance. The test correctly verifies that the fusion adds approximately one parameter's worth of memory (the new A_fused tensor). Note that memory tests can occasionally be flaky in CI environments due to allocator behavior; if this becomes an issue, consider increasing the tolerance or marking as a known flaky test.
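
A rough sketch of one way to bound the expected delta (hypothetical; the actual test may measure memory differently):

```python
import torch


def param_bytes(mod: torch.nn.Module) -> int:
    """Total bytes held by a module's parameters."""
    return sum(p.numel() * p.element_size() for p in mod.parameters())


# Fusion adds one float32 parameter the size of A_log, so roughly:
#   delta = param_bytes(gm_transformed) - param_bytes(gm)
#   assert abs(delta - a_log_numel * 4) <= tolerance
```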


1-8: Missing NVIDIA copyright header.

As per coding guidelines, all TensorRT-LLM code files should contain an NVIDIA copyright header.

Add the copyright header at the top of the file:

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import torch
```
⛔ Skipped due to learnings
Learnt from: galagam
Repo: NVIDIA/TensorRT-LLM PR: 6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Learnt from: xinhe-nv
Repo: NVIDIA/TensorRT-LLM PR: 8534
File: scripts/format_test_list.py:1-6
Timestamp: 2025-10-22T06:53:47.017Z
Learning: The file `scripts/format_test_list.py` in the TensorRT-LLM repository does not require the NVIDIA Apache-2.0 copyright header.
Learnt from: CR
Repo: NVIDIA/TensorRT-LLM PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:09:17.870Z
Learning: Applies to **/*.{cpp,h,cu,py} : All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

26-34: None arguments follow the established test pattern—no changes needed.

The verification confirms that InferenceOptimizer(None, config)(None, gm) is the standard pattern used consistently across transform tests (e.g., test_attention_matcher.py lines 421, 438, 458). Passing None for the factory argument is the intended usage for tests where the transforms don't require these arguments. This is not a fragility issue but rather the documented convention through consistent implementation across the test suite.

@nvchenghaoz nvchenghaoz self-assigned this Nov 25, 2025
@suyoggupta suyoggupta moved this from Backlog to In review in AutoDeploy Board Nov 25, 2025
@nvchenghaoz nvchenghaoz changed the title [None][Feat] AutoDeploy: Add A_log fusion for Mamba layers [None][feat] AutoDeploy: Add A_log fusion for Mamba layers Nov 25, 2025
@nvchenghaoz (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25782 [ run ] triggered by Bot. Commit: 42c66b1

@tensorrt-cicd (Collaborator)

PR_Github #25782 [ run ] completed with state SUCCESS. Commit: 42c66b1
/LLM/main/L0_MergeRequest_PR pipeline #19556 completed with status: 'FAILURE'

@nvchenghaoz (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25824 [ run ] triggered by Bot. Commit: 42c66b1

@tensorrt-cicd (Collaborator)

PR_Github #25824 [ run ] completed with state FAILURE. Commit: 42c66b1
/LLM/main/L0_MergeRequest_PR pipeline #19586 completed with status: 'FAILURE'

@nvchenghaoz (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25868 [ run ] triggered by Bot. Commit: 42c66b1

@tensorrt-cicd (Collaborator)

PR_Github #25868 [ run ] completed with state SUCCESS. Commit: 42c66b1
/LLM/main/L0_MergeRequest_PR pipeline #19614 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@nvchenghaoz nvchenghaoz merged commit 18fbda5 into NVIDIA:main Nov 26, 2025
6 of 7 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in AutoDeploy Board Nov 26, 2025