[None][feat] AutoDeploy: Add A_log fusion for Mamba layers #9422

Conversation
Signed-off-by: Chenghao Zhang <[email protected]>
📝 Walkthrough

The PR introduces a new FuseMambaALog transform that optimizes Mamba models by fusing A_log parameter usage into a precomputed A_fused parameter, eliminating redundant neg-exp computations. Changes include the transform implementation, its configuration registration, and tests validating fusion and memory improvements.
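To make the optimization concrete, here is a minimal before/after sketch of the pattern being fused. Only the `A_log`/`A_fused` names come from the PR; the module shape and everything else is illustrative:

```python
import torch
import torch.nn as nn


class MixerBefore(nn.Module):
    """Original form: A is recomputed from A_log on every forward."""

    def __init__(self, d_state: int = 16):
        super().__init__()
        self.A_log = nn.Parameter(torch.randn(d_state))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        A = -torch.exp(self.A_log.float())  # the neg(exp(...)) pattern the transform matches
        return x * A


class MixerAfter(nn.Module):
    """Fused form: -exp(A_log) is precomputed once into A_fused."""

    def __init__(self, d_state: int = 16):
        super().__init__()
        a_log = torch.randn(d_state)
        self.A_fused = nn.Parameter(-torch.exp(a_log.float()), requires_grad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.A_fused  # no per-forward neg/exp
```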
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant FuseMambaALog
    participant GraphModule
    participant PatternMatcher
    participant GraphRewriter
    User->>FuseMambaALog: _apply(GraphModule, ...)
    FuseMambaALog->>PatternMatcher: Register neg(exp(...)) patterns
    FuseMambaALog->>GraphModule: Scan for pattern matches
    GraphModule-->>PatternMatcher: Returns matched nodes
    alt Matches Found
        FuseMambaALog->>GraphModule: Extract A_log attribute
        FuseMambaALog->>GraphModule: Compute -exp(A_log) → A_fused
        FuseMambaALog->>GraphRewriter: Create/reuse A_fused parameter
        GraphRewriter->>GraphModule: Replace neg(exp) with A_fused reference
        FuseMambaALog->>GraphModule: Remove dead code
    else No Matches
        FuseMambaALog-->>User: Return unchanged GraphModule
    end
    FuseMambaALog-->>User: Return TransformInfo with statistics
```
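The steps in the diagram roughly correspond to the following FX-level rewrite. This is a simplified sketch assuming ATen-style nodes and a `get_attr` feeding `exp` directly; the PR's actual implementation covers more call styles and node shapes:

```python
import torch
import torch.nn as nn
from torch import fx


def fuse_neg_exp(gm: fx.GraphModule) -> int:
    """Replace neg(exp(get_attr)) chains with a precomputed parameter (sketch)."""
    num_matches = 0
    for node in list(gm.graph.nodes):
        # Match aten.neg(aten.exp(<attribute>)); the real transform also
        # covers torch.neg/torch.exp function calls and .neg()/.exp() methods.
        if node.target is not torch.ops.aten.neg.default:
            continue
        exp_node = node.args[0]
        if not isinstance(exp_node, fx.Node) or exp_node.target is not torch.ops.aten.exp.default:
            continue
        src = exp_node.args[0]
        if not isinstance(src, fx.Node) or src.op != "get_attr":
            continue
        # Precompute -exp(A_log) once and register it as a new parameter.
        a_log = gm.get_parameter(src.target)
        fused_name = src.target.replace(".", "_") + "_fused"
        gm.register_parameter(
            fused_name, nn.Parameter(-torch.exp(a_log.float()), requires_grad=False)
        )
        with gm.graph.inserting_before(node):
            fused_attr = gm.graph.get_attr(fused_name)
        node.replace_all_uses_with(fused_attr)
        num_matches += 1
    gm.graph.eliminate_dead_code()  # drops the now-unused neg/exp nodes
    gm.recompile()
    return num_matches
```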
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py (1)
69-72: Incomplete check for `exp` node removal.

The assertion only checks for `torch.exp` and `torch.ops.aten.exp.default` targets, but the transform also handles method-style `.exp()` calls. Consider extending the check to cover all patterns.

```diff
 assert not any(
-    node.target in {torch.exp, torch.ops.aten.exp.default}
+    node.target in {torch.exp, torch.ops.aten.exp.default}
+    or (node.op == "call_method" and node.target == "exp")
     for node in gm_transformed.graph.nodes
 ), "exp node should be removed after fusion."
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- examples/auto_deploy/nano_v3.yaml (1 hunks)
- tensorrt_llm/_torch/auto_deploy/transform/library/fuse_mamba_a_log.py (1 hunks)
- tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Always maintain the namespace when importing in Python, even if only one class or function from a module is used (e.g., use `from package.subpackage import foo` and then `foo.SomeClass()` instead of `from package.subpackage.foo import SomeClass`)
Python filenames should use snake_case (e.g., `some_file.py`)
Python class names should use PascalCase (e.g., `class SomeClass`)
Python function and method names should use snake_case (e.g., `def my_awesome_function():`)
Python local variable names should use snake_case, with prefix `k` for variable names that start with a number (e.g., `k_99th_percentile = ...`)
Python global variables should use upper snake_case with prefix `G` (e.g., `G_MY_GLOBAL = ...`)
Python constants should use upper snake_case (e.g., `MY_CONSTANT = ...`)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Python comments should be reserved for code within a function, or interfaces that are local to a file
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with type and description (e.g., `self.x = 5` followed by `"""<type>: Description of 'x'"""`)
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of specific errors possible instead of catching all exceptions
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block to implement the logic
Files:
- tensorrt_llm/_torch/auto_deploy/transform/library/fuse_mamba_a_log.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py
**/*.{cpp,h,cu,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top
Files:
- tensorrt_llm/_torch/auto_deploy/transform/library/fuse_mamba_a_log.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py
🧬 Code graph analysis (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py (2)
tensorrt_llm/_torch/auto_deploy/export/export.py (1)
- torch_export_to_gm (276-344)

tensorrt_llm/_torch/auto_deploy/transform/optimizer.py (1)
- InferenceOptimizer (23-78)
🪛 Ruff (0.14.5)
tensorrt_llm/_torch/auto_deploy/transform/library/fuse_mamba_a_log.py
64-64: Consider moving this statement to an else block
(TRY300)
169-169: Unused method argument: cm
(ARG002)
170-170: Unused method argument: factory
(ARG002)
171-171: Unused method argument: shared_config
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (8)
tensorrt_llm/_torch/auto_deploy/transform/library/fuse_mamba_a_log.py (3)
74-75: Verify float32 conversion is intentional for precision.

The `.float()` call converts `a_log` to float32 before computing `-exp()`, regardless of the original dtype. This appears intentional to match the original model's computation `A = -torch.exp(self.A_log.float())`, but confirm this is the desired behavior for all use cases (e.g., when models use bfloat16 or float16 for `A_log`).
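For illustration, a small snippet of the dtype behavior in question (values and shape are assumptions, not from the PR):

```python
import torch

a_log = torch.randn(8, dtype=torch.bfloat16)

# Matches the model's original computation: upcast first, then negate the exp.
a_fp32 = -torch.exp(a_log.float())
assert a_fp32.dtype == torch.float32

# Without the upcast, exp runs (and rounds) in bfloat16.
a_bf16 = -torch.exp(a_log)
assert a_bf16.dtype == torch.bfloat16
```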
120-153: LGTM!

Comprehensive pattern registration covering all combinations of `exp` and `neg` operations (function calls, ATen ops, and methods). The approach cleanly handles the various ways these operations can appear in the traced graph.
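For reference, these call styles can all land in a traced graph as distinct node targets while computing the same thing; a quick sanity check, not code from the PR:

```python
import torch

x = torch.randn(4)

variants = [
    torch.neg(torch.exp(x)),                                    # function calls
    torch.ops.aten.neg.default(torch.ops.aten.exp.default(x)),  # ATen ops
    x.exp().neg(),                                              # method calls
    -x.exp(),                                                   # unary minus over a method call
]
assert all(torch.equal(v, variants[0]) for v in variants)
```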
156-184: LGTM!

Well-structured transform class with a clear docstring explaining the optimization. The implementation correctly applies pattern matching, performs dead-code elimination only when needed, and returns appropriate metadata. The unused method arguments (`cm`, `factory`, `shared_config`) are interface requirements from `BaseTransform`.

examples/auto_deploy/nano_v3.yaml (1)
24-26: LGTM!

Configuration entry correctly registers the new `fuse_mamba_a_log` transform for the Nano V3 deployment, following the established pattern of other transforms in this file.

tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_mamba_a_log.py (4)
10-23: LGTM!

Clean test fixture that accurately simulates the `A = -torch.exp(self.A_log.float())` pattern being fused. The module correctly exercises the transform logic.
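A fixture in this spirit might look like the following; the class name, shapes, and forward body are hypothetical, not the test's actual code:

```python
import torch
import torch.nn as nn


class TinyMamba(nn.Module):
    """Hypothetical stand-in for the test fixture."""

    def __init__(self, d_inner: int = 8, d_state: int = 16):
        super().__init__()
        # A is stored in log space, as in Mamba reference implementations.
        self.A_log = nn.Parameter(torch.log(torch.rand(d_inner, d_state) + 0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        A = -torch.exp(self.A_log.float())  # the exact pattern the transform fuses
        return x @ A  # placeholder use of A; the real mixer performs an SSM scan
```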
75-110: LGTM!

Good memory verification test with appropriate tolerance for allocator variance. The test correctly verifies that the fusion adds approximately one parameter's worth of memory (the new `A_fused` tensor). Note that memory tests can occasionally be flaky in CI environments due to allocator behavior; if this becomes an issue, consider increasing the tolerance or marking it as a known flaky test.
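The expected delta can be reasoned about directly; a rough sketch of the accounting (requires a CUDA device, and the tolerance factor is an assumption, not the test's value):

```python
import torch

a_log = torch.randn(8, 16, device="cuda")
before = torch.cuda.memory_allocated()

# Fusing materializes one extra float32 tensor the size of A_log.
a_fused = -torch.exp(a_log.float())

delta = torch.cuda.memory_allocated() - before
expected = a_fused.numel() * a_fused.element_size()
# Allow slack for allocator block rounding, as the review suggests.
assert abs(delta - expected) <= 2 * expected
```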
1-8: Missing NVIDIA copyright header.

As per coding guidelines, all TensorRT-LLM code files should contain an NVIDIA copyright header.

Add the copyright header at the top of the file:

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import torch
```

⛔ Skipped due to learnings
Learnt from: galagam
Repo: NVIDIA/TensorRT-LLM PR: 6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Learnt from: xinhe-nv
Repo: NVIDIA/TensorRT-LLM PR: 8534
File: scripts/format_test_list.py:1-6
Timestamp: 2025-10-22T06:53:47.017Z
Learning: The file scripts/format_test_list.py in the TensorRT-LLM repository does not require the NVIDIA Apache-2.0 copyright header.

Learnt from: CR
Repo: NVIDIA/TensorRT-LLM PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:09:17.870Z
Learning: Applies to **/*.{cpp,h,cu,py}: All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top.

Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
26-34: None arguments follow the established test pattern; no changes needed.

The verification confirms that `InferenceOptimizer(None, config)(None, gm)` is the standard pattern used consistently across transform tests (e.g., test_attention_matcher.py lines 421, 438, 458). Passing `None` for the factory argument is the intended usage in tests where the transforms don't require it. This is not a fragility issue but rather the documented convention, established through consistent implementation across the test suite.
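Put together, the test-side invocation the review cites looks roughly like this. Only the call shape and the import path come from the review; the config contents are a placeholder assumption:

```python
from tensorrt_llm._torch.auto_deploy.transform.optimizer import InferenceOptimizer

# gm: an exported fx.GraphModule containing the neg(exp(A_log)) pattern.
config = {"fuse_mamba_a_log": {}}  # hypothetical entry; the real schema comes from the transform registry
gm_transformed = InferenceOptimizer(None, config)(None, gm)
```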
Signed-off-by: Chenghao Zhang <[email protected]>
/bot run

PR_Github #25782 [ run ] triggered by Bot. Commit:
PR_Github #25782 [ run ] completed with state

/bot run

PR_Github #25824 [ run ] triggered by Bot. Commit:
PR_Github #25824 [ run ] completed with state

/bot run

PR_Github #25868 [ run ] triggered by Bot. Commit:
PR_Github #25868 [ run ] completed with state
Based on the previous PR #9344: that PR has been split into two parts, the A_log fusion and the Mamba perf optimization.
This PR focuses on the A_log fusion.
Addressed reviewers' comments.
Summary by CodeRabbit
New Features
Tests