
Conversation

@tomeras91
Collaborator

The Mamba2Mixer block forward pass in Nemotron-H assumes that state_indices in MambaCacheManager is valid. state_indices is prepared during the call to MambaHybridCacheManager.prepare_resources(), but warmup runs don't call prepare_resources(), so they require special handling within the Mamba2Mixer forward pass. Previously, warmup runs were assumed to consist of a single request with request_id=0 and were identified by that condition.

PR #4466 changed the behavior of warmup runs, so they no longer always have just a single request with id=0. Consequently, this broke the Mamba2Mixer forward pass during warmup runs, preventing Nemotron-H from being initialized via the LLM API.

This PR makes Mamba2Mixer robust during warmup runs by directly checking the validity of state_indices. If state_indices is invalid, the mixer fills it with valid dummy values, reducing the differences between regular and warmup forward passes.
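The gist of that fallback, as a minimal sketch (the helper name, arguments, and the emptiness check are assumptions for illustration, not the actual Mamba2Mixer code):

```python
import torch

def resolve_state_indices(state_indices: torch.Tensor, num_seqs: int,
                          device: torch.device) -> torch.Tensor:
    """Hypothetical helper: return valid cache-slot indices for this batch.

    During warmup, prepare_resources() has not been called, so state_indices
    does not cover the batch. Fall back to dummy indices 0..num_seqs-1, which
    point at real (but throwaway) cache slots, so the rest of the forward
    pass runs exactly as it does for a regular request.
    """
    if state_indices.numel() < num_seqs:  # warmup: indices were never prepared
        return torch.arange(num_seqs, dtype=torch.int32, device=device)
    return state_indices[:num_seqs]
```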

Additionally, PR #4466 broke Nemotron-H silently because there were no unit tests exercising Nemotron-H through the LLM API. This PR therefore also introduces such a test to catch similar regressions in the future.
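A rough sketch of the kind of LLM API test this adds (the checkpoint path, prompt, and assertion below are placeholders for illustration; the actual test lives in tests/unittest/_torch/modeling/test_modeling_nemotron_h.py and needs a real checkpoint to run):

```python
from tensorrt_llm import LLM, SamplingParams


def test_nemotron_h_llm_api():
    # Placeholder checkpoint path; the real test points at an actual Nemotron-H model.
    llm = LLM(model="/path/to/nemotron-h-checkpoint")
    # Constructing the LLM runs the warmup forward passes, which is exactly
    # where the breakage from PR #4466 used to surface.
    outputs = llm.generate(["The capital of France is"],
                           SamplingParams(max_tokens=8))
    assert outputs[0].outputs[0].text  # generation produced some text
```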

tomeras91 added 4 commits June 5, 2025 11:13
…failed or succeeded (2) don't add BOS token to match expected outputs

Signed-off-by: Tomer Asida <[email protected]>
…y state_indices during forward pass. Now LLM API test passes

Signed-off-by: Tomer Asida <[email protected]>
Contributor

Copilot AI left a comment


Pull Request Overview

This pull request increases the robustness of the Mamba2Mixer forward pass during warmup runs by checking the validity of state indices and supplying dummy values if necessary. It also adds new LLM API unit tests for Nemotron-H to catch similar issues in the future.

  • Updated unit tests to use function-based test definitions and to cover LLM API usage.
  • Modified state_indices initialization in MambaHybridCacheManager to properly set the device and dtype (a minimal sketch follows this list).
  • Refactored Mamba2Mixer to determine warmup cases by checking if state_indices is empty and to generate dummy indices accordingly.
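For the resource_manager.py change, a minimal sketch of the idea (tensor name and device are assumptions, not the actual code): give state_indices its device and integer dtype at construction time, so that the emptiness check in Mamba2Mixer and the later indexing into the SSM state both behave consistently.

```python
import torch

# Sketch only: an empty index tensor that already lives on the cache device
# with an explicit integer dtype. prepare_resources() later fills in the real
# cache-slot indices for the scheduled requests; until then, its emptiness is
# what signals a warmup run to Mamba2Mixer.
device = torch.device("cuda")
state_indices = torch.empty(0, dtype=torch.int32, device=device)
```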

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

  • tests/unittest/_torch/modeling/test_modeling_nemotron_h.py: Refactored tests to adopt function-based style and updated KvCacheConfig usage.
  • tensorrt_llm/_torch/pyexecutor/resource_manager.py: Updated the initialization of state_indices with explicit device and dtype.
  • tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py: Revised warmup request handling by checking state_indices emptiness and creating fallback indices.
Comments suppressed due to low confidence (3)

tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py:162

  • Ensure that attn_metadata.kv_cache_manager is always non-null in warmup runs, as the previous check for None was removed; consider adding validation if there's a chance it might be None.
state_indices = attn_metadata.kv_cache_manager.get_state_indices()

tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py:278

  • [nitpick] Add a comment explaining why ssm_states is updated unconditionally during warmup runs to aid maintainers in understanding the relaxed behavior in this code path.
ssm_states[indices] = current_ssm_states

tests/unittest/_torch/modeling/test_modeling_nemotron_h.py:214

  • [nitpick] Consider clearly distinguishing between KvCacheConfig and KvCacheConfigCpp in naming and usage to reduce potential confusion in the test configuration.
kv_cache_config = KvCacheConfigCpp(max_tokens=num_blocks * tokens_per_block,

@tomeras91 tomeras91 requested a review from suyoggupta June 5, 2025 13:04
@tomeras91
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #7757 [ run ] triggered by Bot

@tomeras91 tomeras91 changed the title from "[fix] Increase Nemotron-H warmup request robustness" to "[nvbug 5325284][fix] Increase Nemotron-H warmup request robustness" on Jun 5, 2025
@tensorrt-cicd
Collaborator

PR_Github #7757 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5620 completed with status: 'FAILURE'

Collaborator

@vegaluisjose vegaluisjose left a comment


Great work @tomeras91 !

@tomeras91
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8027 [ run ] triggered by Bot

@tomeras91
Collaborator Author

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #8028 [ kill ] triggered by Bot

@tomeras91
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #8028 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit a19bf8c

@tensorrt-cicd
Collaborator

PR_Github #8031 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8031 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5819 completed with status: 'FAILURE'

@tomeras91
Collaborator Author

/bot run

@tomeras91
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8097 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8097 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5873 completed with status: 'FAILURE'

@tomeras91
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8125 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8125 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5891 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@tomeras91
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8146 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8146 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5903 completed with status: 'FAILURE'

@tomeras91
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8160 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8160 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5912 completed with status: 'FAILURE'

@tomeras91
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8208 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8208 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5953 completed with status: 'SUCCESS'

@netanel-haber netanel-haber merged commit f121f13 into NVIDIA:main Jun 10, 2025
3 checks passed
@tomeras91 tomeras91 deleted the fix-nemotron-h-warmup branch June 10, 2025 08:17