[nvbug 5325284][fix] Increase Nemotron-H warmup request robustness #4954

tomeras91 · 2025-06-05T12:19:31Z

The Mamba2Mixer forward pass in Nemotron-H assumes that the state_indices held by MambaCacheManager are valid. These indices are prepared in MambaHybridCacheManager.prepare_resources(). However, warmup runs don't call prepare_resources(), so the Mamba2Mixer forward pass needs special handling for them. Previously, a warmup run was assumed to consist of a single request with request_id=0, and warmup runs were identified by that condition.

PR #4466 changed the behavior of warmup runs: a warmup batch no longer always consists of a single request with id=0. This broke the Mamba2Mixer forward pass during warmup and prevented Nemotron-H from being initialized through the LLM API.
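A toy model of the failure, to make the mechanism concrete. `ToyMambaCacheManager` and `is_warmup_legacy` are hypothetical stand-ins written for this illustration, not TensorRT-LLM code; they only mimic the roles described above (indices are assigned in `prepare_resources()`, and warmup used to be recognized as a single request with id 0).

```python
# Hypothetical toy model of the warmup problem. None of these names are
# TensorRT-LLM APIs; they only mirror the roles described in the PR text.


class ToyMambaCacheManager:
    """Maps request ids to slots in a fixed-size Mamba state cache."""

    def __init__(self, max_batch_size):
        self.free_slots = list(range(max_batch_size))
        self.slot_of = {}        # request_id -> cache slot
        self.state_indices = []  # filled only by prepare_resources()

    def prepare_resources(self, request_ids):
        # Regular steps call this before forward(); warmup runs do not.
        for rid in request_ids:
            if rid not in self.slot_of:
                self.slot_of[rid] = self.free_slots.pop(0)
        self.state_indices = [self.slot_of[rid] for rid in request_ids]

    def get_state_indices(self):
        return self.state_indices


def is_warmup_legacy(request_ids):
    # Old heuristic: a warmup batch is exactly one request with id 0.
    return list(request_ids) == [0]


manager = ToyMambaCacheManager(max_batch_size=4)

# A warmup batch with several requests; prepare_resources() was never called.
warmup_ids = [0, 1, 2]
print(is_warmup_legacy(warmup_ids))  # False: not recognized as warmup
print(manager.get_state_indices())   # []: no valid indices to use
```

Because the heuristic keys on request ids rather than on the state the forward pass actually consumes, any change to how warmup batches are built can silently defeat it.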

This PR makes Mamba2Mixer robust to warmup runs by checking the validity of state_indices directly instead of inferring warmup from request ids. If state_indices is invalid (empty), it is filled with valid dummy values, which also reduces the differences between the regular and warmup forward passes.
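A minimal sketch of the new check, in plain Python rather than torch tensors. The function name `resolve_state_indices` is hypothetical, and using `0..batch_size-1` as the dummy slots is an assumption (any in-range slots would do, provided `batch_size` does not exceed the cache capacity). The emptiness test matches the one described in the Copilot summary.

```python
# Hypothetical sketch of a warmup-robust index lookup. The dummy-slot choice
# ("the batch occupies the first cache slots") is an assumption.


def resolve_state_indices(state_indices, batch_size):
    """Return (indices, is_warmup) for the current forward pass.

    Regular step: the cache manager prepared indices, use them as-is.
    Warmup step: indices are empty because prepare_resources() was skipped,
    so substitute dummy but valid slots and let forward() run unchanged.
    """
    is_warmup = len(state_indices) == 0
    if is_warmup:
        return list(range(batch_size)), True
    # Extra sanity check added for this sketch only.
    if len(state_indices) != batch_size:
        raise ValueError(
            f"expected {batch_size} state indices, got {len(state_indices)}")
    return list(state_indices), False


print(resolve_state_indices([], batch_size=3))      # ([0, 1, 2], True)
print(resolve_state_indices([5, 2], batch_size=2))  # ([5, 2], False)
```

Since the same downstream code consumes either result, warmup no longer needs its own branch through the rest of the mixer.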

Additionally, because PR #4466 broke Nemotron-H silently (no unit test exercised Nemotron-H through the LLM API), this PR also adds such a test to prevent similar regressions in the future.


Copilot

Pull Request Overview

This pull request increases the robustness of the Mamba2Mixer forward pass during warmup runs by checking the validity of state indices and supplying dummy values if necessary. It also adds new LLM API unit tests for Nemotron-H to catch similar issues in the future.

- Updated unit tests to use function-based test definitions and to cover LLM API usage.
- Modified state_indices initialization in MambaHybridCacheManager to set the device and dtype explicitly.
- Refactored Mamba2Mixer to detect warmup cases by checking whether state_indices is empty, and to generate dummy indices in that case.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
tests/unittest/_torch/modeling/test_modeling_nemotron_h.py	Refactored tests to adopt function-based style and updated KvCacheConfig usage.
tensorrt_llm/_torch/pyexecutor/resource_manager.py	Updated the initialization of state_indices with explicit device and dtype.
tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py	Revised warmup request handling by checking state_indices emptiness and creating fallback indices.

Comments suppressed due to low confidence (3)

tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py:162

Ensure that attn_metadata.kv_cache_manager is always non-null in warmup runs, as the previous check for None was removed; consider adding validation if there's a chance it might be None.

state_indices = attn_metadata.kv_cache_manager.get_state_indices()
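A minimal sketch of the validation the comment suggests. Only the attribute and method names come from the quoted line; `get_state_indices_checked` is hypothetical, and `SimpleNamespace` objects stand in for the real metadata and cache manager.

```python
# Hypothetical guard in the spirit of the review comment. The classes are
# stand-ins, not TensorRT-LLM types.
from types import SimpleNamespace


def get_state_indices_checked(attn_metadata):
    """Fetch state indices, failing loudly if no cache manager is attached."""
    manager = getattr(attn_metadata, "kv_cache_manager", None)
    if manager is None:
        raise RuntimeError(
            "attn_metadata.kv_cache_manager is None; the Mamba mixer needs "
            "a cache manager even during warmup")
    return manager.get_state_indices()


ok_meta = SimpleNamespace(
    kv_cache_manager=SimpleNamespace(get_state_indices=lambda: [0, 1]))
print(get_state_indices_checked(ok_meta))  # [0, 1]
```

Raising here, rather than silently skipping the state update, surfaces a misconfigured warmup path at the point of failure instead of as a confusing indexing error later.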

tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py:278

[nitpick] Add a comment explaining why ssm_states is updated unconditionally during warmup runs to aid maintainers in understanding the relaxed behavior in this code path.

ssm_states[indices] = current_ssm_states
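A toy version of the unconditional write-back, with the kind of comment the reviewer asks for. Lists stand in for the `ssm_states` tensor; with torch tensors the quoted line performs the same scatter via advanced indexing. The claim that overwriting dummy slots is harmless rests on an assumption: no live request occupies the cache while warmup runs.

```python
# Toy illustration; write_back_states is a hypothetical helper.


def write_back_states(ssm_states, indices, current_ssm_states):
    # Runs for both regular and warmup passes. During warmup, `indices` are
    # dummy slots; assuming no live request owns a slot at that time,
    # overwriting them is harmless, so no special case is needed.
    for slot, state in zip(indices, current_ssm_states):
        ssm_states[slot] = state
    return ssm_states


cache = [None] * 4
print(write_back_states(cache, [0, 1], ["s0", "s1"]))  # ['s0', 's1', None, None]
```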

tests/unittest/_torch/modeling/test_modeling_nemotron_h.py:214

[nitpick] Consider clearly distinguishing between KvCacheConfig and KvCacheConfigCpp in naming and usage to reduce potential confusion in the test configuration.

kv_cache_config = KvCacheConfigCpp(max_tokens=num_blocks * tokens_per_block,


tomeras91 · 2025-06-05T14:01:23Z

/bot run

tensorrt-cicd · 2025-06-05T14:09:45Z

PR_Github #7757 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-05T15:50:00Z

PR_Github #7757 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5620 completed with status: 'FAILURE'

vegaluisjose

Great work @tomeras91!


tomeras91 · 2025-06-08T15:01:20Z

/bot run

tensorrt-cicd · 2025-06-08T15:07:03Z

PR_Github #8027 [ run ] triggered by Bot

tomeras91 · 2025-06-08T15:07:46Z

/bot kill

tensorrt-cicd · 2025-06-08T15:13:33Z

PR_Github #8028 [ kill ] triggered by Bot

tomeras91 · 2025-06-08T15:14:08Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-08T15:14:15Z

PR_Github #8028 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit a19bf8c

tensorrt-cicd · 2025-06-08T15:20:26Z

PR_Github #8031 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-08T18:05:51Z

PR_Github #8031 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5819 completed with status: 'FAILURE'

tomeras91 · 2025-06-08T20:55:44Z

/bot run


tomeras91 · 2025-06-09T06:22:28Z

/bot run

tensorrt-cicd · 2025-06-09T06:28:10Z

PR_Github #8097 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-09T10:33:56Z

PR_Github #8097 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5873 completed with status: 'FAILURE'


tomeras91 · 2025-06-09T11:24:10Z

/bot run

tensorrt-cicd · 2025-06-09T11:30:14Z

PR_Github #8125 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-09T16:05:40Z

PR_Github #8125 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5891 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.


tomeras91 · 2025-06-09T16:33:26Z

/bot run

tensorrt-cicd · 2025-06-09T16:39:14Z

PR_Github #8146 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-09T18:55:13Z

PR_Github #8146 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5903 completed with status: 'FAILURE'


tomeras91 · 2025-06-09T21:37:25Z

/bot run

tensorrt-cicd · 2025-06-09T21:43:12Z

PR_Github #8160 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-10T00:41:07Z

PR_Github #8160 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5912 completed with status: 'FAILURE'

tomeras91 · 2025-06-10T04:59:32Z

/bot run

tensorrt-cicd · 2025-06-10T05:04:53Z

PR_Github #8208 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-10T08:05:50Z

PR_Github #8208 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5953 completed with status: 'SUCCESS'

tomeras91 added 4 commits June 5, 2025 11:13

Add unittest for Nemotron-H using the PyTorch LLM API, that currently fails

c70cd1a

Signed-off-by: Tomer Asida <[email protected]>

use pytest instead of unittest in Nemotron-H correctness test

1e5b107

Signed-off-by: Tomer Asida <[email protected]>

Fix Nemotron-H LLM API test: (1) call shutdown() regardless of whether the test failed or succeeded, (2) don't add BOS token to match expected outputs

b650a85

Signed-off-by: Tomer Asida <[email protected]>
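The first fix in that commit is the standard try/finally pattern. A sketch, with `FakeLLM` and `run_generation_check` as hypothetical stand-ins (only the method name `shutdown()` comes from the commit message):

```python
# Hypothetical stand-in for the LLM object used in the test.


class FakeLLM:
    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts):
        return [p.upper() for p in prompts]

    def shutdown(self):
        self.is_shut_down = True


def run_generation_check(llm, prompts, expected):
    try:
        outputs = llm.generate(prompts)
        assert outputs == expected, f"{outputs} != {expected}"
    finally:
        # Runs whether the assertion passed, failed, or generate() raised,
        # so resources are always released before the next test starts.
        llm.shutdown()


llm = FakeLLM()
try:
    run_generation_check(llm, ["a"], ["wrong"])
except AssertionError as err:
    print("test failed:", err)
print(llm.is_shut_down)  # True
```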

Deal with warmup requests better in Mamba2Mixer forward: assign dummy state_indices during forward pass. Now LLM API test passes

966f3a7

Signed-off-by: Tomer Asida <[email protected]>

tomeras91 requested review from a team as code owners June 5, 2025 12:19

tomeras91 requested review from Copilot, juney-nvidia, schetlur-nv, vegaluisjose and yuxianq June 5, 2025 12:19

Copilot AI reviewed Jun 5, 2025


tomeras91 requested a review from suyoggupta June 5, 2025 13:04

tomeras91 added 3 commits June 5, 2025 16:33

Merge branch 'NVIDIA:main' into fix-nemotron-h-warmup

02d13c7

Merge branch 'main' into fix-nemotron-h-warmup

61c3db8

Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'fix-nemotron-h-warmup' of github.com:tomeras91/TensorRT-LLM into fix-nemotron-h-warmup

cb80c01

Signed-off-by: Tomer Asida <[email protected]>

tomeras91 changed the title ~~[fix] Increase Nemotron-H warmup request robustness~~ [nvbug 5325284][fix] Increase Nemotron-H warmup request robustness Jun 5, 2025

vegaluisjose approved these changes Jun 5, 2025


schetlur-nv approved these changes Jun 5, 2025


tomeras91 added 2 commits June 8, 2025 09:09

Merge branch 'main' into fix-nemotron-h-warmup

8d3d4a9

Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'main' into fix-nemotron-h-warmup

a19bf8c

Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'main' into fix-nemotron-h-warmup

21e78c7

Signed-off-by: Tomer Asida <[email protected]>

clear memory between tests to avoid OOM on A30
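One common way to free memory between tests, sketched under the assumption that the tests run with PyTorch. The helper name is hypothetical and the PR's actual cleanup code may differ.

```python
# Hypothetical helper; not the PR's code.
import gc


def free_accelerator_memory():
    """Collect unreachable Python objects, then, if PyTorch with CUDA is
    available, release unoccupied blocks held by its caching allocator.
    Memory still referenced by live objects is not freed."""
    collected = gc.collect()
    try:
        import torch
    except ImportError:
        return collected
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return collected
```

Calling it after each test, for example from an autouse pytest fixture after its `yield`, keeps allocations cached by one model from crowding out the next test on a smaller GPU.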

02f0817

Signed-off-by: Tomer Asida <[email protected]>

suyoggupta approved these changes Jun 9, 2025


Merge branch 'main' into fix-nemotron-h-warmup

b3bc5e1

Signed-off-by: Tomer Asida <[email protected]>

Merge branch 'main' into fix-nemotron-h-warmup

ede641f

Signed-off-by: Tomer Asida <[email protected]>

netanel-haber merged commit f121f13 into NVIDIA:main Jun 10, 2025
3 checks passed

tomeras91 deleted the fix-nemotron-h-warmup branch June 10, 2025 08:17


6 participants