[TRTLLM-6756][feat] Add Beam Search to TorchSampler #8509
Conversation
/bot run --disable-fail-fast

PR_Github #21928 [ run ] triggered by Bot.

PR_Github #21928 [ run ] completed.
Force-pushed from 80f31f7 to 990321c.
/bot run

PR_Github #22046 [ run ] triggered by Bot.

PR_Github #22046 [ run ] completed.
ixlmar left a comment:
Mostly nits; stronger feelings only about where it touches the batched sampling, also in anticipation of future work (currently mainly #8581).
Force-pushed from 1c7b4be to 26fac04.
/bot run

PR_Github #22787 [ run ] triggered by Bot.

PR_Github #22787 [ run ] completed.
Force-pushed from f754360 to 9be3a85.
/bot run --disable-fail-fast

PR_Github #23043 [ run ] triggered by Bot.

PR_Github #23043 [ run ] completed.
Force-pushed from 9be3a85 to eb265ac.
/bot run --disable-fail-fast
Force-pushed from eb265ac to f7c63c3.
/bot run --disable-fail-fast

PR_Github #24210 [ run ] triggered by Bot.
/bot run

PR_Github #25701 [ run ] triggered by Bot.
Force-pushed from 8dfa36d to 6b4f97e.
/bot run --disable-fail-fast

PR_Github #25759 [ run ] triggered by Bot.
PR_Github #25701 [ run ] completed.

PR_Github #25759 [ run ] completed.
Force-pushed from 6b4f97e to 326305d.
…ing.
- Added BeamSearchArgs class and updated methods to handle beam search logic, including cache indirection updates and beam score management.
- Modified create_torch_sampler_args to include a use_overlap_scheduler parameter.
- Updated sampling strategy to accommodate beam search requests.

Signed-off-by: Stefan Niebler <[email protected]>
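The beam-score and cache-indirection bookkeeping described in this commit can be sketched in plain Python. This is an illustrative model of a single beam-search step, not the actual TorchSampler code (which operates on batched torch tensors); the function name and list-based layout are assumptions for the sketch.

```python
def beam_search_step(cum_log_probs, step_log_probs, beam_width):
    """One illustrative beam-search step.

    cum_log_probs:  list[float], cumulative log-prob per live beam.
    step_log_probs: list[list[float]], per-beam log-probs over the vocab.
    Returns (new_cum_log_probs, parent_beams, chosen_tokens); parent_beams
    is the information a cache-indirection update would record.
    """
    # Score every (beam, token) continuation by adding the token's
    # log-prob to the beam's cumulative score.
    candidates = []
    for beam, beam_score in enumerate(cum_log_probs):
        for token, token_lp in enumerate(step_log_probs[beam]):
            candidates.append((beam_score + token_lp, beam, token))
    # Keep the beam_width best continuations across all beams.
    candidates.sort(key=lambda c: c[0], reverse=True)
    top = candidates[:beam_width]
    new_scores = [c[0] for c in top]
    parents = [c[1] for c in top]
    tokens = [c[2] for c in top]
    return new_scores, parents, tokens
```

In a tensorized implementation the same selection is typically a single top-k over the flattened (beam, vocab) score matrix.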
Signed-off-by: Stefan Niebler <[email protected]>
Unsqueeze buffer returned from sampling to always contain the beam_width dimension.

Signed-off-by: Stefan Niebler <[email protected]>
…features
- Add a metadata object to grouped_request to pass additional data that is not part of the SamplingStrategy definition.
- Add several buffers to the TorchSampler Store for beam search features, allocated only when beam search is used.
- Add support for beam search with streaming enabled.
- Beam search no longer requires all beams to finish in the same iteration.
- gather_generation_logits can now be used together with beam search.
- Logprob generation is now possible with beam search enabled; top-k logprobs are not supported.
- Updated test_beam_search.py to also cover TorchSampler.
- General changes for formatting and readability.

Signed-off-by: Stefan Niebler <[email protected]>
- Updated create_torch_sampler_args and related methods to replace use_overlap_scheduler with disable_overlap_scheduler.
- Added the missing disable_overlap_scheduler parameter to TorchSampler.Args in ad_executor.py.
- Introduced a BeamHistory data class to encapsulate beam search history, including tokens and logprobs.
- Refactored methods to create and finalize beam history, improving clarity and functionality.

Signed-off-by: Stefan Niebler <[email protected]>
…onality
- Introduced a new test file `test_beam_search_util.py` containing a dummy model and utility functions for beam search testing.
- Refactored existing tests in `test_beam_search.py` to use the new utility functions, improving test organization and clarity.
- Added comprehensive tests for beam search sampling, including validation of output shapes, cache indirection, and cumulative log probabilities.
- Added new unit tests for beam search sampling, updating of beams, and finalization of requests.

Signed-off-by: Stefan Niebler <[email protected]>
- Updated beam search sampling logic to ensure asynchronous handling of finished beams and cache indirection.

Signed-off-by: Stefan Niebler <[email protected]>
…e_async
- Added a new buffer to handle beam search sampling asynchronously where possible.
- Split beam history creation and beam finalization; beam finalization occurs in update_requests.
- Adjusted write_finish_reason to support beam search.

Signed-off-by: Stefan Niebler <[email protected]>
… the updated beam search in TorchSampler
- Enhanced logprob testing to verify sum(logprobs) == cum_log_probs.
- Added testing for stop tokens.

Signed-off-by: Stefan Niebler <[email protected]>
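The logprob invariant this commit tests can be stated as a one-liner. This is a sketch of the property only; the actual tests operate on tensors returned by the sampler, and the function name here is hypothetical.

```python
import math

def cum_log_prob_matches(per_token_logprobs, cum_log_prob, tol=1e-6):
    # The cumulative log-probability reported for a beam should equal the
    # sum of the per-token log-probabilities emitted along that beam.
    return math.isclose(sum(per_token_logprobs), cum_log_prob, abs_tol=tol)
```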
Signed-off-by: Stefan Niebler <[email protected]>
…h sampling
- Fixed several bugs that caused non-beam-search test cases to fail.
- Adjusted test_torch_sampler.py to conform with changes in the TorchSampler.
- Improved beam search sampling with async torch operations.

Signed-off-by: Stefan Niebler <[email protected]>
Signed-off-by: Stefan Niebler <[email protected]>
- Converted RequestGroupKey from a NamedTuple to a dataclass with frozen and kw_only attributes.
- Added __iter__ and __len__ methods for improved usability.

Signed-off-by: Stefan Niebler <[email protected]>
…Sampler
- Modified the _handle_finish_reasons method to accept an additional parameter, finish_reasons_list, for correct handling of finish reasons in beam search.
- Updated calls to _handle_finish_reasons throughout the TorchSampler class to accommodate the new parameter.

Signed-off-by: Stefan Niebler <[email protected]>
… method and fix bugs
- Introduced a setup_sampler_step method to enable the setup process for disaggregated serving in beam search.
- Updated cache indirection initialization to use torch.zeros to prevent reading invalid values from cache_indirection.
- Updated mtpSampler to correctly call TorchSampler functions.
- Fixed handle_finish_reasons by wrapping finish reasons in the FinishReason class.
- Adjusted the max_lengths_tensor calculation to account for the original prompt length.

Signed-off-by: Stefan Niebler <[email protected]>
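The cache-indirection fix above can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not the TorchSampler implementation: the real buffers are torch tensors, and the update semantics shown (each position maps to the beam whose KV-cache slot should be read) are the standard beam-search indirection scheme, assumed here for illustration.

```python
def make_cache_indirection(num_beams, max_len):
    # Zero-initialized, so positions that were never written still index
    # a valid beam (mirrors the switch to torch.zeros described above).
    return [[0] * max_len for _ in range(num_beams)]

def update_cache_indirection(cache_indirection, parent_beams, step):
    # After a beam step, each surviving beam inherits its parent's
    # KV-cache path for positions before `step` and records its own
    # slot at `step`, where its new token was written.
    old = [row[:] for row in cache_indirection]
    for beam, parent in enumerate(parent_beams):
        cache_indirection[beam][:step] = old[parent][:step]
        cache_indirection[beam][step] = beam
    return cache_indirection
```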
…and sampling utilities
- Introduced new functions to retrieve beam width parameters for input and output, improving clarity and modularity.
- Updated UtilsSamplingParams to include separate beam width parameters and a flag for beam search usage.
- Refactored beam search sampling logic to accommodate changes in beam width handling, ensuring compatibility with the new parameters.
- Unified beam search sampling for context and generation requests.
- Simplified code for beam history creation.
- Adjusted test cases to reflect changes in beam width handling and improved logprob validation.

Signed-off-by: Stefan Niebler <[email protected]>
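A hedged sketch of the separate input/output beam-width idea that unifies the context and generation paths. The function name and signature are hypothetical, not the actual API introduced by the commit:

```python
def beam_widths(beam_width: int, is_context_request: bool) -> tuple[int, int]:
    # A context (prefill) request enters sampling with a single beam,
    # while a generation request already carries beam_width live beams;
    # both paths emit beam_width beams, which lets one sampling routine
    # serve both kinds of request.
    beam_width_in = 1 if is_context_request else beam_width
    beam_width_out = beam_width
    return beam_width_in, beam_width_out
```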
… merge
- Modified `model_kwargs` to include `sampler_type` for improved test configuration.
- Adjusted the `llm_cuda_graph` fixture to remove the unnecessary `sampler_type` parameter.
- Enhanced clarity in `test_torch_sampler.py` by adding comments regarding the `is_context_init_state` attribute.

Signed-off-by: Stefan Niebler <[email protected]>
Force-pushed from 326305d to 224da51.
/bot run
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for licenses and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- The reviewers assigned automatically/manually are appropriate for the PR.

Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server. Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.

run

/bot run [--reuse-test (optional)pipeline-id] [--disable-fail-fast] [--skip-test] [--stage-list "A10-PyTorch-1, xxx"] [--gpu-type "A30, H100_PCIe"] [--test-backend "pytorch, cpp"] [--add-multi-gpu-test] [--only-multi-gpu-test] [--disable-multi-gpu-test] [--post-merge] [--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"] [--detailed-log] [--debug (experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

- --reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL): Disable fail-fast on build/test/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages, and sanity check stages. Note: does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL): Skip test stages that don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline plus the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill

/bot kill: Kill all running builds associated with the pull request.

skip

/bot skip --comment COMMENT: Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause top of tree to break.

reuse-pipeline

/bot reuse-pipeline: Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause top of tree to break.