[https://nvbugs/5508301][feat] Move D->H copies to a worker thread when confidential compute is active #8463
Conversation
📝 Walkthrough
This pull request introduces optional asynchronous host-copy threading for GPU sampling. When confidential compute is detected, a …
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant Creator as py_executor_creator
participant Util as _util
participant Executor as py_executor
participant Sampler as sampler
User->>Creator: create_py_executor_instance(...)
Creator->>Util: create_py_executor_instance(...)
Util->>Util: confidential_compute_enabled()
Util->>Util: instantiate_sampler(..., use_host_copy_thread)
Util->>Sampler: TorchSampler(..., use_host_copy_thread=True)
Sampler-->>Util: sampler instance
Util-->>Creator: py_executor with sampler
Creator->>Creator: maybe_start_sampler_host_copy_thread(sampler)
Creator->>Sampler: start_host_copy_thread()
Sampler-->>Sampler: spawn async host-copy thread
Creator->>Executor: start_worker()
Note over Creator,Sampler: During sampling phase
Executor->>Sampler: sample_async()
Sampler->>Sampler: launch async host transfer
Sampler-->>Executor: Future[SampleStateTensors]
Executor->>Executor: await or process Future result
Note over Creator,Sampler: During shutdown
Executor->>Executor: shutdown()
Executor->>Sampler: stop_host_copy_thread()
Sampler-->>Sampler: join host-copy thread
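Read as plain Python, the lifecycle above amounts to roughly the following. This is an illustrative sketch only, not the PR's implementation: the real TorchSampler constructor, sample_async signature, and SampleStateTensors type are more involved.

```python
from concurrent.futures import ThreadPoolExecutor


class HostCopySamplerSketch:
    """Minimal model of the start/sample/stop lifecycle in the diagram."""

    def __init__(self, use_host_copy_thread: bool = False):
        self._use_host_copy_thread = use_host_copy_thread
        self._host_copy_executor = None

    def start_host_copy_thread(self) -> None:
        # Called once by the executor creator before the worker starts.
        if self._use_host_copy_thread and self._host_copy_executor is None:
            # A single worker keeps D->H copies in submission order.
            self._host_copy_executor = ThreadPoolExecutor(max_workers=1)

    def host_copy_thread_active(self) -> bool:
        return self._host_copy_executor is not None

    def sample_async(self, device_tensors: dict):
        def copy_to_host() -> dict:
            # Stand-in for the real D->H transfer of sampled tensors.
            return dict(device_tensors)

        if self.host_copy_thread_active():
            # Hand back a Future; the executor resolves it later.
            return self._host_copy_executor.submit(copy_to_host)
        return copy_to_host()  # synchronous path, resolved immediately

    def stop_host_copy_thread(self) -> None:
        # Called from PyExecutor.shutdown(); joins the copy worker.
        if self._host_copy_executor is not None:
            self._host_copy_executor.shutdown(wait=True)
            self._host_copy_executor = None
```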
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
The changes span five files with interconnected logic across executor creation, sampler state management, and threading lifecycle. Key review areas include understanding: the confidential compute detection flow, how the …
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 5
♻️ Duplicate comments (1)
tensorrt_llm/_torch/pyexecutor/sampler.py (1)
1918-1929: Same property pattern issues as SampleState.
This repeats the shadow-field property pattern from SampleState (lines 59-72) with the same concerns:
- Unconventional for dataclasses
- Blocking .result() call without error handling
- Static analysis warnings
Apply the same refactoring suggestions from the earlier comment.
🧹 Nitpick comments (2)
tensorrt_llm/_torch/pyexecutor/sampler.py (2)
59-72: Consider refactoring the property pattern and adding error handling.
The shadow-field pattern (line 60: _host with init=False plus a property wrapper) is unconventional for dataclasses and triggers static analysis warnings. More importantly:
- The property getter (line 67) calls .result(), which blocks until the Future completes, potentially causing unexpected stalls in code that accesses state.host.
- No error handling for Future resolution failures.
Consider these alternatives:
- Option 1: Keep host as Future | SampleStateTensors and require callers to explicitly resolve futures (more explicit about blocking).
- Option 2: Add a separate resolve() method that returns the resolved host, keeping the property for non-blocking access.
- Option 3: If the property pattern is preferred, at minimum wrap .result() in a try-except.
Example for Option 3:
```diff
 @property
 def host(self) -> SampleStateTensors:
     if isinstance(self._host, Future):
-        return self._host.result()
+        try:
+            return self._host.result()
+        except Exception as e:
+            raise RuntimeError(f"Failed to resolve host tensor future: {e}") from e
     return self._host
```
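For comparison, a minimal sketch of Option 2, assuming the existing SampleState/SampleStateTensors split; the resolve_host and host_ready names are illustrative, not part of the current code:

```python
from concurrent.futures import Future
from dataclasses import dataclass, field
from typing import Union


@dataclass
class SampleStateTensors:
    """Stub standing in for the real host tensor container."""


@dataclass
class SampleState:
    _host: Union[SampleStateTensors, "Future[SampleStateTensors]"] = field(
        init=False, default=None)

    def host_ready(self) -> bool:
        # Non-blocking check: has the threaded D->H copy finished?
        return not isinstance(self._host, Future) or self._host.done()

    def resolve_host(self) -> SampleStateTensors:
        # Explicit, caller-visible blocking point (replaces a blocking property).
        if isinstance(self._host, Future):
            self._host = self._host.result()
        return self._host
```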
1551-1560: Note cross-method Future dependency.
Futures are stored in request.py_topk_logprobs_vals and request.py_topk_logprobs_indices, which are later resolved in handle_logprobs() (lines 1027-1035). This creates an implicit dependency where handle_logprobs() must be called after the Futures complete, but there is no explicit enforcement of this ordering. Consider adding documentation or assertions to make this dependency explicit.
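One way to make that ordering explicit, sketched under the attribute names mentioned above (the helper itself is hypothetical):

```python
from concurrent.futures import Future


def resolve_topk_logprob_futures(request) -> None:
    """Resolve pending top-k logprob Futures before handle_logprobs() reads them.

    Making the wait explicit here documents (and enforces) that sample_async()
    must have scheduled the D->H copies before logprobs are handled.
    """
    for attr in ("py_topk_logprobs_vals", "py_topk_logprobs_indices"):
        value = getattr(request, attr, None)
        if isinstance(value, Future):
            setattr(request, attr, value.result())  # blocks only if still pending
```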
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- tensorrt_llm/_torch/pyexecutor/_util.py (5 hunks)
- tensorrt_llm/_torch/pyexecutor/py_executor.py (1 hunks)
- tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/sampler.py (17 hunks)
- tensorrt_llm/_utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
- tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
- tensorrt_llm/_torch/pyexecutor/_util.py
- tensorrt_llm/_torch/pyexecutor/py_executor.py
- tensorrt_llm/_utils.py
- tensorrt_llm/_torch/pyexecutor/sampler.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
- tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
- tensorrt_llm/_torch/pyexecutor/_util.py
- tensorrt_llm/_torch/pyexecutor/py_executor.py
- tensorrt_llm/_utils.py
- tensorrt_llm/_torch/pyexecutor/sampler.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
- tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
- tensorrt_llm/_torch/pyexecutor/_util.py
- tensorrt_llm/_torch/pyexecutor/py_executor.py
- tensorrt_llm/_utils.py
- tensorrt_llm/_torch/pyexecutor/sampler.py
🧬 Code graph analysis (4)
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (1)
tensorrt_llm/_torch/pyexecutor/sampler.py (2)
- start_host_copy_thread (905-920)
- start_host_copy_thread (2002-2017)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
tensorrt_llm/_utils.py (1)
- confidential_compute_enabled (1201-1209)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
tensorrt_llm/_torch/pyexecutor/sampler.py (2)
- stop_host_copy_thread (922-925)
- stop_host_copy_thread (2019-2022)
tensorrt_llm/_torch/pyexecutor/sampler.py (2)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
- shutdown (461-477)
tensorrt_llm/_utils.py (1)
- nvtx_range (904-923)
🪛 Ruff (0.14.0)
tensorrt_llm/_torch/pyexecutor/_util.py
13-13: Redefinition of unused str_dtype_to_binding from line 12
Remove definition: str_dtype_to_binding
(F811)
13-13: Redefinition of unused torch_dtype_to_str from line 12
Remove definition: torch_dtype_to_str
(F811)
tensorrt_llm/_torch/pyexecutor/sampler.py
65-65: Redefinition of unused host from line 59
(F811)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (8)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
475-477: LGTM! Clean shutdown lifecycle for host-copy thread.
The defensive hasattr check ensures compatibility with samplers that don't implement host-copy threading, and the cleanup aligns with the lifecycle introduced in sampler.py.
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (1)
184-186: LGTM! Appropriate lifecycle hook for host-copy thread startup.
The helper function correctly guards against samplers that don't support host-copy threading, and the placement (lines 693-695) ensures the thread starts before the worker.
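In sketch form, such a guarded hook looks like the following. The names mirror maybe_start_sampler_host_copy_thread and stop_host_copy_thread from this PR, but the bodies (and the shutdown-side helper) are illustrative:

```python
def maybe_start_sampler_host_copy_thread(sampler) -> None:
    # Samplers that don't support threaded host copies simply lack this method.
    if hasattr(sampler, "start_host_copy_thread"):
        sampler.start_host_copy_thread()


def maybe_stop_sampler_host_copy_thread(sampler) -> None:
    # Counterpart used during shutdown; joins the worker if one was started.
    if hasattr(sampler, "stop_host_copy_thread"):
        sampler.stop_host_copy_thread()
```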
tensorrt_llm/_torch/pyexecutor/_util.py (1)
854-854: Verify error handling in confidential_compute_enabled.
This line calls confidential_compute_enabled(), which queries NVML hardware state. Ensure the function in tensorrt_llm/_utils.py (lines 1201-1209) includes proper error handling to prevent exceptions from propagating here.
The review comment on _utils.py recommends adding try/except to confidential_compute_enabled(). Once that change is applied, this call site will be safe. If that recommendation is not implemented, consider adding error handling here:
```python
try:
    use_host_copy_thread = confidential_compute_enabled()
except Exception:
    logger.warning(
        "Failed to query confidential compute state, disabling host copy thread")
    use_host_copy_thread = False
```
tensorrt_llm/_torch/pyexecutor/sampler.py (5)
10-10: LGTM: Import added for threading support.
The ThreadPoolExecutor and Future imports are appropriate for the new threaded host-copy feature.
879-879: LGTM: New parameter properly typed and defaulted.
The use_host_copy_thread parameter addition is well-typed and has a safe default.
1027-1035: Consistent Future handling, but note blocking behavior.
The code correctly checks host_copy_thread_active() and resolves Futures with .result(). However, this blocks the calling thread until the host copy completes, which may impact performance if called in hot paths. Consider whether this blocking is acceptable or if callers should be refactored to handle Futures asynchronously.
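If the blocking turns out to matter in hot paths, one asynchronous alternative is a completion callback rather than an immediate .result(); a sketch only, with a hypothetical helper name:

```python
from concurrent.futures import Future


def schedule_when_ready(maybe_future, consume) -> None:
    """Run consume(host_tensors) without blocking the calling thread."""
    if isinstance(maybe_future, Future):
        # The callback runs on the host-copy worker thread once the copy lands,
        # so consume() must be safe to call off the main thread.
        maybe_future.add_done_callback(lambda f: consume(f.result()))
    else:
        consume(maybe_future)
```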
1368-1378: LGTM: Consistent conditional event handling.
The code correctly uses Futures to replace sampler_event when the host copy thread is active, avoiding redundant synchronization.
2183-2233: Well-structured threaded copy implementation.
The _copy_tensors_to_host function is well-organized and correctly handles:
- Event synchronization with copy_ready.synchronize() (line 2191)
- Proper non_blocking flag toggling based on threading mode (lines 2192-2194)
- Conditional logprobs handling (lines 2206-2209)
- Consistent Future vs. direct return based on thread mode (lines 2219-2233)
The approach cleanly encapsulates the copy logic and makes the threading behavior explicit.
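A rough sketch of that structure, assuming PyTorch tensors and a CUDA event recorded after sampling; this is illustrative and omits the logprobs and pinned-memory details of the real function:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

import torch

_copy_executor = ThreadPoolExecutor(max_workers=1)  # dedicated host-copy worker


def copy_tensors_to_host(device_tensors: Dict[str, torch.Tensor],
                         copy_ready: torch.cuda.Event,
                         use_host_copy_thread: bool):
    """Return host tensors directly, or a Future that resolves to them."""

    def do_copy() -> Dict[str, torch.Tensor]:
        # Block until the GPU work that produced these tensors has finished.
        copy_ready.synchronize()
        # Worker-thread copies block so the Future resolves with finished data;
        # the inline path keeps non_blocking=True and relies on a later sync.
        non_blocking = not use_host_copy_thread
        return {
            name: t.to("cpu", non_blocking=non_blocking)
            for name, t in device_tensors.items()
        }

    if use_host_copy_thread:
        return _copy_executor.submit(do_copy)  # Future[Dict[str, Tensor]]
    return do_copy()  # resolved result
```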
Force-pushed from 9af940a to 447e521
Force-pushed from 447e521 to 78d62fe
ixlmar left a comment
No functional concerns, but some suggestions to reduce complexity introduced into other parts of sampler.py.
Force-pushed from d43f4a7 to 16ab0c6
/bot run --disable-fail-fast
PR_Github #24040 [ run ] triggered by Bot. Commit:
PR_Github #24040 [ run ] completed with state
Force-pushed from fed5100 to abef7de
/bot run --disable-fail-fast
PR_Github #24055 [ run ] triggered by Bot. Commit:
PR_Github #24055 [ run ] completed with state
Force-pushed from abef7de to 0b24a61
/bot run --disable-fail-fast
PR_Github #24323 [ run ] triggered by Bot. Commit:
PR_Github #24323 [ run ] completed with state
Force-pushed from 0b24a61 to 86f69b1
/bot run --disable-fail-fast
PR_Github #24337 [ run ] triggered by Bot. Commit:
Signed-off-by: Dan Hansen <[email protected]>
…nto the class itself to improve composability Signed-off-by: Dan Hansen <[email protected]>
Signed-off-by: Dan Hansen <[email protected]>
…re logically belongs Signed-off-by: Dan Hansen <[email protected]>
Signed-off-by: Dan Hansen <[email protected]>
Signed-off-by: Dan Hansen <[email protected]>
Force-pushed from 83894ea to 3818121
/bot run --disable-fail-fast
PR_Github #24826 [ run ] triggered by Bot. Commit:
/bot run --disable-fail-fast
PR_Github #24944 [ run ] triggered by Bot. Commit:
PR_Github #24826 [ run ] completed with state
PR_Github #24944 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #25065 [ run ] triggered by Bot. Commit:
PR_Github #25065 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #25141 [ run ] triggered by Bot. Commit:
PR_Github #25141 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #25222 [ run ] triggered by Bot. Commit:
PR_Github #25222 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #25284 [ run ] triggered by Bot. Commit:
PR_Github #25284 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #25366 [ run ] triggered by Bot. Commit:
PR_Github #25366 [ run ] completed with state
Move D->H copies to a worker thread when confidential compute is active
Summary by CodeRabbit
New Features
Bug Fixes
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
- --reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- --post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
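For example, typical comments combining the subcommands and flags documented above (as used earlier in this conversation) look like:

```
/bot run --disable-fail-fast
/bot run --stage-list "A10-PyTorch-1" --disable-fail-fast
/bot skip --comment "Reason for skipping build/test"
/bot reuse-pipeline
```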