[None][feat] Support for cancelling requests with disaggregation #8114

pcastonguay · 2025-10-01T14:48:44Z

Summary by CodeRabbit

New Features
- Added ability to cancel in-flight KV cache transfer requests.
- Introduced readiness signaling across sender/receiver and agent connections.
- Exposed cancel_request in Python bindings and executor interfaces.
Refactor
- Centralized cache-transfer error detection and handling in the executor.
- Streamlined synchronization and locking behavior during transfer and cancellation.
Tests
- Added unit tests for notification/serialization paths, including readiness signals.
- Added test covering canceling a context transfer mid-flight and subsequent request behavior.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option is always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Signed-off-by: Shunkang <[email protected]>

Signed-off-by: Patrice Castonguay <[email protected]>

pcastonguay · 2025-10-01T14:50:19Z

Supersedes #6587

pcastonguay · 2025-10-01T14:53:52Z

/bot run --disable-fail-fast

coderabbitai · 2025-10-01T14:59:05Z

Walkthrough

Adds request-cancellation and readiness-signal capabilities across KV cache transmission: new cancelRequest APIs at transceiver/sender/receiver layers, readiness signaling between agents, Python bindings and executor integration for cancellation, plus serialization support and tests. CacheTransceiver routes cancellations based on request type; executor attempts cancel during in-flight transmissions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Transceiver interfaces `cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h` | Adds BaseCacheTransceiver::cancelRequest and CacheTransceiver override. |
| Transceiver impl routing `cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp` | Implements CacheTransceiver::cancelRequest: delegates to sender for context-only, receiver for generation-only; otherwise returns false. |
| Data transceiver: sender/receiver logic `cpp/tensorrt_llm/batch_manager/dataTransceiver.h`, `cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp` | Adds cancelRequest to CacheSender/CacheReceiver; introduces readiness signaling (kREADY_SIGNAL_TAG, sendReadySignal, receiveReadySignal); tracks cancellations; adjusts locking; handles completion paths for cancelled vs non-cancelled requests; adds default process info. |
| Agent connection protocol `cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h`, `cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp` | Adds ReadySignalInfo and integrates into NotificationInfo variant; serialization/deserialization and size handling; AgentConnection send/recv ready signal; templated waitForNotification; updates waitForSyncInfo; adds waitForReadySignal. |
| C++ Python bindings (nanobind/pybind) `cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp`, `cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp` | Exposes cancel_request on BaseCacheTransceiver; PyCacheTransceiver implements cancelRequest override. |
| PyTorch executor bindings `tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py` | Adds abstract cancel_request and implementation delegating to underlying impl. |
| Executor cancellation/error handling `tensorrt_llm/_torch/pyexecutor/py_executor.py` | Centralizes disagg transfer error checks; adds in-transmission detection; adds _try_cancel_request using kv_cache_transceiver.cancel_request; updates cancellation flow. |
| C++ serialization tests `cpp/tests/unit_tests/executor/serializeUtilsTest.cpp` | Tests for RequestAndBufferInfo, ReadySignalInfo, NotificationSyncInfo, and NotificationInfo variants. |
| Python KV-cache tests `tests/unittest/others/test_kv_cache_transceiver.py` | Adds cancel-in-transmission test; validates generation request transitions to DISAGG_TRANS_ERROR after prior context cancellation. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User as Executor
  participant CT as CacheTransceiver
  participant CS as CacheSender
  participant CR as CacheReceiver

  rect rgb(235,245,255)
  note over User,CT: Cancel request routing
  User->>CT: cancelRequest(req)
  alt Context-only request
    CT->>CS: cancelRequest(req)
    CS-->>CT: bool cancelled
  else Generation-only request
    CT->>CR: cancelRequest(req)
    CR-->>CT: bool cancelled
  else Other
    CT-->>User: false
  end
  CT-->>User: bool
  end
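
The routing in the diagram above can be sketched in a few lines of Python. This is a toy model and not the PR's C++ implementation: `RequestType`, `Request`, `StubEndpoint`, and `ToyCacheTransceiver` are hypothetical names, and the stub's "return True only if the request was in flight" behavior is an assumption about what the sender and receiver report.

```python
from dataclasses import dataclass
from enum import Enum, auto


class RequestType(Enum):
    # Hypothetical stand-ins for the request kinds named in the walkthrough.
    CONTEXT_ONLY = auto()
    GENERATION_ONLY = auto()
    CONTEXT_AND_GENERATION = auto()


@dataclass
class Request:
    request_id: int
    request_type: RequestType


class StubEndpoint:
    """Stand-in for CacheSender / CacheReceiver; tracks in-flight request IDs."""

    def __init__(self, in_flight):
        self.in_flight = set(in_flight)

    def cancel_request(self, req: Request) -> bool:
        # Assumption: report True only if the request was in flight here.
        if req.request_id in self.in_flight:
            self.in_flight.remove(req.request_id)
            return True
        return False


class ToyCacheTransceiver:
    def __init__(self, sender: StubEndpoint, receiver: StubEndpoint):
        self.sender = sender
        self.receiver = receiver

    def cancel_request(self, req: Request) -> bool:
        # Context-only requests are cancelled on the sending side.
        if req.request_type is RequestType.CONTEXT_ONLY:
            return self.sender.cancel_request(req)
        # Generation-only requests are cancelled on the receiving side.
        if req.request_type is RequestType.GENERATION_ONLY:
            return self.receiver.cancel_request(req)
        # Any other request type is not routed and reports False.
        return False
```

Keeping the "other" branch as a plain `False` means callers can treat the result uniformly as "was the transfer cancelled", without needing to know which side handled it.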

sequenceDiagram
  autonumber
  participant CS as CacheSender
  participant ACM as AgentConnectionManager
  participant AC as AgentConnection
  participant CR as CacheReceiver

  rect rgb(245,255,245)
  note over CS,CR: Readiness signaling
  CS->>AC: sendReadySignal(ctx, isReady)
  AC-->>ACM: notification(ReadySignalInfo)
  ACM->>CR: waitForReadySignal(remote, ReadySignalInfo)
  CR-->>ACM: aggregated isReady
  ACM-->>CR: return
  end
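
To make the readiness notification concrete, here is a minimal Python sketch of a serialize/deserialize round trip for a ready-signal message. Everything about the wire layout is an assumption made for illustration: the field names `agent_name` and `is_ready`, the variant index `VARIANT_INDEX_READY_SIGNAL`, and the byte format are hypothetical and do not reflect the actual `ReadySignalInfo` in `connection.h`.

```python
import struct
from dataclasses import dataclass

# Hypothetical index of the ready-signal alternative inside a notification variant.
VARIANT_INDEX_READY_SIGNAL = 2

_HEADER = "<BI"  # 1-byte variant index, 4-byte little-endian name length


@dataclass(frozen=True)
class ReadySignalInfo:
    agent_name: str  # hypothetical field
    is_ready: bool   # mirrors the isReady flag shown in the diagram

    def serialize(self) -> bytes:
        name = self.agent_name.encode("utf-8")
        header = struct.pack(_HEADER, VARIANT_INDEX_READY_SIGNAL, len(name))
        return header + name + struct.pack("<?", self.is_ready)

    @staticmethod
    def deserialize(buf: bytes) -> "ReadySignalInfo":
        index, name_len = struct.unpack_from(_HEADER, buf, 0)
        if index != VARIANT_INDEX_READY_SIGNAL:
            raise ValueError(f"unexpected variant index {index}")
        offset = struct.calcsize(_HEADER)
        name = buf[offset:offset + name_len].decode("utf-8")
        (is_ready,) = struct.unpack_from("<?", buf, offset + name_len)
        return ReadySignalInfo(name, is_ready)


# Round trip: what the sender packs is what the receiver recovers.
original = ReadySignalInfo("ctx-agent-0", True)
restored = ReadySignalInfo.deserialize(original.serialize())
```

A leading variant index is one common way to let a receiver dispatch between several notification kinds that share a channel; the PR's tests for the `NotificationInfo` variants check the analogous round-trip property in C++.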

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The pull request description remains the untouched template with placeholder comments and lacks any concrete summary of the changes, explanation of the implementation, or details on test coverage, leaving reviewers without the necessary context to understand or verify the work. | Please replace the template placeholders with a clear summary of what was changed and why, detail which tests cover the new cancellation and readiness signaling code paths, and remove any unused template comments so that the PR description fully documents the work. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.02% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title “[None][feat] Support for cancelling requests with disaggregation” follows the repository’s tagging convention and concisely highlights the primary new feature of request cancellation in a disaggregated setting, making it clear to reviewers what the main change entails without unnecessary detail. |


tensorrt-cicd · 2025-10-01T14:59:08Z

PR_Github #20468 [ run ] triggered by Bot

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

cpp/tests/unit_tests/executor/serializeUtilsTest.cpp (1)
2-2: Update copyright year to 2025.

The copyright header reflects 2023-2024, but this file is being modified in 2025.

Apply this diff:
-* SPDX-FileCopyrightText: Copyright (c) 2023-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+* SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Based on coding guidelines.

cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (1)
44-44: Fix NB_TRAMPOLINE count to include the new pure virtual.

You now have 7 NB_OVERRIDE_PURE methods (including cancelRequest) but NB_TRAMPOLINE is still set to 6. This can misconfigure the trampoline/vtable and break overrides.

Apply this diff:
-    NB_TRAMPOLINE(tb::BaseCacheTransceiver, 6);
+    NB_TRAMPOLINE(tb::BaseCacheTransceiver, 7);
Also applies to: 76-79

tensorrt_llm/_torch/pyexecutor/py_executor.py (1)

1893-1908: Always clear canceled request IDs after processing
Unconditionally invoke executor_request_queue.clear_canceled_req_ids() after the cancellation loop (not only when enable_attention_dp is true) to avoid accumulating stale IDs in non-ADP runs.
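
The suggestion above can be illustrated with a toy cancellation loop. This is a sketch of the reviewer's proposal only, not the code in `py_executor.py`: `ToyRequestQueue`, `mark_canceled`, `get_canceled_req_ids`, and `handle_canceled_requests` are hypothetical names, and only `clear_canceled_req_ids` is taken from the comment.

```python
from typing import Callable, Iterable, List


class ToyRequestQueue:
    """Hypothetical stand-in for the executor request queue."""

    def __init__(self) -> None:
        self._canceled_req_ids: List[int] = []

    def mark_canceled(self, req_id: int) -> None:
        self._canceled_req_ids.append(req_id)

    def get_canceled_req_ids(self) -> List[int]:
        return list(self._canceled_req_ids)

    def clear_canceled_req_ids(self) -> None:
        self._canceled_req_ids.clear()


def handle_canceled_requests(
    queue: ToyRequestQueue,
    active_req_ids: Iterable[int],
    try_cancel: Callable[[int], bool],
) -> List[int]:
    """Drop successfully cancelled requests and return the ones still active."""
    canceled = set(queue.get_canceled_req_ids())
    still_active = [
        req_id
        for req_id in active_req_ids
        if not (req_id in canceled and try_cancel(req_id))
    ]
    # Reviewer's proposal: clear on every pass, with no attention-DP condition.
    queue.clear_canceled_req_ids()
    return still_active
```

Note that clearing unconditionally also forgets IDs whose cancellation was refused on this pass. Whether those should be retried on a later iteration is a design question the comment does not address.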

🧹 Nitpick comments (5)

cpp/tests/unit_tests/executor/serializeUtilsTest.cpp (1)

1073-1083: Consider reusing the existing serializeDeserialize template.

The new serializeDeserializeNotification helper duplicates logic already present in the generic serializeDeserialize template (lines 169-178). The only difference is the call to static member functions (T::serialize/T::deserialize) versus free functions.

If the notification types support the same serialization interface as other types, consider unifying under the existing template to reduce duplication.

cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (1)
81-81: Namespace end comment nit.

Prefer a precise comment for anonymous namespace:
-} // namespace
+} // anonymous namespace

cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)
18-18: Add project-standard include guards.

Guidelines require include guards named TRTLLM_DATATRANSCEIVER_H for headers. Keep pragma once if you like, but add guards for consistency.
+#ifndef TRTLLM_DATATRANSCEIVER_H
+#define TRTLLM_DATATRANSCEIVER_H
 #pragma once
 ...
 } // namespace tensorrt_llm::batch_manager
+
+#endif // TRTLLM_DATATRANSCEIVER_H
Please confirm whether this repository consistently uses include guards in other headers; if so, adopt the same pattern here. As per coding guidelines.

Also applies to: 311-311
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)

1862-1882: Cancellation helper behavior looks correct; consider tiny polish.

Logic is clear: if no transceiver or not in transmission, allow cancel; otherwise delegate. Optionally add a brief docstring and return type hint for readability.
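
A minimal sketch of the helper as described in this comment, including the suggested docstring and return type hint. It is not the PR's `_try_cancel_request`: the free-function form, the `SupportsCancel` protocol, and the explicit `in_transmission` argument are assumptions made to keep the example self-contained.

```python
from typing import Optional, Protocol


class SupportsCancel(Protocol):
    """Anything exposing cancel_request, such as a KV cache transceiver."""

    def cancel_request(self, request: object) -> bool: ...


def try_cancel_request(
    transceiver: Optional[SupportsCancel],
    request: object,
    in_transmission: bool,
) -> bool:
    """Return True if the request can be cancelled right now.

    With no transceiver, or with no transfer in flight, there is nothing
    to abort, so cancellation is allowed. Otherwise the decision is
    delegated to the transceiver.
    """
    if transceiver is None or not in_transmission:
        return True
    return transceiver.cancel_request(request)


class RefusingTransceiver:
    """Toy transceiver that always refuses to cancel."""

    def cancel_request(self, request: object) -> bool:
        return False
```

With `RefusingTransceiver`, a request that is mid-transfer stays active, while the same request with `in_transmission=False` is cancelled without consulting the transceiver.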

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

509-545: Ready-signal dispatch and conditional send look correct.

The isReady gate, signal dispatch, and cleanup paths are coherent. Minor nit: variable name shadowing for 'it' below this scope can be avoided for readability.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between ba8abea and 3daa2fc.

📒 Files selected for processing (12)

cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (1 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (15 hunks)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (3 hunks)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (7 hunks)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (8 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (2 hunks)
cpp/tensorrt_llm/pybind/batch_manager/cacheTransceiver.cpp (2 hunks)
cpp/tests/unit_tests/executor/serializeUtilsTest.cpp (2 hunks)
tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py (2 hunks)
tensorrt_llm/_torch/pyexecutor/py_executor.py (3 hunks)
tests/unittest/others/test_kv_cache_transceiver.py (3 hunks)

🧰 Additional context used

📓 Path-based instructions (8)

**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh}