fix(grpc): eliminate request_id collision in host pending responses #7637
Open
octo-patch wants to merge 1 commit into microsoft:main from
…rosoft#7016)

Each GrpcWorkerAgentRuntime starts its own per-session request_id counter from "1". When multiple runtimes send RPC requests whose target agent lives on the same worker, the host stored futures under _pending_responses[target_client_id][request_id], so two in-flight requests with identical request_ids collided: the second insert overwrote the first entry, and the subsequent pop() raised KeyError.

Fix: before forwarding a request to the target runtime, replace the sender's request_id with a host-generated UUID. The UUID is stored as the key in _pending_responses. When the response arrives (carrying the UUID as its request_id), the original sender's request_id is restored so the sender's own pending_requests map can still match it.

Add a regression test that reproduces the exact topology from the bug report: runtime2 → relay_agent (on runtime1) → inner_agent (also on runtime1).

Co-Authored-By: Octopus <liyuan851277048@icloud.com>
gdy3
approved these changes
Apr 28, 2026
Fixes #7016
Problem
Each GrpcWorkerAgentRuntime starts its own per-session request_id counter from "1". When two different runtimes both send RPC requests to agents living on the same target runtime, the host stored pending response futures under _pending_responses[target_client_id][request_id].
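To make the keying concrete, here is a minimal sketch of the old scheme. It assumes a plain nested dict and uses sender names in place of the real futures; `store_pending` and the client id `"runtime-1"` are hypothetical names for illustration, not the servicer's actual code.

```python
# Toy model of the old bookkeeping: target client id -> request_id -> entry.
# The real host stores futures; strings are used here to keep the sketch small.
pending_responses: dict[str, dict[str, str]] = {}


def store_pending(target_client_id: str, request_id: str, sender: str) -> None:
    """Record a pending response under the sender-supplied request_id."""
    pending_responses.setdefault(target_client_id, {})[request_id] = sender


# Two senders, each with a counter that starts at "1", target the same runtime.
store_pending("runtime-1", "1", "runtime-2")
store_pending("runtime-1", "1", "relay_agent")  # silently overwrites the first entry

# The first response to arrive consumes the only remaining entry...
first = pending_responses["runtime-1"].pop("1")

# ...so the second response finds nothing under the same key.
try:
    pending_responses["runtime-1"].pop("1")
    second_pop_failed = False
except KeyError:
    second_pop_failed = True
```

In this sketch the surviving entry belongs to the second sender, and the second `pop()` fails with `KeyError`, which matches the failure mode reported in the issue.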
Because both senders start from "1", both inserts share the same key: the second overwrites the first, and the subsequent pop() raises KeyError('1').

Concrete topology (from the issue):
relay_agent (hosted on Runtime-1): request_id = "1"
relay_agent → inner_agent (also on Runtime-1): request_id = "1" ← collision

Solution
Before forwarding a request to the target runtime, the host now replaces the sender's request_id with a uuid.uuid4() string. The UUID is used as the key in _pending_responses, so keys are globally unique regardless of how many senders share the same counter values.
When the response arrives (carrying the UUID), the original request_id is restored before the future is resolved, so the sender's own pending_requests map can still match the response.
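The substitute-and-restore step can be sketched as follows. This is an illustrative model under stated assumptions, not the code in this PR: the class `PendingResponses` and its methods `register` and `resolve` are hypothetical names, and a sender label stands in for the future.

```python
import uuid


class PendingResponses:
    """Toy model of the host-side map keyed by host-generated UUIDs."""

    def __init__(self) -> None:
        # target_client_id -> {host_uuid: (sender, original_request_id)}
        self._pending: dict[str, dict[str, tuple[str, str]]] = {}

    def register(self, target_client_id: str, sender: str, original_request_id: str) -> str:
        """Store the entry under a fresh UUID and return the id to forward."""
        host_id = str(uuid.uuid4())
        self._pending.setdefault(target_client_id, {})[host_id] = (sender, original_request_id)
        return host_id

    def resolve(self, target_client_id: str, host_id: str) -> tuple[str, str]:
        """Look up by UUID and hand back the sender and its original request_id."""
        return self._pending[target_client_id].pop(host_id)


pending = PendingResponses()

# Both senders use request_id "1", yet each gets a distinct host-side key.
id_a = pending.register("runtime-1", "runtime-2", "1")
id_b = pending.register("runtime-1", "relay_agent", "1")

# Responses may arrive in any order; each restores its own original request_id.
resolved_b = pending.resolve("runtime-1", id_b)
resolved_a = pending.resolve("runtime-1", id_a)
```

Storing the original request_id next to the entry is what lets the host rewrite the response before handing it back, so senders never see the UUID.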
Changes are limited to _worker_runtime_host_servicer.py:
import uuid
_process_request: generate a UUID, substitute it in the forwarded request, store (future, original_request_id) in _pending_responses
_process_response: look up by UUID, restore original_request_id

Testing
Added test_cross_runtime_rpc_no_request_id_collision in test_worker_runtime.py that reproduces the exact topology from the bug report: an external runtime sends to a relay agent that in turn sends to an inner agent on the same worker runtime, verifying the full chain completes without KeyError.
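The shape of that topology can be approximated in-process without gRPC. The sketch below is not the test added in this PR. `ToyHost`, `run_chain`, and the two handler functions are hypothetical stand-ins, and the whole chain runs inside a single asyncio event loop.

```python
import asyncio
import uuid


class ToyHost:
    """Hypothetical in-process host; stands in for the gRPC host servicer."""

    def __init__(self) -> None:
        self.handlers = {}  # agent name -> async handler
        self.pending = {}   # host UUID -> (future, original_request_id)

    async def request(self, target: str, request_id: str, payload: str):
        # Key the pending entry by a host-generated UUID, not the sender's id.
        host_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self.pending[host_id] = (future, request_id)

        reply = await self.handlers[target](payload)

        # On response: look up by UUID and restore the original request_id.
        future, original_request_id = self.pending.pop(host_id)
        future.set_result(reply)
        return original_request_id, await future


async def run_chain():
    host = ToyHost()

    async def inner_agent(payload: str) -> str:
        return payload.upper()

    async def relay_agent(payload: str) -> str:
        # The nested request reuses request_id "1" while the outer one is pending.
        _, inner_reply = await host.request("inner_agent", "1", payload)
        return f"relay({inner_reply})"

    host.handlers = {"inner_agent": inner_agent, "relay_agent": relay_agent}
    result = await host.request("relay_agent", "1", "ping")
    return result, host.pending


chain_result, leftover = asyncio.run(run_chain())
```

If `pending` were keyed by `request_id` instead of `host_id`, the nested request would overwrite the outer entry and the outer `pop()` would raise `KeyError`, which is the behavior the regression test guards against.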