Context
The codemode connector runtime (@cloudflare/codemode + Think's createExecuteTool) adapts AI SDK ToolSets into the sandbox via ToolSetConnector. Tools without an execute function — client-side tools resolved in the browser (getUserTimezone, ask_user, …) — are currently excluded from both the sandbox bindings and the generated types, with a one-time warning. The model can still call them as ordinary top-level tools, just not from inside execute code.
This issue is to discuss whether (and how) to bridge them.
Option A — status quo (skip + warn)
Client tools stay top-level. Usually fine: they're interactive one-offs that rarely benefit from being batched inside sandboxed code. Zero new surface area.
Option B — generalize the pause machinery into value-carrying resolution
The runtime already pauses durably on requiresApproval and resumes via approve(). A client tool is the same shape with one twist: instead of approve → the server executes the call, the client computes the result and posts it back. Sketch:
- Runtime: a resolve(executionId, seq, result) RPC that flips a pending log entry straight to applied with the supplied result, then resumes the run; approve/reject become special cases of resolution.
- ToolSetConnector: expose execute-less tools as pause-always entries, annotated e.g. resolution: "client" so hosts/UIs can distinguish "needs a human yes/no" from "needs the client to run something".
- Think: surface these in pendingExecutions(); the SPA routes them through its existing onToolCall handler and posts the result back via a resolveExecution(executionId, seq, result) callable. The transcript's paused tool output is replaced and the chat auto-continues, the same flow as the existing approval cards.
Model-written code could then do:
const tz = await tools.getUserTimezone({});
and the run durably parks until the browser answers, with abort-and-replay handling the rest for free (the resolved value is recorded in the log and replays like any applied result).
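To make the pause, resolve, and replay cycle concrete, here is a minimal in-memory sketch. It is not the @cloudflare/codemode implementation: RunLog, Paused, callClientTool, and runOnce are hypothetical names, the log is a plain array rather than durable storage, and calls are synchronous for brevity (the real runtime is async). It only illustrates how a client-supplied value, once recorded as applied, replays like any other result.

```typescript
// Hypothetical sketch only: these names are NOT the real codemode API.

// Thrown to abort the current pass when a call has no recorded result yet.
class Paused {
  seq: number;
  constructor(seq: number) {
    this.seq = seq;
  }
}

type Entry = {
  seq: number;
  tool: string;
  status: "pending" | "applied";
  result?: unknown;
};

class RunLog {
  entries: Entry[] = [];
  private cursor = 0;

  // Every replay pass starts again from the first recorded call.
  rewind(): void {
    this.cursor = 0;
  }

  // A client tool call: return the recorded value on replay, otherwise park.
  callClientTool(tool: string): unknown {
    const seq = this.cursor++;
    const existing = this.entries[seq];
    if (existing?.status === "applied") return existing.result;
    if (!existing) this.entries.push({ seq, tool, status: "pending" });
    throw new Paused(seq);
  }

  // Value-carrying resolution: flip pending -> applied with the supplied result.
  resolve(seq: number, result: unknown): void {
    const entry = this.entries[seq];
    if (!entry || entry.status !== "pending") {
      throw new Error(`seq ${seq} is not pending`);
    }
    entry.status = "applied";
    entry.result = result;
  }
}

type Outcome<T> = { done: true; value: T } | { done: false; seq: number };

// One replay pass: either the code finishes or it parks at a pending call.
function runOnce<T>(log: RunLog, code: (log: RunLog) => T): Outcome<T> {
  log.rewind();
  try {
    return { done: true, value: code(log) };
  } catch (err) {
    if (err instanceof Paused) return { done: false, seq: err.seq };
    throw err;
  }
}
```

In this sketch, code that makes two chained client calls needs three passes: the first parks at seq 0, the second (after resolve(0, …)) parks at seq 1, and the third completes. That is the pause → resolve → pause → resolve pattern discussed under the costs.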
Costs / open questions
- Trust surface: the client supplies a recorded "result" that replays as ground truth. Size limits apply (MAX_DURABLE_VALUE_BYTES), but validation against the tool's output shape doesn't exist today.
- UX: multiple pending interactions per run. A paused run currently exposes one pending action at the abort point, and client tools would keep that property, but chained client calls mean pause → resolve → pause → resolve round trips, each a full replay pass.
- Expiry semantics: expirePaused would reject runs waiting on a client that never answers; probably the right default, but worth stating.
- Offline clients: a run paused on a client tool with no connected client is stuck until expiry; should pendingExecutions() distinguish these so UIs can prompt reconnection?
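On the trust-surface point, one possible shape for a server-side gate in front of resolve is sketched below. Everything here is an assumption for illustration: checkClientResult and isTimezone are hypothetical helpers, the byte limit is a placeholder (this issue does not state the real value of MAX_DURABLE_VALUE_BYTES), and the hand-written type guard stands in for whatever schema the tool declares for its output.

```typescript
// Hypothetical sketch: not part of the existing runtime.

// Placeholder value; the real MAX_DURABLE_VALUE_BYTES is not given in this issue.
const ASSUMED_MAX_DURABLE_VALUE_BYTES = 64 * 1024;

type Validator<T> = (value: unknown) => value is T;

type Checked<T> = { ok: true; value: T } | { ok: false; reason: string };

// JSON.stringify returns undefined for values such as functions,
// and throws on circular structures or BigInt.
function serialize(raw: unknown): string | undefined {
  try {
    return JSON.stringify(raw) as string | undefined;
  } catch {
    return undefined;
  }
}

// Gate a client-supplied result before it is recorded in the log.
function checkClientResult<T>(raw: unknown, isValid: Validator<T>): Checked<T> {
  const json = serialize(raw);
  if (json === undefined) return { ok: false, reason: "not serializable" };

  // Measure the UTF-8 size of the serialized value.
  const bytes = new TextEncoder().encode(json).length;
  if (bytes > ASSUMED_MAX_DURABLE_VALUE_BYTES) {
    return { ok: false, reason: `too large: ${bytes} bytes` };
  }

  if (!isValid(raw)) return { ok: false, reason: "does not match output shape" };
  return { ok: true, value: raw };
}

// Example guard for a getUserTimezone-style result (the shape is assumed).
const isTimezone = (v: unknown): v is { timezone: string } =>
  typeof v === "object" &&
  v !== null &&
  typeof (v as { timezone?: unknown }).timezone === "string";
```

A host could call checkClientResult(result, isTimezone) inside its resolveExecution handler and only forward values that pass. Whether a failed check should reject the pending entry or leave it pending for a retry is a separate design question.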
Proposal
Keep Option A for the current release (already shipped in the connector-runtime PR). Build Option B only when a concrete use case needs batched client interactions inside sandbox code — the design above shows it's an incremental extension of the existing approval flow rather than a rework.