[Discussion] Client-side tools inside the codemode execute sandbox #1735

@threepointone

Context

The codemode connector runtime (@cloudflare/codemode + Think's createExecuteTool) adapts AI SDK ToolSets into the sandbox via ToolSetConnector. Tools without an execute function — client-side tools resolved in the browser (getUserTimezone, ask_user, …) — are currently excluded from both the sandbox bindings and the generated types, with a one-time warning. The model can still call them as ordinary top-level tools, just not from inside execute code.
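
For readers who haven't looked at the connector, the skip-and-warn behaviour can be pictured roughly as below. This is only a sketch: `ToolLike`, `partitionTools` and the per-tool-name `warned` set are hypothetical names for illustration, not the actual ToolSetConnector internals, and the real AI SDK tool type carries more fields.

```typescript
// Hypothetical minimal tool shape; the real AI SDK tool type is richer.
type ToolLike = {
  description?: string;
  execute?: (input: unknown) => unknown;
};

// Assumption: "one-time" is tracked per tool name for the process lifetime.
const warned = new Set<string>();

function partitionTools(
  tools: Record<string, ToolLike>,
  warn: (msg: string) => void = console.warn
): { sandboxed: Record<string, ToolLike>; skipped: string[] } {
  const sandboxed: Record<string, ToolLike> = {};
  const skipped: string[] = [];
  for (const [name, tool] of Object.entries(tools)) {
    if (typeof tool.execute === "function") {
      // Server-executable: gets a sandbox binding and a generated type.
      sandboxed[name] = tool;
      continue;
    }
    // Execute-less (client-side) tool: left out of the sandbox entirely.
    skipped.push(name);
    if (!warned.has(name)) {
      warned.add(name);
      warn(`Tool "${name}" has no execute function; skipping it in the sandbox.`);
    }
  }
  return { sandboxed, skipped };
}
```

Skipped tools are simply absent from the sandbox; nothing here prevents the model from calling them at the top level.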

This issue is to discuss whether (and how) to bridge them.

Option A — status quo (skip + warn)

Client tools stay top-level. Usually fine: they're interactive one-offs that rarely benefit from being batched inside sandboxed code. Zero new surface area.

Option B — generalize the pause machinery into value-carrying resolution

The runtime already pauses durably on requiresApproval and resumes via approve(). A client tool is the same shape with one twist: instead of approve → the server executes the call, the client computes the result and posts it back. Sketch:

  1. Runtime: a resolve(executionId, seq, result) RPC that flips a pending log entry straight to applied with the supplied result, then resumes the run — approve/reject become special cases of resolution.
  2. ToolSetConnector: expose execute-less tools as pause-always entries, annotated e.g. resolution: "client" so hosts/UIs can distinguish "needs a human yes/no" from "needs the client to run something".
  3. Think: surface these in pendingExecutions(); the SPA routes them through its existing onToolCall handler and posts the result back via a resolveExecution(executionId, seq, result) callable. The transcript's paused tool output is replaced and the chat auto-continues — identical flow to the existing approval cards.
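
To make step 1 concrete, here is a rough sketch of resolution as the general case, with approve expressed on top of it. Everything in it (`SketchRuntime`, `LogEntry`, the `pause` helper, the `resumed` list standing in for "resume the run") is a hypothetical stand-in; the real runtime's log format and RPC surface may look quite different.

```typescript
// Hypothetical names throughout; not the @cloudflare/codemode API.
type Resolution = "approval" | "client";

type LogEntry =
  | { status: "pending"; tool: string; resolution: Resolution }
  | { status: "applied"; tool: string; result: unknown };

class SketchRuntime {
  private logs = new Map<string, Map<number, LogEntry>>();
  // Stand-in for actually resuming a run: records which executions were resumed.
  resumed: string[] = [];

  // Record a pause-always call as a pending entry at position `seq`.
  pause(executionId: string, seq: number, tool: string, resolution: Resolution): void {
    const log = this.logs.get(executionId) ?? new Map<number, LogEntry>();
    log.set(seq, { status: "pending", tool, resolution });
    this.logs.set(executionId, log);
  }

  // General case: flip pending -> applied with a supplied result, then resume.
  resolve(executionId: string, seq: number, result: unknown): void {
    const log = this.logs.get(executionId);
    const entry = log?.get(seq);
    if (!log || !entry || entry.status !== "pending") {
      throw new Error(`No pending entry at ${executionId}#${seq}`);
    }
    log.set(seq, { status: "applied", tool: entry.tool, result });
    this.resumed.push(executionId);
  }

  // Special case: the server runs the call itself and resolves with its output.
  approve(executionId: string, seq: number, serverExecute: () => unknown): void {
    this.resolve(executionId, seq, serverExecute());
  }

  entry(executionId: string, seq: number): LogEntry | undefined {
    return this.logs.get(executionId)?.get(seq);
  }
}
```

In this shape a client tool differs from an approval only in who produces `result`: the browser posts it for `resolution: "client"`, while `approve` computes it server-side.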

Model-written code could then do:

const tz = await tools.getUserTimezone({});

and the run durably parks until the browser answers, with abort-and-replay handling the rest for free (the resolved value is recorded in the log and replays like any applied result).
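
As an illustration of why replay comes for free, the following sketch models abort-and-replay with a map of applied results keyed by call sequence number. `PauseSignal`, `makeTools`, `modelCode` and `runOnce` are made-up names for this example, and the real runtime's pause mechanism is assumed rather than known to work this way.

```typescript
// Hypothetical abort signal thrown when a call has no recorded result yet.
class PauseSignal extends Error {
  readonly seq: number;
  readonly tool: string;
  constructor(seq: number, tool: string) {
    super(`paused at seq ${seq} (${tool})`);
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, PauseSignal.prototype);
    this.seq = seq;
    this.tool = tool;
  }
}

type Applied = Map<number, unknown>;

// Each run numbers its tool calls from 0; recorded results are replayed as-is.
function makeTools(applied: Applied) {
  let seq = 0;
  const call = (tool: string) => async (_input: unknown): Promise<unknown> => {
    const current = seq++;
    if (applied.has(current)) return applied.get(current);
    throw new PauseSignal(current, tool); // abort: nothing recorded for this call
  };
  return { getUserTimezone: call("getUserTimezone"), ask_user: call("ask_user") };
}

// Stand-in for model-written sandbox code with two chained client calls.
async function modelCode(tools: ReturnType<typeof makeTools>): Promise<string> {
  const tz = await tools.getUserTimezone({});
  const name = await tools.ask_user({ question: "Name?" });
  return `${String(name)} @ ${String(tz)}`;
}

type RunResult =
  | { done: true; value: string }
  | { done: false; seq: number; tool: string };

// One full pass from the top: either finishes or reports where it paused.
async function runOnce(applied: Applied): Promise<RunResult> {
  try {
    return { done: true, value: await modelCode(makeTools(applied)) };
  } catch (err) {
    if (err instanceof PauseSignal) return { done: false, seq: err.seq, tool: err.tool };
    throw err;
  }
}
```

With an empty map the first pass pauses at seq 0; once a value is recorded for seq 0, the next pass replays it and pauses at seq 1, and so on until the code runs to completion.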

Costs / open questions

  • Trust surface: the client supplies a recorded "result" that replays as ground truth. Size limits apply (MAX_DURABLE_VALUE_BYTES), but validation against the tool's output shape doesn't exist today.
  • UX: multiple pending interactions per run. A paused run currently exposes one pending action at the abort point, and client tools would keep that property. Chained client calls therefore become pause → resolve → pause → resolve round trips, each a full replay pass.
  • Expiry semantics: expirePaused would reject runs waiting on a client that never answers — probably the right default, but worth stating.
  • Offline clients: a run paused on a client tool with no connected client is stuck until expiry; should pendingExecutions() distinguish these so UIs can prompt reconnection?
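
On the trust-surface point, one possible shape for a guard in front of resolution is sketched below. It is an assumption-heavy illustration: `checkClientResult` and `ASSUMED_MAX_BYTES` are hypothetical, the byte limit is a placeholder rather than the real value of MAX_DURABLE_VALUE_BYTES, and the shape check is a hand-written predicate because no output-schema validation exists in the runtime today.

```typescript
// Placeholder limit for illustration; NOT the real MAX_DURABLE_VALUE_BYTES value.
const ASSUMED_MAX_BYTES = 64 * 1024;

type Check = { ok: true; value: unknown } | { ok: false; reason: string };

function checkClientResult(
  result: unknown,
  isValidShape: (value: unknown) => boolean,
  maxBytes: number = ASSUMED_MAX_BYTES
): Check {
  // Durable values must survive serialization; JSON is assumed here.
  let json: string | undefined;
  try {
    json = JSON.stringify(result);
  } catch {
    return { ok: false, reason: "not serializable" };
  }
  if (json === undefined) return { ok: false, reason: "not serializable" };

  // Measure encoded bytes, not string length, so multi-byte characters count fully.
  const bytes = new TextEncoder().encode(json).length;
  if (bytes > maxBytes) return { ok: false, reason: `too large (${bytes} bytes)` };

  // Per-tool predicate standing in for output-schema validation.
  if (!isValidShape(result)) return { ok: false, reason: "unexpected shape" };
  return { ok: true, value: result };
}
```

A host could run such a check before recording a client-supplied value, for example `checkClientResult(value, (v) => typeof v === "string")` for getUserTimezone, and treat a failed check like a rejection.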

Proposal

Keep Option A for the current release (already shipped in the connector-runtime PR). Build Option B only when a concrete use case needs batched client interactions inside sandbox code — the design above shows it's an incremental extension of the existing approval flow rather than a rework.

Labels: enhancement, question
