Problem
A high-fanout parent turn can make the TUI appear frozen while launching or completing many sub-agents. The current repro is a user-triggered fanout of roughly 20 agent calls: sidebar state changes, input responsiveness stalls, and completions appear to roll in slowly or inconsistently.
This issue is the release-gate umbrella for the v0.8.66 freeze work. It should stay open until CodeWhale can launch and observe a large sub-agent fanout without freezing the UI or losing critical completion state.
Verified evidence from current tree
agent inherits supports_parallel() == false from the default tool spec (crates/tui/src/tools/spec.rs), so each agent call is planned as a serial tool batch.
- Non-parallel tools take the write side of the global tool execution lock in
crates/tui/src/core/engine/tool_execution.rs.
turn_loop.rs already documents the user-visible freeze symptom for serial fanout: six agent calls resolving routes under the global tool lock can read as a hard TUI freeze.
- The engine op channel is bounded at 32 and the event channel at 256 (
crates/tui/src/core/engine.rs). UI code awaits Op::ListSubAgents after event-drain batches.
- Sub-agent completion UI events use
try_send, so under event-channel pressure they can be dropped from the UI stream even if parent completion signaling uses a separate path.
ListSubAgents and sub-agent completions contend on the SubAgentManager write lock; manager update paths can persist JSON state synchronously while holding that lock.
- The async TUI loop still calls blocking
std::sync::Mutex::lock() on the shell manager in sidebar/live-output refresh paths.
Critical framing
Do not treat this as the old v0.8.61 freeze unless fresh measurements prove it is the same failure mode. The earlier speculative “blocking I/O starves the worker pool” theory was disproven. This lane should fix the currently observed fanout/backpressure paths with targeted evidence, not a blanket spawn_blocking patch.
Required outcome
A parent turn that requests 20 sub-agents should remain interruptible and visibly alive:
- TUI input polling continues while launches are in progress.
- The sidebar/status view updates without blocking the event loop.
- Launching N agents should not take roughly N × route-resolution/spawn latency when the work is independent and within configured sub-agent limits.
- Completion events that matter to parent-turn correctness are not lost; UI refresh events can be coalesced but must converge to correct state.
- Cancelling the parent turn interrupts any remaining launch queue quickly and leaves well-formed tool results.
Acceptance criteria
Child work
Create/track child issues for:
- parallel-safe
agent launch dispatch,
- engine/TUI channel backpressure,
- nonblocking/coalesced sub-agent sidebar refresh,
- shell-manager lock removal from async UI hot paths,
- SubAgentManager lock/persistence hot-path cleanup.
Tracked child issues
Security / policy guardrails
Responsiveness fixes must not change the authority model. The implementation must preserve these constraints:
- Parallel
agent launch must not bypass approval policy, sandbox policy, tool gates, sub-agent depth limits, sub-agent concurrency limits, token/budget limits, or cancellation.
- Parallelism should only reduce waiting to start independent children; it must not grant children more authority than a serial launch would.
- Critical events must remain durable or recoverable: approval prompts, tool results, fatal errors, user-input requests, cancellation, parent completion, and terminal sub-agent state. Only refresh/status events may be coalesced or dropped, and only if the next snapshot converges to correct state.
- Cleanup cannot disappear when sidebar listing becomes read-only. Stale child timeout/cancel enforcement must remain periodic or event-driven.
- UI snapshots may become stale under contention, but enforcement must never depend on a successful UI refresh.
- Persistence changes must keep atomic writes, symlink/path hardening, and recoverable terminal state.
Problem
A high-fanout parent turn can make the TUI appear frozen while launching or completing many sub-agents. The current repro is a user-triggered fanout of roughly 20
agentcalls: sidebar state changes, input responsiveness stalls, and completions appear to roll in slowly or inconsistently.This issue is the release-gate umbrella for the v0.8.66 freeze work. It should stay open until CodeWhale can launch and observe a large sub-agent fanout without freezing the UI or losing critical completion state.
Verified evidence from current tree
agentinheritssupports_parallel() == falsefrom the default tool spec (crates/tui/src/tools/spec.rs), so each agent call is planned as a serial tool batch.crates/tui/src/core/engine/tool_execution.rs.turn_loop.rsalready documents the user-visible freeze symptom for serial fanout: sixagentcalls resolving routes under the global tool lock can read as a hard TUI freeze.crates/tui/src/core/engine.rs). UI code awaitsOp::ListSubAgentsafter event-drain batches.try_send, so under event-channel pressure they can be dropped from the UI stream even if parent completion signaling uses a separate path.ListSubAgentsand sub-agent completions contend on theSubAgentManagerwrite lock; manager update paths can persist JSON state synchronously while holding that lock.std::sync::Mutex::lock()on the shell manager in sidebar/live-output refresh paths.Critical framing
Do not treat this as the old v0.8.61 freeze unless fresh measurements prove it is the same failure mode. The earlier speculative “blocking I/O starves the worker pool” theory was disproven. This lane should fix the currently observed fanout/backpressure paths with targeted evidence, not a blanket
spawn_blockingpatch.Required outcome
A parent turn that requests 20 sub-agents should remain interruptible and visibly alive:
Acceptance criteria
ListSubAgentscoalescing, and manager persistence/locking behavior.Child work
Create/track child issues for:
agentlaunch dispatch,Tracked child issues
Security / policy guardrails
Responsiveness fixes must not change the authority model. The implementation must preserve these constraints:
agentlaunch must not bypass approval policy, sandbox policy, tool gates, sub-agent depth limits, sub-agent concurrency limits, token/budget limits, or cancellation.