v0.8.66: Release gate for multi sub-agent fanout freeze #3800

## Problem

A high-fanout parent turn can make the TUI appear frozen while launching or completing many sub-agents. The current repro is a user-triggered fanout of roughly 20 `agent` calls: sidebar state changes, input responsiveness stalls, and completions appear to roll in slowly or inconsistently.

This issue is the release-gate umbrella for the v0.8.66 freeze work. It should stay open until CodeWhale can launch and observe a large sub-agent fanout without freezing the UI or losing critical completion state.

## Verified evidence from current tree

- `agent` inherits `supports_parallel() == false` from the default tool spec (`crates/tui/src/tools/spec.rs`), so each agent call is planned as a serial tool batch.
- Non-parallel tools take the write side of the global tool execution lock in `crates/tui/src/core/engine/tool_execution.rs`.
- `turn_loop.rs` already documents the user-visible freeze symptom for serial fanout: six `agent` calls resolving routes under the global tool lock can read as a hard TUI freeze.
- The engine op channel is bounded at 32 and the event channel at 256 (`crates/tui/src/core/engine.rs`). UI code awaits `Op::ListSubAgents` after event-drain batches.
- Sub-agent completion UI events use `try_send`, so under event-channel pressure they can be dropped from the UI stream even if parent completion signaling uses a separate path.
- `ListSubAgents` and sub-agent completions contend on the `SubAgentManager` write lock; manager update paths can persist JSON state synchronously while holding that lock.
- The async TUI loop still calls blocking `std::sync::Mutex::lock()` on the shell manager in sidebar/live-output refresh paths.
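
The `try_send` drop behaviour noted above can be reproduced in isolation. The following is a minimal sketch, not the engine's code: it uses `std::sync::mpsc::sync_channel` as a stand-in for whatever bounded channel the engine actually uses, and `CompletionEvent` and `flood_without_draining` are hypothetical names introduced only for illustration.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical stand-in for a sub-agent completion UI event.
#[derive(Debug, Clone, PartialEq)]
pub struct CompletionEvent {
    pub agent_id: usize,
}

/// Pushes `count` events into a bounded channel of `capacity` with
/// `try_send` while nothing drains it. Returns `(delivered, dropped)`.
pub fn flood_without_draining(capacity: usize, count: usize) -> (usize, usize) {
    let (tx, rx) = sync_channel::<CompletionEvent>(capacity);
    let mut dropped = 0;
    for agent_id in 0..count {
        match tx.try_send(CompletionEvent { agent_id }) {
            Ok(()) => {}
            // Channel full: the event is lost unless the caller retries
            // or the state is recoverable through another path.
            Err(TrySendError::Full(_)) => dropped += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    // Close the sending side so the drain below terminates.
    drop(tx);
    let delivered = rx.iter().count();
    (delivered, dropped)
}
```

With a stalled consumer, any burst larger than the channel capacity loses the excess events, which is why UI-only delivery through `try_send` cannot be the sole record of terminal sub-agent state.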

## Critical framing

Do not treat this as the old v0.8.61 freeze unless fresh measurements prove it is the same failure mode. The earlier speculative “blocking I/O starves the worker pool” theory was disproven. This lane should fix the currently observed fanout/backpressure paths with targeted evidence, not a blanket `spawn_blocking` patch.

## Required outcome

A parent turn that requests 20 sub-agents should remain interruptible and visibly alive:

- TUI input polling continues while launches are in progress.
- The sidebar/status view updates without blocking the event loop.
- Launching N agents does not cost roughly N × the per-agent route-resolution/spawn latency when the launches are independent and within configured sub-agent limits.
- Completion events that matter to parent-turn correctness are not lost; UI refresh events can be coalesced but must converge to correct state.
- Cancelling the parent turn interrupts any remaining launch queue quickly and leaves well-formed tool results.
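
The launch and cancellation requirements above can be sketched as a bounded worker pool that drains a shared launch queue. This is an illustrative sketch under stated assumptions: it uses OS threads via `std::thread::scope`, whereas the real engine would presumably use async tasks, and `launch_fanout`, `LaunchResult`, and the `launch` callback are hypothetical names. `max_concurrent` stands in for the configured sub-agent concurrency limit, and any approval or policy check is assumed to stay inside `launch`.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

/// Hypothetical per-call outcome; every queued call gets exactly one.
#[derive(Debug, Clone, PartialEq)]
pub enum LaunchResult {
    Launched(usize),
    Cancelled(usize),
}

/// Launches `count` independent calls on at most `max_concurrent` workers.
/// Once `cancel` is set, the remaining queue is answered with `Cancelled`
/// instead of being launched, so each call still has a well-formed result.
pub fn launch_fanout<F>(
    count: usize,
    max_concurrent: usize,
    cancel: &AtomicBool,
    launch: F,
) -> Vec<LaunchResult>
where
    F: Fn(usize) + Sync,
{
    let next = AtomicUsize::new(0);
    let workers = max_concurrent.max(1).min(count.max(1));
    let mut results: Vec<LaunchResult> = thread::scope(|s| {
        let mut handles = Vec::with_capacity(workers);
        for _ in 0..workers {
            handles.push(s.spawn(|| {
                let mut local = Vec::new();
                loop {
                    // Claim the next queued call, if any remain.
                    let id = next.fetch_add(1, Ordering::Relaxed);
                    if id >= count {
                        break;
                    }
                    if cancel.load(Ordering::Acquire) {
                        local.push(LaunchResult::Cancelled(id));
                    } else {
                        launch(id);
                        local.push(LaunchResult::Launched(id));
                    }
                }
                local
            }));
        }
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("launch worker panicked"))
            .collect()
    });
    // Report results in call order regardless of which worker ran them.
    results.sort_by_key(|r| match r {
        LaunchResult::Launched(id) | LaunchResult::Cancelled(id) => *id,
    });
    results
}
```

Under this shape, wall-clock launch time scales with roughly `count / max_concurrent` launches rather than `count`, and parallelism changes only when a child starts, not what it is allowed to do.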

## Acceptance criteria

- [ ] Add an automated or documented release-gate repro that launches a high-fanout sub-agent batch (target: 20) and records launch latency, input responsiveness, event backlog, and completion convergence.
- [ ] The repro passes with no multi-second TUI input freeze on a normal local checkout.
- [ ] The fix set includes targeted tests for tool dispatch batching, engine/op channel backpressure, `ListSubAgents` coalescing, and manager persistence/locking behavior.
- [ ] Runtime logs include enough bounded diagnostics to distinguish launch serialization, event-channel pressure, op-channel pressure, and manager-lock contention.
- [ ] The implementation does not remove sub-agent depth/concurrency configurability and does not reintroduce old lifecycle/delegate tool surfaces.
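
One way to keep the diagnostics bounded is a fixed set of counters that is summarized once per turn instead of logging every event. The sketch below is an assumption about shape only: `FanoutDiagnostics`, its field names, and the summary format are hypothetical and do not describe existing CodeWhale logging.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical fixed-size diagnostics for one fanout turn. Memory use
/// does not grow with the number of sub-agents or events.
#[derive(Debug, Default)]
pub struct FanoutDiagnostics {
    serialized_launches: AtomicU64,
    event_channel_full: AtomicU64,
    op_channel_full: AtomicU64,
    manager_lock_wait_us: AtomicU64,
}

impl FanoutDiagnostics {
    /// A launch had to wait behind another launch.
    pub fn note_serialized_launch(&self) {
        self.serialized_launches.fetch_add(1, Ordering::Relaxed);
    }

    /// A send on the event channel found it full.
    pub fn note_event_channel_full(&self) {
        self.event_channel_full.fetch_add(1, Ordering::Relaxed);
    }

    /// A send on the op channel found it full.
    pub fn note_op_channel_full(&self) {
        self.op_channel_full.fetch_add(1, Ordering::Relaxed);
    }

    /// Time spent waiting to acquire the manager lock.
    pub fn note_manager_lock_wait(&self, waited: Duration) {
        // Saturate rather than truncate if the u128 value exceeds u64.
        let micros = waited.as_micros().min(u64::MAX as u128) as u64;
        self.manager_lock_wait_us.fetch_add(micros, Ordering::Relaxed);
    }

    /// One log line per turn, with one field per suspected cause.
    pub fn summary(&self) -> String {
        format!(
            "fanout diag: serialized_launches={} event_channel_full={} op_channel_full={} manager_lock_wait_us={}",
            self.serialized_launches.load(Ordering::Relaxed),
            self.event_channel_full.load(Ordering::Relaxed),
            self.op_channel_full.load(Ordering::Relaxed),
            self.manager_lock_wait_us.load(Ordering::Relaxed),
        )
    }
}
```

A single summary line with one field per suspected cause makes it possible to tell the four failure modes apart from a log excerpt without replaying the run.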

## Child work

Create/track child issues for:

- parallel-safe `agent` launch dispatch,
- engine/TUI channel backpressure,
- nonblocking/coalesced sub-agent sidebar refresh,
- shell-manager lock removal from async UI hot paths,
- SubAgentManager lock/persistence hot-path cleanup.


## Tracked child issues

- [ ] #3801 — Allow independent agent launches to fan out without global tool-lock serialization
- [ ] #3802 — Remove engine/TUI channel backpressure from sub-agent status storms
- [ ] #3803 — Make sub-agent sidebar refresh read-only and coalesced
- [ ] #3804 — Remove blocking shell-manager locks from async TUI refresh paths
- [ ] #3805 — Move sub-agent state persistence out of manager write-lock hot paths
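
For the coalesced refresh tracked in #3803, a dirty flag plus a snapshot is one possible shape. This is a sketch under assumptions, not the planned design: `SidebarState`, `AgentStatus`, and both method names are hypothetical, and the short `Mutex` critical sections here do no I/O, unlike the persistence path described in the evidence section.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical sub-agent status as shown in the sidebar.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentStatus {
    Running,
    Completed,
}

/// Latest status per agent plus a flag saying whether the UI is stale.
#[derive(Default)]
pub struct SidebarState {
    statuses: Mutex<BTreeMap<usize, AgentStatus>>,
    dirty: AtomicBool,
}

impl SidebarState {
    /// Called for every status change; any number of calls between two
    /// refreshes collapse into a single pending refresh.
    pub fn record(&self, agent_id: usize, status: AgentStatus) {
        self.statuses.lock().unwrap().insert(agent_id, status);
        self.dirty.store(true, Ordering::Release);
    }

    /// Called from the UI tick. Returns a full snapshot only if something
    /// changed since the previous call. A change that races with this call
    /// either lands in this snapshot or leaves the flag set for the next
    /// tick, so the view converges.
    pub fn take_snapshot_if_dirty(&self) -> Option<BTreeMap<usize, AgentStatus>> {
        if self.dirty.swap(false, Ordering::AcqRel) {
            Some(self.statuses.lock().unwrap().clone())
        } else {
            None
        }
    }
}
```

Because the UI reads whole snapshots instead of replaying individual events, dropping or merging intermediate refresh notifications cannot leave the sidebar permanently wrong.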

## Security / policy guardrails

Responsiveness fixes must not change the authority model. The implementation must preserve these constraints:

- Parallel `agent` launch must not bypass approval policy, sandbox policy, tool gates, sub-agent depth limits, sub-agent concurrency limits, token/budget limits, or cancellation.
- Parallelism should only reduce waiting to start independent children; it must not grant children more authority than a serial launch would.
- Critical events must remain durable or recoverable: approval prompts, tool results, fatal errors, user-input requests, cancellation, parent completion, and terminal sub-agent state. Only refresh/status events may be coalesced or dropped, and only if the next snapshot converges to correct state.
- Cleanup must not disappear when sidebar listing becomes read-only: stale-child timeout/cancel enforcement must remain periodic or event-driven.
- UI snapshots may become stale under contention, but enforcement must never depend on a successful UI refresh.
- Persistence changes must keep atomic writes, symlink/path hardening, and recoverable terminal state.
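
The split between critical and coalescable events can be made explicit at the send site. The sketch below is illustrative only: `UiEvent`, `Delivery`, and `deliver` are hypothetical names, the variants cover only part of the list above, and `std::sync::mpsc` stands in for the engine's channel. In an async runtime the blocking `send` would be an awaited send instead.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical event type; only `StatusRefresh` may be dropped.
#[derive(Debug, Clone, PartialEq)]
pub enum UiEvent {
    ApprovalPrompt(String),
    ToolResult(String),
    FatalError(String),
    SubAgentTerminal { agent_id: usize },
    StatusRefresh,
}

impl UiEvent {
    /// Critical events must never be silently dropped.
    pub fn is_critical(&self) -> bool {
        !matches!(self, UiEvent::StatusRefresh)
    }
}

/// What happened to an event handed to `deliver`.
#[derive(Debug, PartialEq)]
pub enum Delivery {
    Sent,
    Coalesced,
    ReceiverGone,
}

/// Critical events wait for channel capacity; refresh events are dropped
/// when the channel is full, on the assumption that the next snapshot
/// converges to the correct state.
pub fn deliver(tx: &SyncSender<UiEvent>, event: UiEvent) -> Delivery {
    if event.is_critical() {
        match tx.send(event) {
            Ok(()) => Delivery::Sent,
            Err(_) => Delivery::ReceiverGone,
        }
    } else {
        match tx.try_send(event) {
            Ok(()) => Delivery::Sent,
            Err(TrySendError::Full(_)) => Delivery::Coalesced,
            Err(TrySendError::Disconnected(_)) => Delivery::ReceiverGone,
        }
    }
}
```

Keeping the classification in one function means a new event variant has to be placed on one side of the line deliberately, rather than inheriting drop-on-full behaviour by accident.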

