TUI freezes under many concurrent sub-agents (event-receiver RwLock contention starves render loop) #3728

Description

@Hmbown

Summary

Running many sub-agents concurrently (reported with ~13 sub-agents running + 2 background bash jobs) freezes the entire TUI — it stops redrawing and stops responding to input. Observed on v0.8.65 (YOLO mode), but the deeper root cause still exists on main.

Repro

  • Kick off enough work that ~10+ sub-agents run at once (panel shows Agents 13 running / 13, Bash jobs: 2 running).
  • Screen locks up; input and redraw stall.

Root-cause analysis (file:line)

The TUI event loop and the per-turn monitors contend on a single shared, write-locked event receiver, and high-volume agent progress events make it worse.

  1. Lock contention on the shared event receiver (primary).
    rx_event is an Arc<RwLock<mpsc::Receiver<Event>>>. Multiple tasks take the exclusive write lock:

    • TUI event/render loop drains under the write lock — crates/tui/src/tui/ui.rs:1808
    • One monitor_turn task per sub-agent (so ~13 at once) also takes the write lock and blocks on recv().await at crates/tui/src/core/engine/runtime_threads.rs:2850
    • Exec stream handler contends for the same lock — crates/tui/src/core/main.rs:7067 (verify path)

    With ~13 monitors holding or queued for the write lock, the render loop has to wait its turn behind them in tokio's fair (FIFO) RwLock queue, and each monitor keeps the lock for as long as it is parked in recv().await → no redraw, no input drain → freeze.
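
A minimal sketch of the lock-out described in item 1, not the project's code. It assumes std threads and std::sync::mpsc in place of tokio, and a Mutex in place of the RwLock: every user takes the exclusive lock, so the RwLock behaves as a mutex here, and std's Receiver is not Sync, so it cannot sit behind a std RwLock across threads. Event and render_loop_is_locked_out are hypothetical names for illustration only.

```rust
use std::sync::mpsc::{self, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the TUI's event type.
#[derive(Debug)]
pub struct Event(pub u32);

/// Returns true if the "render loop" could not take the lock while a
/// "monitor" was parked in recv() with the lock held.
pub fn render_loop_is_locked_out() -> bool {
    let (tx_event, rx) = mpsc::channel::<Event>();
    // Shared receiver behind an exclusive lock (models the write-locked rx_event).
    let rx_event: Arc<Mutex<Receiver<Event>>> = Arc::new(Mutex::new(rx));
    // Side signal so the main thread knows the monitor already holds the lock.
    let (locked_tx, locked_rx) = mpsc::channel::<()>();

    let monitor_rx = Arc::clone(&rx_event);
    let monitor = thread::spawn(move || {
        let guard = monitor_rx.lock().unwrap();
        locked_tx.send(()).unwrap();
        // Blocks waiting for an event while still holding the lock.
        let _ = guard.recv();
    });

    // Wait until the monitor owns the lock, then try what the render loop does.
    locked_rx.recv().unwrap();
    let locked_out = rx_event.try_lock().is_err();

    // Unblock the monitor so the thread can finish cleanly.
    tx_event.send(Event(1)).unwrap();
    monitor.join().unwrap();
    locked_out
}

fn main() {
    println!("render loop locked out: {}", render_loop_is_locked_out());
}
```

With ~13 such monitors the render loop is not just refused once: it has to outwait every holder in turn, which matches the freeze reported above.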

  2. Bounded event channel (256) under burst.
    mpsc::channel(256) at crates/tui/src/core/engine.rs:824. Many agents emitting rapid AgentProgress events (try_send, crates/tui/src/tools/subagent/mod.rs ~6158) can fill the buffer; events drop or senders stall, compounding the freeze.
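
To make the burst behavior in item 2 concrete, here is a small sketch under the assumption that std::sync::mpsc::sync_channel is a fair stand-in for the bounded tokio channel (both report a full buffer from try_send instead of waiting). burst_into_bounded_channel is a hypothetical helper, and the u32 payload stands in for AgentProgress.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Pushes `burst` events into a bounded channel of `capacity` while nothing
/// drains it, and returns (accepted, dropped).
pub fn burst_into_bounded_channel(capacity: usize, burst: usize) -> (usize, usize) {
    // `_rx` keeps the receiver alive but never reads, like a stalled render loop.
    let (tx, _rx) = sync_channel::<u32>(capacity);
    let mut accepted = 0;
    let mut dropped = 0;
    for i in 0..burst {
        match tx.try_send(i as u32) {
            Ok(()) => accepted += 1,
            // Buffer is full: the event is lost unless the sender retries.
            Err(TrySendError::Full(_)) => dropped += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    (accepted, dropped)
}

fn main() {
    let (accepted, dropped) = burst_into_bounded_channel(256, 1000);
    println!("accepted={accepted} dropped={dropped}");
}
```

Once the consumer stops draining, everything past the first 256 events is rejected, so a stalled render loop and a noisy set of agents reinforce each other.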

  3. Redraw storm (partially mitigated on main).
    AgentProgress redraws are throttled (~100ms) at crates/tui/src/tui/ui.rs:2729-2747 (from v0.8.58, see #3033: "Fix live-subagent freeze under load: 4+ concurrent subagents block TUI input/rendering"). This helps but does not remove the lock-contention root cause; v0.8.65 users still freeze under enough agents, with or without the throttle.
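
For reference, a throttle of the kind item 3 describes can be sketched as below. This is an illustration only, not the code at ui.rs:2729-2747; RedrawThrottle and should_redraw are hypothetical names, and the caller passes `now` in so the logic can be checked without sleeping.

```rust
use std::time::{Duration, Instant};

/// Allows at most one redraw per `min_interval`.
pub struct RedrawThrottle {
    min_interval: Duration,
    last_redraw: Option<Instant>,
}

impl RedrawThrottle {
    pub fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_redraw: None }
    }

    /// Returns true if a redraw is allowed at `now`, and records it if so.
    pub fn should_redraw(&mut self, now: Instant) -> bool {
        match self.last_redraw {
            // Too soon since the last redraw: skip this one.
            Some(last) if now.duration_since(last) < self.min_interval => false,
            _ => {
                self.last_redraw = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut throttle = RedrawThrottle::new(Duration::from_millis(100));
    let t0 = Instant::now();
    for offset_ms in [0u64, 30, 60, 100, 150, 200] {
        let allowed = throttle.should_redraw(t0 + Duration::from_millis(offset_ms));
        println!("t+{offset_ms}ms redraw={allowed}");
    }
}
```

A throttle like this only limits how often the screen is repainted. It does nothing for a render loop that cannot acquire the receiver lock in the first place.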

Suggested fixes

  • Remove the RwLock around rx_event — there should be a single owner draining it; give monitors their own event stream/side-channel instead of contending with the render loop.
  • Or move monitor_turn off the render-critical lock entirely.
  • Increase event channel capacity and/or coalesce AgentProgress events before send.
  • Add/raise a concurrent sub-agent cap (semaphore) so a burst can't spawn an unbounded number of contending monitors.
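
One possible shape for the first suggestion, sketched with std channels instead of tokio: a single owner drains rx_event without any lock and forwards turn-completion events to per-agent side channels. Dispatcher, subscribe, drain, and the Event variants are all hypothetical names, not the project's types.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};

/// Hypothetical event type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    AgentProgress { agent_id: u32, pct: u8 },
    TurnFinished { agent_id: u32 },
}

/// Sole owner of the event receiver, so no lock is needed around it.
pub struct Dispatcher {
    rx_event: Receiver<Event>,
    monitors: HashMap<u32, Sender<Event>>,
}

impl Dispatcher {
    pub fn new(rx_event: Receiver<Event>) -> Self {
        Self { rx_event, monitors: HashMap::new() }
    }

    /// Registers a monitor for `agent_id` and returns its private receiver.
    pub fn subscribe(&mut self, agent_id: u32) -> Receiver<Event> {
        let (tx, rx) = mpsc::channel();
        self.monitors.insert(agent_id, tx);
        rx
    }

    /// Drains everything queued right now without blocking. Turn-completion
    /// events are also forwarded to the matching monitor; all events are
    /// returned for the UI to render.
    pub fn drain(&mut self) -> Vec<Event> {
        let mut for_ui = Vec::new();
        while let Ok(event) = self.rx_event.try_recv() {
            if let Event::TurnFinished { agent_id } = &event {
                if let Some(tx) = self.monitors.remove(agent_id) {
                    // The monitor may already be gone; ignore a closed channel.
                    let _ = tx.send(event.clone());
                }
            }
            for_ui.push(event);
        }
        for_ui
    }
}

fn main() {
    let (tx_event, rx_event) = mpsc::channel();
    let mut dispatcher = Dispatcher::new(rx_event);
    let monitor_rx = dispatcher.subscribe(7);

    tx_event.send(Event::AgentProgress { agent_id: 7, pct: 50 }).unwrap();
    tx_event.send(Event::TurnFinished { agent_id: 7 }).unwrap();

    let ui_events = dispatcher.drain();
    println!("ui saw {} events", ui_events.len());
    println!("monitor saw {:?}", monitor_rx.try_recv());
}
```

In this arrangement a monitor waits only on its own receiver, so the number of running sub-agents no longer affects how quickly the render loop can reach rx_event.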

Environment

  • codewhale version: 0.8.65 (observed); root cause present on main (0.8.66)
  • Surface: TUI, concurrent sub-agents (YOLO mode)

Metadata

  • Assignees: No one assigned
  • Labels: bug (Something isn't working), tui (Terminal UI behavior, rendering, or interaction)
  • Projects: Status: Backlog
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests