v0.8.66: Move sub-agent state persistence out of manager write-lock hot paths #3805
Closed
Labels
bug, release-blocker, reliability, subagents, tui, v0.8.66
Projects
Status: Done
Problem
Sub-agent manager update paths can perform synchronous JSON serialization and file writes while the manager write lock is held. Under high fanout, launch/completion/list operations contend on that write lock, and persistence can amplify stalls.
Parent: #3800
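To make the problem concrete, here is a minimal sketch of the pattern the title points at: mutate in-memory state under the write lock, take a snapshot, release the guard, and only then persist. `ManagerSketch`, `spawn_then_persist`, and the `persist` callback are hypothetical names introduced for illustration, and `std::sync::RwLock` stands in for the async lock so the example has no dependencies. It is not the project's actual code.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the real manager; only the shape matters here.
#[derive(Default)]
struct ManagerSketch {
    agents: BTreeMap<String, String>, // agent id -> status
}

impl ManagerSketch {
    /// Mutates in-memory state only and returns a copy for later persistence.
    fn spawn(&mut self, id: &str) -> BTreeMap<String, String> {
        self.agents.insert(id.to_string(), "running".to_string());
        self.agents.clone()
    }
}

/// Holds the write lock only for the in-memory update, then hands the
/// snapshot to `persist` after the guard has been dropped.
fn spawn_then_persist<F>(manager: &Arc<RwLock<ManagerSketch>>, id: &str, persist: F)
where
    F: FnOnce(&BTreeMap<String, String>),
{
    let snapshot = {
        let mut guard = manager.write().expect("manager lock poisoned");
        guard.spawn(id)
    }; // write guard dropped here, before any serialization or file I/O
    persist(&snapshot);
}
```

One trade-off to keep in mind: once persistence runs outside the lock, two snapshots can reach disk out of order, so a real refactor would likely need some ordering or versioning scheme. That part is left out of the sketch.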
Verified evidence
- `SubAgentManager::spawn` inserts an agent and calls `persist_state_best_effort()` before returning the snapshot.
- `update_from_result` / `update_failed` update terminal state and call `persist_state_best_effort()` on change.
- `persist_state_best_effort()` calls `persist_state()`, which calls `write_json_atomic`.
- `write_json_atomic` performs `serde_json::to_string_pretty`, `fs::create_dir_all`, `fs::write`, and `fs::rename` synchronously.
- `Arc<RwLock<SubAgentManager>>::write().await`.
Critical framing
Earlier broad claims that blocking I/O starved the worker pool were disproven for the old freeze. This issue should target only the manager-lock critical section: make the lock-held work small and measurable. Do not add a blanket speculative `spawn_blocking` patch without proving that lock contention improves.
Suggested implementation options
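Because the framing asks for lock-held work that is measurable, one way to get a before/after baseline is to time how long callers wait for the write guard and how long they hold it. The sketch below is an illustration, not a prescribed fix: `LockTiming` and `with_write_timed` are hypothetical names, and `std::sync::RwLock` again stands in for the async lock.

```rust
use std::sync::RwLock;
use std::time::{Duration, Instant};

/// Hypothetical timing record for one write-lock acquisition.
#[derive(Debug, Clone, Copy)]
struct LockTiming {
    /// Time spent waiting to acquire the write guard (contention).
    waited: Duration,
    /// Time the write guard was held (critical-section length).
    held: Duration,
}

/// Runs `f` under the write lock and reports wait and hold durations.
fn with_write_timed<T, R>(lock: &RwLock<T>, f: impl FnOnce(&mut T) -> R) -> (R, LockTiming) {
    let wait_start = Instant::now();
    let mut guard = lock.write().expect("lock poisoned");
    let waited = wait_start.elapsed();

    let hold_start = Instant::now();
    let out = f(&mut *guard);
    drop(guard); // release before reading the clock so `held` covers the full hold
    let held = hold_start.elapsed();

    (out, LockTiming { waited, held })
}
```

Comparing `waited` and `held` under a high-fanout workload, before and after a change, would show whether moving persistence out of the critical section actually reduces contention.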
Acceptance criteria
Security / policy guardrails
Persistence refactors must preserve state integrity: