Summary
Phase2 retest after #321 still has a red point in autonomy_v2_p2_fix_slice_v1_20260410_000938 B1, but the blocker is now more specific than "already-satisfied item retire is missing".
The last closure-hint response contains a correct ANVIL_PLAN_UPDATE that checks the final remaining file-backed item (src/app/api/worktrees/[id]/current-output/route.ts), but the agentic loop hits the stagnation escape hatch and breaks before parsing/applying that response.
As a result, the checked retire for the last item is dropped, remaining=1 persists, and the run ends as partial + runner_nonzero_exit even though the model emitted the intended no-change closure.
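Reproduction / observed behavior
- Commit: 4c9acf7 (PR #322 "fix(issue-321): add reactive fix_slice escalation from repeated edit failures" / Issue #321 "bug: repeated single-file edit failure is not promoted to agent.fix_slice, and the worker-on short smoke exits non-zero without reaching the worker")
- Run: commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_000938/B1
- Models: qwen3.5:122b + sidecar qwen3.5:9b
- Context: 65536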
Result artifact
B1_result.json shows the external failure shape:
- valid_run=true
- exit_class=runner_nonzero_exit
- command_return_code=2
- changed_files=0
- anvil_error_count=1
- session_completed_count=1
Source: commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_000938/B1_result.json:17-35
What the log shows
- The first ANVIL_PLAN_UPDATE checks five items but does not include current-output/route.ts. (B1.log:458-464)
- The runtime retires those five items and immediately reports remaining=1 next=src/app/api/worktrees/[id]/current-output/route.ts ... no change needed. (B1.log:498-504)
  - "plan item retired via checked marker" (five times)
  - "plan update pipeline: all items checked → retired"
  - "plan-aware final gate: suppressing ANVIL_FINAL (premature) remaining=1 ... current-output/route.ts ... no change needed"
- After the closure hint, the model emits a second ANVIL_PLAN_UPDATE that explicitly checks the last remaining item (B1.log:516-522):
  - [x] src/app/api/worktrees/[id]/current-output/route.ts: uses detectSessionStatus() which uses buildDetectPromptOptions() - no change needed
- The same response also emits ANVIL_FINAL, stating that the audit found no remaining gaps. (B1.log:525-559)
- Immediately after that response, however, the loop logs the following (B1.log:561-568):
  - stagnation detected; forced_mode_active=true
  - pre-exit repair turn injected before escape hatch
  - stagnation escape hatch: terminating loop
  - completion_kind=partial plan_items=6 plan_finished=5
  - telemetry plan_update_count=1
Crucially:
- there is no retire log for current-output/route.ts
- telemetry still says plan_update_count=1, even though the log visibly contains two ANVIL_PLAN_UPDATE blocks
This strongly indicates that the second, closure-hint-driven update was never applied.
Source-backed root cause
1. Escape hatch runs before late-response plan parsing
In the main loop, stagnation accounting and escape-hatch termination happen before the code that parses next_token_buffer for ANVIL_PLAN / ANVIL_PLAN_UPDATE.
Ordering in src/app/agentic.rs:
- end-turn stagnation / should_allow_escape_hatch()
- optional pre-exit repair injection
- break
- only after that, try_register_plan(&next_token_buffer) / try_update_plan(&next_token_buffer)
Code:
src/app/agentic.rs:987-1045
src/app/agentic.rs:1068-1126
So when the latest zero-tool-call response contains a corrective ANVIL_PLAN_UPDATE and escape hatch fires on that same turn, the loop terminates before the update pipeline ever sees the response.
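The following Rust model makes the ordering concrete. It is a minimal sketch and not the code in src/app/agentic.rs. LoopState, apply_plan_update and end_turn_current are hypothetical names. The "- [x] <path>: <note>" line format is an assumption taken from the B1.log excerpt above. The only thing the sketch mirrors is the order of steps: the escape-hatch check comes first and plan parsing comes second.

```rust
// Hypothetical model of the current end-of-turn ordering in the agentic loop.
// None of these names exist in src/app/agentic.rs; only the order of steps mirrors the issue.

#[derive(Debug, PartialEq)]
enum Completion {
    Complete,
    Partial,
}

struct LoopState {
    remaining: Vec<String>, // unfinished plan items, keyed by file path
    plan_update_count: u32, // mirrors the telemetry counter
}

// Stand-in for try_update_plan(): count the update and retire every item checked with "[x]".
fn apply_plan_update(state: &mut LoopState, response: &str) {
    if !response.contains("ANVIL_PLAN_UPDATE") {
        return;
    }
    state.plan_update_count += 1;
    for line in response.lines() {
        // Assumed format: "- [x] <path>: <note>"
        if let Some(rest) = line.trim().strip_prefix("- [x] ") {
            let path = rest.split(": ").next().unwrap_or(rest).trim();
            state.remaining.retain(|item| item != path);
        }
    }
}

// Current ordering: stagnation / escape-hatch check first, plan parsing second.
fn end_turn_current(state: &mut LoopState, response: &str, escape_hatch: bool) -> Option<Completion> {
    if escape_hatch {
        // A pre-exit repair message would be injected here, but the loop breaks
        // immediately, so neither the repair nor `response` is ever processed.
        return Some(if state.remaining.is_empty() {
            Completion::Complete
        } else {
            Completion::Partial
        });
    }
    apply_plan_update(state, response); // never reached on the escape-hatch turn
    None
}

fn main() {
    let last = "src/app/api/worktrees/[id]/current-output/route.ts";
    // One item left after the first update (already counted once), as in B1.
    let mut state = LoopState { remaining: vec![last.to_string()], plan_update_count: 1 };
    let response = format!("ANVIL_PLAN_UPDATE\n- [x] {last}: no change needed\nANVIL_FINAL");

    let outcome = end_turn_current(&mut state, &response, true);
    println!(
        "{:?} remaining={} plan_update_count={}",
        outcome,
        state.remaining.len(),
        state.plan_update_count
    );
}
```

Running it prints Some(Partial) remaining=1 plan_update_count=1. That matches the shape of completion_kind=partial and plan_update_count=1 in B1.log.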
2. Runtime explicitly instructs the model to do the thing that gets dropped
The structured pre-exit repair message tells the model:
- mutate the remaining file, or
- if no change is needed, emit ANVIL_PLAN_UPDATE with [x], then
- emit ANVIL_FINAL
Code:
src/app/mod.rs:846-879
So the runtime's own recovery protocol expects checked retirement on the last turn, but the loop ordering can discard exactly that response.
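The no-change branch of that protocol can be written as a small predicate. This is a minimal sketch that assumes the marker layout seen in B1.log: a "- [x] <path>: <note>" line inside an ANVIL_PLAN_UPDATE block, followed by ANVIL_FINAL. The helper is_no_change_closure is hypothetical. It is not the parser in src/app/mod.rs or src/app/execution_plan.rs. A regression test could use a predicate like this to state that "the model did what the repair message asked".

```rust
// Hypothetical predicate for the no-change branch of the pre-exit repair protocol.
// Marker layout is assumed from B1.log, not taken from the runtime's parser.
fn is_no_change_closure(response: &str, remaining_path: &str) -> bool {
    let (Some(update_at), Some(final_at)) =
        (response.find("ANVIL_PLAN_UPDATE"), response.find("ANVIL_FINAL"))
    else {
        return false; // one of the two required markers is missing
    };
    if final_at < update_at {
        return false; // protocol order: checked update first, then ANVIL_FINAL
    }
    // Only inspect the update block, i.e. the text between the two markers.
    response[update_at..final_at].lines().any(|line| {
        line.trim()
            .strip_prefix("- [x] ")
            .and_then(|rest| rest.strip_prefix(remaining_path))
            // Exact path match: the path must be followed by ':' or end the line.
            .map_or(false, |tail| tail.is_empty() || tail.starts_with(':'))
    })
}

fn main() {
    let last = "src/app/api/worktrees/[id]/current-output/route.ts";
    let response = format!(
        "ANVIL_PLAN_UPDATE\n- [x] {last}: uses detectSessionStatus() - no change needed\nANVIL_FINAL\nno remaining gaps"
    );
    println!("closure accepted: {}", is_no_change_closure(&response, last));
}
```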
3. Checked retire machinery itself is not the primary failure here
The checked-first pipeline does exist and works when it actually runs:
src/app/execution_plan.rs:121-168
tests/plan_item_retire.rs:16-48
This B1 failure is therefore distinct from the original #301 retirement gap. The issue is not merely “cannot retire already-satisfied item”; it is “the final checked update never reaches the retirement pipeline because the loop breaks too early.”
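For comparison, the checked-first retirement rule that this issue depends on can be sketched as follows. An item checked with [x] is retired as Done if its file changed during the run, and as AlreadySatisfied otherwise. ItemStatus, PlanItem and retire_checked are hypothetical names. Only the AlreadySatisfied outcome name comes from this issue; the real logic is in src/app/execution_plan.rs:121-168. The point of the sketch is that this step is straightforward once it receives the update. In B1, the update never arrives.

```rust
use std::collections::HashSet;

// Hypothetical plan-item states; only the AlreadySatisfied name is taken from the issue.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ItemStatus {
    Pending,
    Done,             // checked, and the file was changed during the run
    AlreadySatisfied, // checked with no change: the "no change needed" closure
}

struct PlanItem {
    path: String,
    status: ItemStatus,
}

// Retire every pending item that the update checked; return how many items are still pending.
fn retire_checked(items: &mut [PlanItem], checked: &[&str], changed_files: &HashSet<String>) -> usize {
    for item in items.iter_mut() {
        if item.status != ItemStatus::Pending || !checked.contains(&item.path.as_str()) {
            continue;
        }
        item.status = if changed_files.contains(&item.path) {
            ItemStatus::Done
        } else {
            ItemStatus::AlreadySatisfied
        };
    }
    items.iter().filter(|i| i.status == ItemStatus::Pending).count()
}

fn main() {
    let last = "src/app/api/worktrees/[id]/current-output/route.ts";
    let mut items = vec![PlanItem { path: last.to_string(), status: ItemStatus::Pending }];
    let changed = HashSet::new(); // B1 reports changed_files=0
    let left = retire_checked(&mut items, &[last], &changed);
    println!("{:?} remaining={}", items[0].status, left);
}
```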
Why this is distinct from existing issues
vs #301 / #315
Those were about retirement semantics once the update path is actually processed.
Here, the last ANVIL_PLAN_UPDATE is present in the model output but is dropped before try_update_plan() runs.
vs #321
#321 focused on reactive reachability from repeated single-file edit failure into agent.fix_slice.
This B1 failure is a read-only / no-change closure path after the closure hint, with fixslice_escalation_count=0 and no worker-path evidence. It is a separate late-stage loop-ordering bug.
vs #287
#287 was a broader remaining=1 hang / timeout family.
This issue is narrower and source-backed: the agentic loop has a concrete control-flow bug in which escape-hatch termination precedes late-response plan-update parsing.
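Impact
- A correct no-change closure of the last plan item is discarded, so the run still ends as partial + runner_nonzero_exit.
- The runtime's own recovery protocol asks the model for ANVIL_PLAN_UPDATE [x], then drops the answer.
Fix direction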
- Process next_token_buffer before any escape-hatch break
  - Apply try_register_plan() / try_update_plan() and ANVIL_FINAL tracking first.
  - Re-evaluate completion / the final gate after those updates.
- Do not inject a pre-exit repair message and break in the same branch
  - If a repair message is injected, the loop should continue for at least one more iteration so the message can actually be sent and its response processed.
  - Otherwise, the injected repair message is dead on arrival.
- Add regression coverage for the exact ordering bug
  - Setup: one unfinished plan item remains.
  - The model response has zero tool calls and emits ANVIL_PLAN_UPDATE with [x] for the last item, plus ANVIL_FINAL.
  - should_allow_escape_hatch() is true on that same turn.
  - Expected: the update is applied, the item becomes AlreadySatisfied, the final gate allows termination, and completion is not partial.
- Add telemetry/assertion coverage
  - When two ANVIL_PLAN_UPDATE blocks are emitted across the run, plan_update_count should reflect both once both responses are processed.
  - For this repro shape, current-output/route.ts should produce a checked-retire log before termination.
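Below is a minimal sketch of the proposed ordering, together with the regression scenario described above. It is written as a standalone Rust program so it can run on its own. LoopState, TurnResult, apply_response, end_turn_fixed and the repair_pending flag are all hypothetical. The "continue one more iteration after injecting a repair" rule is one possible reading of the second fix item, not a settled design.

```rust
// Hypothetical model of the proposed end-of-turn ordering; every name here is illustrative.

#[derive(Debug, PartialEq)]
enum Completion {
    Complete,
    Partial,
}

#[derive(Debug, PartialEq)]
enum TurnResult {
    Continue,
    Exit(Completion),
}

struct LoopState {
    remaining: Vec<String>, // unfinished plan items, keyed by file path
    plan_update_count: u32, // mirrors the telemetry counter
    repair_pending: bool,   // a pre-exit repair message was injected and not yet answered
}

// Stand-in for try_register_plan()/try_update_plan() plus ANVIL_FINAL tracking.
// Returns true if this response emitted ANVIL_FINAL.
fn apply_response(state: &mut LoopState, response: &str) -> bool {
    if response.contains("ANVIL_PLAN_UPDATE") {
        state.plan_update_count += 1;
        for line in response.lines() {
            // Assumed format: "- [x] <path>: <note>"
            if let Some(rest) = line.trim().strip_prefix("- [x] ") {
                let path = rest.split(": ").next().unwrap_or(rest).trim();
                state.remaining.retain(|item| item != path);
            }
        }
    }
    response.contains("ANVIL_FINAL")
}

fn end_turn_fixed(state: &mut LoopState, response: &str, escape_hatch: bool) -> TurnResult {
    // 1. Always process the latest response before any break.
    let final_emitted = apply_response(state, response);
    // 2. Re-evaluate the final gate against the updated plan.
    if final_emitted && state.remaining.is_empty() {
        return TurnResult::Exit(Completion::Complete);
    }
    if escape_hatch {
        if !state.repair_pending {
            // 3. Inject the repair message and loop once more so it is actually sent.
            state.repair_pending = true;
            return TurnResult::Continue;
        }
        // The repair turn was answered without a valid closure.
        return TurnResult::Exit(Completion::Partial);
    }
    TurnResult::Continue
}

fn main() {
    // Regression scenario: one unfinished item, a zero-tool-call response that checks it
    // and emits ANVIL_FINAL, and an escape hatch that would fire on the same turn.
    let last = "src/app/api/worktrees/[id]/current-output/route.ts";
    let mut state = LoopState {
        remaining: vec![last.to_string()],
        plan_update_count: 1, // first update already applied earlier in the run
        repair_pending: false,
    };
    let response = format!("ANVIL_PLAN_UPDATE\n- [x] {last}: no change needed\nANVIL_FINAL");

    let result = end_turn_fixed(&mut state, &response, true);
    assert_eq!(result, TurnResult::Exit(Completion::Complete));
    assert!(state.remaining.is_empty());
    assert_eq!(state.plan_update_count, 2);
    println!("{:?} plan_update_count={}", result, state.plan_update_count);
}
```

A real regression test would more likely be a #[test] under tests/, next to tests/plan_item_retire.rs, driving the actual loop instead of this model.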
Acceptance criteria
- A late ANVIL_PLAN_UPDATE is still parsed/applied even if the escape hatch would otherwise fire on that turn
- The last remaining item (current-output/route.ts in the B1 repro) is retired as AlreadySatisfied before loop termination
- completion_kind is not partial for the reproduced B1 no-change closure path
- plan_update_count reflects the late corrective update instead of staying at 1