Summary
Phase2 retest after #325 on HEAD bd1255a still leaves the same pack red, but the failure shape has shifted again.
In autonomy_v2_p2_fix_slice_v1_20260410_105239:
- A1 finishes as complete_unverified
- B1 now reaches runner-level session_completed / command_return_code=0
- pre_exit_repair_injected_count=1 and pre_exit_repair_consumed_count=1 are both observed
However, B1 still ends with:
- internal completion_kind=partial
- plan_items=26 plan_finished=24
- fixslice_escalation_count=0
- changed_files=0
- diff_stat=no changes
So #325 fixed the "repair turn never executes" bug, but the repair-turn path can still terminate with unfinished plan items and no worker adoption.
Reproduction / observed behavior
- HEAD under test: bd1255a (Merge pull request #326 from Kewton/feature/issue-325-repair-turn-continuation)
- result dir: commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/
- run: B1
- provider/model: Ollama qwen3.5:122b + sidecar qwen3.5:9b
- context/max output: 65536
External result shape
B1_result.json now shows:
- valid_run=true
- exit_class=session_completed
- command_return_code=0
- changed_files=0
Source: commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1_result.json
What the log shows
- The repair turn is actually injected and consumed.
  - pre-exit repair turn injected; continuing for one more LLM turn
  - pre-exit repair turn consumed; terminating loop
  Source: commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1.log:543,569
- On that consumed repair turn, the model emits another ANVIL_PLAN_UPDATE instead of simply closing the remaining work.
  Source: commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1.log:551-558
- Runtime applies part of that update, but also appends fresh unchecked items:
  - retires src/lib/auto-yes-poller.ts
  - supersedes old src/lib/detection/status-detector.ts and src/app/api/worktrees/[id]/current-output/route.ts items
  - plan update pipeline: appending items new_items=2
  Source: commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1.log:565-568
- Immediately after appending those two new pending items, the loop terminates anyway:
  - completion_kind=partial plan_items=26 plan_finished=24
  - telemetry fixslice_escalation_count=0
  - telemetry pre_exit_repair_injected_count=1 pre_exit_repair_consumed_count=1
  Source: commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1.log:569-574
Why this is distinct from #325
#325 was specifically:
- inject repair message
- but break before the model can ever consume it
That part is now fixed.
The new blocker is:
- the repair turn does run
- but its response can still expand / refresh the unfinished plan
- and the loop still terminates immediately after consuming that one response
So the runtime now has a narrower residual bug: the repair-turn response is processed, but any newly appended actionable items created by that response are never given another execution chance.
Source-backed root cause
1. pre_exit_repair_injected currently means "always break after the next response", regardless of what that response did
Main loop: src/app/agentic.rs:1086-1092
Once pre_exit_repair_injected is true, the next response is processed and then the loop unconditionally does:
- record_pre_exit_repair_consumed()
- pre-exit repair turn consumed; terminating loop
- break
There is no post-repair decision branch such as:
- "the plan is now cleanly finished, so exit"
- "the plan is still incomplete but repair made progress, so continue boundedly"
- "the repair response introduced fresh work, so reject that shape explicitly"
The current meaning is just "one repair response was seen, therefore stop."
2. The repair prompt is closure-oriented, but it does not constrain the response shape tightly enough
The injected repair message says:
- mutate the remaining file, or
- if no change is needed, retire it with ANVIL_PLAN_UPDATE [x], then
- emit ANVIL_FINAL
- shell.exec is not needed
Source: src/app/mod.rs:850-879
That instruction is helpful, but it does not forbid the model from emitting a mixed ANVIL_PLAN_UPDATE that contains:
- one checked item
- plus fresh unchecked items
So the runtime enters a closure-only phase, but the protocol still accepts plan-expanding responses.
3. The normal plan-update pipeline still appends unchecked items during the repair turn
Plan update handling: src/app/execution_plan.rs:125-203
The pipeline does:
- checked-first retire
- filter_unchecked_items(...)
- supersede_stale_items(&new_items)
- append_items(deduped) when unchecked items remain
That behavior is reasonable during ordinary replanning, but on the pre-exit repair turn it creates a contradiction:
- runtime says "last chance: close remaining work"
- runtime still accepts "here are two more unchecked actionable items"
- runtime then terminates immediately because the repair response has been consumed
In other words, repair mode currently reuses the same replan semantics as normal exploration mode, even though the loop is about to stop.
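The contradiction can be reproduced in a few lines. The following is a minimal Rust sketch, not the code in src/app/execution_plan.rs: PlanItem, Plan, apply_update, and item are hypothetical names, and the supersede step is omitted for brevity. It only models checked-first retire followed by appending deduplicated unchecked items.

```rust
// Hypothetical model of ordinary replan semantics; names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct PlanItem {
    pub target: String,
    pub done: bool,
}

#[derive(Debug, Default)]
pub struct Plan {
    pub items: Vec<PlanItem>,
}

impl Plan {
    /// Number of items that are still unchecked.
    pub fn pending(&self) -> usize {
        self.items.iter().filter(|i| !i.done).count()
    }

    /// Retire checked items first, then append unchecked items that are not
    /// already in the plan. Returns (retired, appended).
    pub fn apply_update(&mut self, update: &[PlanItem]) -> (usize, usize) {
        let mut retired = 0;
        for u in update.iter().filter(|u| u.done) {
            for existing in self
                .items
                .iter_mut()
                .filter(|i| !i.done && i.target == u.target)
            {
                existing.done = true;
                retired += 1;
            }
        }
        let mut appended = 0;
        for u in update.iter().filter(|u| !u.done) {
            if !self.items.iter().any(|i| i.target == u.target) {
                self.items.push(u.clone());
                appended += 1;
            }
        }
        (retired, appended)
    }
}

/// Convenience constructor for the sketch.
pub fn item(target: &str, done: bool) -> PlanItem {
    PlanItem { target: target.to_string(), done }
}
```

In this model, a plan holding a.rs (done) and b.rs (pending) that receives an update checking b.rs and adding c.rs and d.rs ends with two pending items. If the loop stops right after that call, the run cannot be anything other than partial.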
4. This makes partial inevitable whenever the repair response expands the plan
The latest B1 proves the sequence end-to-end:
- repair turn injected
- repair-turn response arrives
- one item retired, two items appended
- loop terminates on the consumed-repair branch
- completion_kind=partial
That is not just "the model needed one more chance."
It is a control-flow / protocol mismatch:
- control flow: consumed repair response => stop
- protocol: consumed repair response may still create new pending work
As long as both remain true, this residual failure family will survive.
Impact
- Phase2 B1 can look externally healthier (session_completed, exit code 0) while still failing the correctness gate internally (completion_kind=partial)
- The repair-turn mechanism is now live but still not safe as a closure protocol
- The latest same-pack suite can be misread as "fixed enough" unless the telemetry is inspected
- Worker-on runs still produce no live agent.fix_slice / file.rewrite evidence
Fix direction
1. Make pre-exit repair a real closure mode, not ordinary replan mode
When the session enters the pre-exit repair turn, the accepted response space should be narrowed.
Options:
- Reject unchecked plan expansion during repair mode
  - If ANVIL_PLAN_UPDATE on the repair turn contains unchecked items, do not append them
  - Instead inject a strict system correction and continue boundedly, or mark the run unresolved explicitly
- Allow progress but require another bounded turn when fresh work is introduced
  - If the repair-turn response retires some items but also appends new ones, do not break immediately
  - Continue with a small, explicit "repair-follow-up" budget
Either is better than the current "append and immediately terminate partial."
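The first option could look roughly like the sketch below. It assumes the plan update is already parsed into items with a done flag; UpdateMode, Filtered, and filter_update are hypothetical names and do not exist in the codebase as far as this issue knows.

```rust
// Hypothetical repair-mode gate placed in front of the plan-update pipeline.
#[derive(Debug, Clone, PartialEq)]
pub struct PlanItem {
    pub target: String,
    pub done: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpdateMode {
    /// Ordinary replanning: unchecked items may be appended.
    Normal,
    /// Pre-exit repair turn: only checked (retiring) items are accepted.
    RepairClosure,
}

#[derive(Debug, PartialEq)]
pub struct Filtered {
    /// Items that may be passed on to the normal pipeline.
    pub accepted: Vec<PlanItem>,
    /// Unchecked items refused because the session is in closure mode.
    pub rejected: Vec<PlanItem>,
}

pub fn filter_update(mode: UpdateMode, update: Vec<PlanItem>) -> Filtered {
    match mode {
        UpdateMode::Normal => Filtered { accepted: update, rejected: Vec::new() },
        UpdateMode::RepairClosure => {
            let (accepted, rejected): (Vec<PlanItem>, Vec<PlanItem>) =
                update.into_iter().partition(|i| i.done);
            Filtered { accepted, rejected }
        }
    }
}
```

A non-empty rejected list is the signal to inject the strict system correction or to mark the run unresolved, instead of silently growing the plan.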
2. Replace unconditional consumed-repair break with a post-repair decision
After the repair-turn response is parsed and plan updates are applied:
- if the plan is cleanly finished: allow exit
- if the response introduced new pending items: do not exit as though closure succeeded
- if the response made no closure progress: terminate with an explicit unresolved reason
The key change is that "repair turn consumed" must stop being synonymous with "session should now end."
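One possible shape for that decision is sketched below, under the assumption that the runtime can count retired, appended, and still-pending items after the repair response is applied. RepairOutcome, PostRepair, UnresolvedReason, decide, and the follow_up_budget parameter are all hypothetical.

```rust
/// Hypothetical summary of what the repair-turn response did to the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RepairOutcome {
    pub retired: usize,
    pub appended: usize,
    pub pending_after: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnresolvedReason {
    NoClosureProgress,
    NewWorkIntroduced,
    FollowUpBudgetExhausted,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PostRepair {
    /// Plan is cleanly finished; the loop may exit.
    ExitClean,
    /// Pending items remain; grant another bounded turn.
    ContinueBounded { turns_left: u32 },
    /// Stop, but with an explicit reason instead of a silent partial.
    Unresolved(UnresolvedReason),
}

pub fn decide(o: RepairOutcome, follow_up_budget: u32) -> PostRepair {
    if o.pending_after == 0 {
        return PostRepair::ExitClean;
    }
    if o.retired == 0 && o.appended == 0 {
        return PostRepair::Unresolved(UnresolvedReason::NoClosureProgress);
    }
    if follow_up_budget > 0 {
        return PostRepair::ContinueBounded { turns_left: follow_up_budget - 1 };
    }
    if o.appended > 0 {
        PostRepair::Unresolved(UnresolvedReason::NewWorkIntroduced)
    } else {
        PostRepair::Unresolved(UnresolvedReason::FollowUpBudgetExhausted)
    }
}
```

The B1 shape (one retired, two appended, two pending) maps to ContinueBounded when a follow-up turn is available and to Unresolved(NewWorkIntroduced) when it is not. Neither branch reports closure as if it had succeeded.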
3. Add repair-turn-specific telemetry
Current counters tell us only that the repair turn was injected and consumed.
We also need to know:
- repair turn retired item count
- repair turn appended item count
- remaining items before / after repair turn
- whether exit happened with pending items newly introduced by the repair response
That would make the current failure shape first-class instead of reconstructing it from logs.
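A minimal sketch of such a record follows. The struct, its field names, and the key=value keys in the log line are assumptions chosen to resemble the existing counters; none of them exist in the current telemetry.

```rust
/// Hypothetical per-repair-turn telemetry record.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct RepairTurnTelemetry {
    pub retired_count: usize,
    pub appended_count: usize,
    pub remaining_before: usize,
    pub remaining_after: usize,
    /// True when the loop exited while items appended by the repair
    /// response were still pending.
    pub exit_with_new_pending: bool,
}

impl RepairTurnTelemetry {
    pub fn from_counts(
        remaining_before: usize,
        retired: usize,
        appended: usize,
        loop_exited: bool,
    ) -> Self {
        let remaining_after = remaining_before.saturating_sub(retired) + appended;
        RepairTurnTelemetry {
            retired_count: retired,
            appended_count: appended,
            remaining_before,
            remaining_after,
            exit_with_new_pending: loop_exited && appended > 0,
        }
    }

    /// Render in the same key=value style as the existing telemetry lines.
    pub fn to_log_line(&self) -> String {
        format!(
            "repair_turn_retired_count={} repair_turn_appended_count={} \
             repair_turn_remaining_before={} repair_turn_remaining_after={} \
             repair_turn_exit_with_new_pending={}",
            self.retired_count,
            self.appended_count,
            self.remaining_before,
            self.remaining_after,
            self.exit_with_new_pending
        )
    }
}
```

With a line like this in the log, the "appended and then exited" shape is visible from a single grep rather than from correlating several log ranges.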
4. Add regression coverage for the exact residual branch
Test shape:
- incomplete plan near termination
- escape hatch injects repair turn
- repair-turn response contains:
  - at least one [x] item
  - at least one unchecked item
- runtime applies the update
Expected:
- runtime does not append fresh unchecked items and then immediately terminate as partial
- either the unchecked items are rejected in repair mode, or the loop continues under a bounded follow-up policy
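The test shape above can be sketched against a toy model of the repair phase. Everything here is hypothetical: Response, Outcome, CompletionKind, and run_repair_phase stand in for the real loop, each scripted response is reduced to how many items it retires and appends, and the bounded follow-up policy is just one of the two acceptable outcomes. In a real test module the check function would carry a #[test] attribute.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompletionKind {
    Complete,
    Partial,
    Unresolved,
}

/// One scripted model response: items it retires and unchecked items it appends.
#[derive(Debug, Clone, Copy)]
pub struct Response {
    pub retires: usize,
    pub appends: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Outcome {
    pub kind: CompletionKind,
    pub pending: usize,
    pub turns_used: usize,
}

/// Toy repair phase: the first response is the repair turn, and at most
/// `follow_up_budget` further responses are consumed after it.
pub fn run_repair_phase(
    pending_at_entry: usize,
    script: &[Response],
    follow_up_budget: usize,
) -> Outcome {
    let mut pending = pending_at_entry;
    let mut turns_used = 0;
    if pending == 0 {
        return Outcome { kind: CompletionKind::Complete, pending, turns_used };
    }
    for resp in script.iter().take(1 + follow_up_budget) {
        turns_used += 1;
        pending = pending.saturating_sub(resp.retires) + resp.appends;
        if pending == 0 {
            return Outcome { kind: CompletionKind::Complete, pending, turns_used };
        }
    }
    // Pending work remains: report it explicitly rather than as Partial.
    Outcome { kind: CompletionKind::Unresolved, pending, turns_used }
}

/// Regression shape: a mixed repair response must not end as a silent partial.
pub fn mixed_repair_response_is_not_silently_partial() {
    let script = [
        Response { retires: 1, appends: 2 }, // repair turn: one [x], two unchecked
        Response { retires: 2, appends: 0 }, // follow-up turn closes the new items
    ];
    let with_follow_up = run_repair_phase(1, &script, 1);
    assert_eq!(with_follow_up.kind, CompletionKind::Complete);
    assert_eq!(with_follow_up.turns_used, 2);

    let without_follow_up = run_repair_phase(1, &script, 0);
    assert_ne!(without_follow_up.kind, CompletionKind::Partial);
    assert_eq!(without_follow_up.kind, CompletionKind::Unresolved);
    assert_eq!(without_follow_up.pending, 2);
}
```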
Acceptance criteria
- The repair-turn path no longer appends fresh unchecked items and then immediately terminates as partial
- The runtime has an explicit policy for unchecked ANVIL_PLAN_UPDATE items on the repair turn (reject, bounded continue, or equivalent)
- The autonomy_v2_p2_fix_slice_v1_20260410_105239 pack no longer fails for this specific residual repair-turn behavior
Desk-check: if this issue is cleared, does it achieve the original goal?
Short answer: no, not by itself. It is necessary, but not sufficient.
What clearing this issue should achieve
If the above is fixed, the latest B1 family should stop failing in this exact way:
- repair turn runs
- repair response appends fresh pending items
- loop exits immediately as partial
That would remove the residual correctness bug left after #325 and should improve the latest Phase2 red point materially.
Why it is still not enough for the original Phase2 objective
The original Phase2 objective is not only "avoid partial closure drift."
It also requires bench-visible proof that the microtask worker path is actually reachable and integrated.
That is still not true in the latest suite:
- fixslice_escalation_count=0 on both A1 and B1
- no live agent.fix_slice hit observed
- no file.rewrite application observed
- diff artifact remains empty
Sources:
commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/A1.log:355-366
commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1.log:569-574
So the desk-check conclusion is that clearing this issue is necessary but not sufficient.
That means this issue should be treated as:
- a correctness residual required to stabilize Phase2 closure behavior
- but not the only remaining gate for the original Phase2 goal