bug: loop terminates immediately even when the pre-exit repair turn response appends new pending plan items, so Phase2 B1 closes as partial

## Summary

Phase2 retest after #325 on HEAD `bd1255a` still leaves the same pack red, but the failure shape has shifted again.

In `autonomy_v2_p2_fix_slice_v1_20260410_105239`:

- A1 finishes as `complete_unverified`
- B1 now reaches runner-level `session_completed` / `command_return_code=0`
- `pre_exit_repair_injected_count=1` and `pre_exit_repair_consumed_count=1` are both observed

However, B1 still ends with:

- internal `completion_kind=partial`
- `plan_items=26 plan_finished=24`
- `fixslice_escalation_count=0`
- `changed_files=0`
- `diff_stat=no changes`

So #325 fixed the "repair turn never executes" bug, but the repair-turn path can still terminate with unfinished plan items and no worker adoption.

## Reproduction / observed behavior

- HEAD under test: `bd1255a` (`Merge pull request #326 from Kewton/feature/issue-325-repair-turn-continuation`)
- result dir: `commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/`
- run: `B1`
- provider/model: Ollama `qwen3.5:122b` + sidecar `qwen3.5:9b`
- context/max output: `65536`

### External result shape

`B1_result.json` now shows:

- `valid_run=true`
- `exit_class=session_completed`
- `command_return_code=0`
- `changed_files=0`

Source: `commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1_result.json`

### What the log shows

1. The repair turn is actually injected and consumed.

- `pre-exit repair turn injected; continuing for one more LLM turn`
- `pre-exit repair turn consumed; terminating loop`

Source: `commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1.log:543,569`

2. On that consumed repair turn, the model emits another `ANVIL_PLAN_UPDATE` instead of simply closing the remaining work.

Source: `commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1.log:551-558`

3. Runtime applies part of that update, but also appends fresh unchecked items:

- retires `src/lib/auto-yes-poller.ts`
- supersedes old `src/lib/detection/status-detector.ts` and `src/app/api/worktrees/[id]/current-output/route.ts` items
- `plan update pipeline: appending items new_items=2`

Source: `commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1.log:565-568`

4. Immediately after appending those two new pending items, the loop terminates anyway:

- `completion_kind=partial plan_items=26 plan_finished=24`
- telemetry `fixslice_escalation_count=0`
- telemetry `pre_exit_repair_injected_count=1 pre_exit_repair_consumed_count=1`

Source: `commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1.log:569-574`

## Why this is distinct from #325

#325 was specifically:

- inject repair message
- but break before the model can ever consume it

That part is now fixed.

The new blocker is:

- the repair turn **does** run
- but its response can still expand / refresh the unfinished plan
- and the loop still terminates immediately after consuming that one response

So the runtime now has a narrower residual bug: **the repair-turn response is processed, but any newly appended actionable items created by that response are never given another execution chance.**

## Source-backed root cause

### 1. `pre_exit_repair_injected` currently means "always break after the next response", regardless of what that response did

Main loop:

- `src/app/agentic.rs:1086-1092`

Once `pre_exit_repair_injected` is true, the next response is processed and then the loop unconditionally does:

- `record_pre_exit_repair_consumed()`
- `pre-exit repair turn consumed; terminating loop`
- `break`

There is no post-repair decision branch such as:

- "the plan is now cleanly finished, so exit"
- "the plan is still incomplete but repair made progress, so continue boundedly"
- "the repair response introduced fresh work, so reject that shape explicitly"

The current meaning is just "one repair response was seen, therefore stop."

### 2. The repair prompt is closure-oriented, but it does not constrain the response shape tightly enough

The injected repair message says:

- mutate the remaining file, or
- if no change is needed, retire it with `ANVIL_PLAN_UPDATE [x]`, then
- emit `ANVIL_FINAL`
- `shell.exec` is not needed

Source:

- `src/app/mod.rs:850-879`

That instruction is helpful, but it does **not** forbid the model from emitting a mixed `ANVIL_PLAN_UPDATE` that contains:

- one checked item
- plus fresh unchecked items

So the runtime enters a closure-only phase, but the protocol still accepts plan-expanding responses.

### 3. The normal plan-update pipeline still appends unchecked items during the repair turn

Plan update handling:

- `src/app/execution_plan.rs:125-203`

The pipeline does:

1. checked-first retire
2. `filter_unchecked_items(...)`
3. `supersede_stale_items(&new_items)`
4. `append_items(deduped)` when unchecked items remain

That behavior is reasonable during ordinary replanning, but on the pre-exit repair turn it creates a contradiction:

- runtime says "last chance: close remaining work"
- runtime still accepts "here are two more unchecked actionable items"
- runtime then terminates immediately because the repair response has been consumed

In other words, **repair mode currently reuses the same replan semantics as normal exploration mode, even though the loop is about to stop.**

### 4. This makes `partial` inevitable whenever the repair response expands the plan

The latest B1 proves the sequence end-to-end:

1. repair turn injected
2. repair-turn response arrives
3. one item retired, two items appended
4. loop terminates on the consumed-repair branch
5. `completion_kind=partial`

That is not just "the model needed one more chance."

It is a control-flow / protocol mismatch:

- **control flow**: consumed repair response => stop
- **protocol**: consumed repair response may still create new pending work

As long as both remain true, this residual failure family will survive.

## Impact

- Phase2 B1 can look externally healthier (`session_completed`, exit code 0) while still failing the correctness gate internally (`completion_kind=partial`)
- The repair-turn mechanism is now live but still not safe as a closure protocol
- The latest same-pack suite can be misread as "fixed enough" unless the telemetry is inspected
- Worker-on runs still produce no live `agent.fix_slice` / `file.rewrite` evidence

## Fix direction

### 1. Make pre-exit repair a real closure mode, not ordinary replan mode

When the session enters the pre-exit repair turn, the accepted response space should be narrowed.

Options:

1. **Reject unchecked plan expansion during repair mode**
   - If `ANVIL_PLAN_UPDATE` on the repair turn contains unchecked items, do not append them
   - Instead inject a strict system correction and continue boundedly, or mark the run unresolved explicitly

2. **Allow progress but require another bounded turn when fresh work is introduced**
   - If the repair-turn response retires some items but also appends new ones, do not break immediately
   - Continue with a small, explicit "repair-follow-up" budget

Either is better than the current "append and immediately terminate partial."
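Option 1 could be sketched roughly as follows. This is a minimal model, not the actual pipeline in `src/app/execution_plan.rs`: `PlanUpdateItem`, `PlanUpdateMode`, `FilteredUpdate`, and `filter_update` are hypothetical names introduced only for illustration.

```rust
/// Hypothetical model of one `ANVIL_PLAN_UPDATE` entry (not the real type).
#[derive(Debug, Clone, PartialEq)]
pub struct PlanUpdateItem {
    pub path: String,
    pub checked: bool,
}

/// Hypothetical flag telling the pipeline which phase the session is in.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PlanUpdateMode {
    Normal,
    PreExitRepair,
}

/// What the pipeline should do with each item of the update.
#[derive(Debug, Default, PartialEq)]
pub struct FilteredUpdate {
    pub to_retire: Vec<String>,
    pub to_append: Vec<String>,
    pub rejected: Vec<String>,
}

/// Checked items are always retired. Unchecked items are appended in
/// normal mode but rejected (never appended) in pre-exit repair mode.
pub fn filter_update(mode: PlanUpdateMode, items: &[PlanUpdateItem]) -> FilteredUpdate {
    let mut out = FilteredUpdate::default();
    for item in items {
        if item.checked {
            out.to_retire.push(item.path.clone());
        } else if mode == PlanUpdateMode::PreExitRepair {
            out.rejected.push(item.path.clone());
        } else {
            out.to_append.push(item.path.clone());
        }
    }
    out
}
```

A non-empty `rejected` list would be the trigger for the strict system correction or the explicit unresolved marking described in option 1.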

### 2. Replace unconditional consumed-repair break with a post-repair decision

After the repair-turn response is parsed and plan updates are applied:

- if the plan is cleanly finished: allow exit
- if the response introduced new pending items: do not exit as though closure succeeded
- if the response made no closure progress: terminate with an explicit unresolved reason

The key change is that "repair turn consumed" must stop being synonymous with "session should now end."
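One possible shape for that decision is sketched below. It assumes the loop can summarize the repair-turn response as retired / appended / still-pending counts. `RepairOutcome`, `PostRepairDecision`, `decide_after_repair`, and the follow-up budget are hypothetical and do not exist in `src/app/agentic.rs` today.

```rust
/// Hypothetical summary of what the repair-turn response did to the plan.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RepairOutcome {
    pub retired: usize,
    pub appended: usize,
    pub pending_after: usize,
}

/// Hypothetical replacement for the unconditional `break`.
#[derive(Debug, PartialEq)]
pub enum PostRepairDecision {
    /// The plan is cleanly finished; exiting is safe.
    ExitComplete,
    /// Work remains but the response changed the plan; allow a bounded follow-up.
    ContinueBounded { turns_left: u32 },
    /// Stop, but record why closure did not succeed.
    ExitUnresolved(&'static str),
}

pub fn decide_after_repair(outcome: RepairOutcome, follow_up_budget: u32) -> PostRepairDecision {
    if outcome.pending_after == 0 {
        return PostRepairDecision::ExitComplete;
    }
    if outcome.retired == 0 && outcome.appended == 0 {
        return PostRepairDecision::ExitUnresolved("repair turn made no closure progress");
    }
    if follow_up_budget == 0 {
        return PostRepairDecision::ExitUnresolved("repair follow-up budget exhausted");
    }
    PostRepairDecision::ContinueBounded { turns_left: follow_up_budget - 1 }
}
```

With this shape, a response that retires one item and appends two would either get a bounded follow-up turn or end with an explicit unresolved reason, never a silent exit.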

### 3. Add repair-turn-specific telemetry

Current counters tell us only that the repair turn was injected and consumed.

We also need to know:

- repair turn retired item count
- repair turn appended item count
- remaining items before / after repair turn
- whether exit happened with pending items newly introduced by the repair response

That would make the current failure shape a first-class signal rather than something that has to be reconstructed from logs.
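As a rough illustration, the extra counters could be grouped like this. The struct, its field names, and the `key=value` log keys are assumptions chosen to resemble the existing telemetry lines; none of them exist in the codebase yet.

```rust
/// Hypothetical per-session counters for the pre-exit repair turn.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RepairTurnTelemetry {
    pub retired_count: usize,
    pub appended_count: usize,
    pub remaining_before: usize,
    pub remaining_after: usize,
}

impl RepairTurnTelemetry {
    /// True when the loop exited while items introduced by the repair
    /// response were still pending (the failure shape in this issue).
    pub fn exited_with_new_pending(&self, loop_exited: bool) -> bool {
        loop_exited && self.appended_count > 0 && self.remaining_after > 0
    }

    /// Render the counters in a `key=value` style for the run log.
    pub fn to_log_line(&self) -> String {
        format!(
            "repair_turn_retired_count={} repair_turn_appended_count={} \
             repair_turn_remaining_before={} repair_turn_remaining_after={}",
            self.retired_count, self.appended_count, self.remaining_before, self.remaining_after
        )
    }
}
```

A bench harness could then assert on `exited_with_new_pending` directly instead of matching log lines.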

### 4. Add regression coverage for the exact residual branch

Test shape:

1. incomplete plan near termination
2. escape hatch injects repair turn
3. repair-turn response contains:
   - at least one `[x]` item
   - at least one unchecked item
4. runtime applies the update

Expected:

- runtime does **not** append fresh unchecked items and then immediately terminate as `partial`
- either the unchecked items are rejected in repair mode, or the loop continues under a bounded follow-up policy
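The test shape above can be modeled with scripted responses, as in the sketch below. It assumes the bounded follow-up policy. `ScriptedResponse`, `Completion`, and `run_repair_phase` are hypothetical stand-ins for the real loop and its test harness.

```rust
/// One scripted model response: paths marked `[x]` and paths left unchecked.
pub struct ScriptedResponse {
    pub checked: Vec<&'static str>,
    pub unchecked: Vec<&'static str>,
}

/// Hypothetical end states of the repair phase.
#[derive(Debug, PartialEq)]
pub enum Completion {
    Complete,
    Unresolved { pending: Vec<String> },
}

/// Replays the repair turn plus at most `follow_up_budget` extra turns.
pub fn run_repair_phase(
    initial_pending: &[&str],
    responses: &[ScriptedResponse],
    follow_up_budget: usize,
) -> Completion {
    let mut pending: Vec<String> = initial_pending.iter().map(|s| s.to_string()).collect();
    for response in responses.iter().take(1 + follow_up_budget) {
        // Retire every pending item the response marked as checked.
        pending.retain(|p| !response.checked.iter().any(|c| *c == p.as_str()));
        // Append unchecked items that are not already pending.
        for path in &response.unchecked {
            if !pending.iter().any(|p| p.as_str() == *path) {
                pending.push((*path).to_string());
            }
        }
        if pending.is_empty() {
            return Completion::Complete;
        }
    }
    // Budget exhausted: surface the leftover items explicitly.
    Completion::Unresolved { pending }
}
```

A regression test would script one expanding response followed by one closing response, then assert `Complete` with a budget of one and an explicit `Unresolved` listing the leftover items with a budget of zero.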

## Acceptance criteria

- [ ] A consumed pre-exit repair turn cannot append new unfinished plan items and then immediately terminate as `partial`
- [ ] Repair mode has an explicit policy for unchecked `ANVIL_PLAN_UPDATE` items (reject, bounded continue, or equivalent)
- [ ] Post-repair termination is decided from the resulting plan state, not just from the fact that one repair response was consumed
- [ ] Telemetry exposes whether the repair turn appended new work and how many items remained before/after it
- [ ] A regression test covers the "repair turn appends 2 new items then exits partial" path
- [ ] Re-running `autonomy_v2_p2_fix_slice_v1_20260410_105239` no longer fails for this specific residual repair-turn behavior

## Desk-check: if this issue is cleared, does it achieve the original goal?

**Short answer: no, not by itself. It is necessary, but not sufficient.**

### What clearing this issue should achieve

If the above is fixed, the latest B1 family should stop failing in this exact way:

- repair turn runs
- repair response appends fresh pending items
- loop exits immediately as `partial`

That would remove the residual correctness bug left after #325 and should improve the latest Phase2 red point materially.

### Why it is still not enough for the original Phase2 objective

The original Phase2 objective is not only "avoid partial closure drift."
It also requires bench-visible proof that the microtask worker path is actually reachable and integrated.

That is still not true in the latest suite:

- `fixslice_escalation_count=0` on both A1 and B1
- no live `agent.fix_slice` hit observed
- no `file.rewrite` application observed
- diff artifact remains empty

Sources:

- `commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/A1.log:355-366`
- `commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_105239/B1.log:569-574`

So the desk-check conclusion is:

- **Yes**: clearing this issue should directly remove the latest #325 aftershock failure family
- **No**: clearing this issue alone is still not enough to claim Phase2 is complete, because worker reachability / live adoption is still unproven (tracked separately by #321)

That means this issue should be treated as:

- a correctness residual required to stabilize Phase2 closure behavior
- but not the only remaining gate for the original Phase2 goal

