Phase2: late-stage closure loop keeps appending last plan item after Issue334

## Summary

In the Phase2 short smoke run after Issue334, the worker path itself was observed live.

- `worker_observed=true`
- `mutation_observed=true`
- `fixslice_escalation_stagnation_count=1`

However, A1 could not close the last remaining item and stopped at `partial`, still inside the late-stage closure loop.

- `completion_kind=partial`
- `accepted_final_count=0`
- `final_suppressed_with_remaining_targets_count=4`
- `plan_update_count=9`
- `sync_from_touched_files_count=0`

In other words, Issue334 achieved `read_heavy -> worker escalation -> mutation`, but **the runtime closure / termination path that should cleanly close the last unfinished item is still broken**.

The blocking defect this time is in the `Anvil` runtime, not in the benchmark harness.

## Reproduction / observed behavior

Run:

- pack: `autonomy-v2-p2-fix-slice-v1`
- expectation: `requires_worker_observation`
- results dir: `commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_173634/`
- provider/model: Ollama `qwen3.5:122b` + sidecar `qwen3.5:9b`

A1 telemetry:

- `worker_observed=true`
- `mutation_observed=true`
- `repair_turn_observed=false`
- `completion_kind=partial`
- `plan_update_count=9`
- `sync_from_touched_files_count=0`

A1 log highlights:

1. `fixslice_escalation_triggered (read_heavy)`
2. `agent.fix_slice` invoked multiple times
3. `file.edit success` on `src/lib/detection/prompt-detector.ts`
4. `plan item completed (all target_files mutated)` for the `prompt-detector.ts` item
5. `plan-aware final gate: suppressing ANVIL_FINAL (premature)`
6. repeated `late-stage closure mode: remaining=1, injecting closure hint`
7. repeated follow-up `ANVIL_PLAN` / `ANVIL_PLAN_UPDATE` that keep reintroducing unchecked `tests/unit/prompt-detector.test.ts` work
8. no accepted final before run interruption

Observed runtime path:

1. read-heavy stagnation
2. worker escalation succeeds
3. first mutation succeeds
4. one plan item is completed
5. remaining becomes 1
6. closure mode injects hints
7. follow-up plan updates append a fresh unchecked last item again
8. final gate keeps suppressing ANVIL_FINAL
9. session remains `partial`

## Root cause

### 1. Late-stage closure mode is advisory only; it does not block unchecked plan expansion

`remaining==1` currently activates `build_late_stage_closure_hint()`, but that path only injects stronger guidance.
It does **not** enforce any structural restriction on follow-up `ANVIL_PLAN` / `ANVIL_PLAN_UPDATE` blocks.

By contrast, pre-exit repair closure mode (`repair_closure_active`) explicitly rejects unchecked appended items.

This asymmetry matters:

- normal late-stage closure mode: unchecked replan/update expansion is still accepted
- repair closure mode: unchecked replan/update expansion is rejected

So once the session reaches `remaining=1`, the model can keep proposing a corrected unchecked final item, and runtime keeps accepting it.
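
The asymmetry can be modeled roughly as below. This is a minimal sketch, not Anvil's code: `ClosureMode`, `NewItem`, and `accepts_update` are hypothetical names introduced only for illustration, and the real handling in `src/app/agentic.rs` may be structured differently.

```rust
// Hypothetical, simplified model of the two closure modes described above.
// None of these names are taken from the Anvil source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClosureMode {
    /// Ordinary planning, no closure pressure.
    None,
    /// remaining == 1: a closure hint is injected, nothing is enforced.
    LateStage,
    /// repair_closure_active: unchecked appended items are rejected.
    Repair,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NewItem {
    pub target_file: String,
    pub checked: bool,
}

/// Returns true when a follow-up plan block would be accepted.
/// Only repair closure looks at the unchecked items; late-stage
/// closure behaves exactly like ordinary planning here.
pub fn accepts_update(mode: ClosureMode, items: &[NewItem]) -> bool {
    let has_unchecked = items.iter().any(|item| !item.checked);
    match mode {
        ClosureMode::Repair => !has_unchecked,
        ClosureMode::None | ClosureMode::LateStage => true,
    }
}
```

In this model, an unchecked item for `tests/unit/prompt-detector.test.ts` passes under `ClosureMode::LateStage` and is rejected under `ClosureMode::Repair`, which is the gap this issue is about.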

### 2. The plan update pipeline can churn the same last target indefinitely

The shared update pipeline does:

- checked-first retire
- supersede stale items
- deduplicate new items
- append deduped items

In the current run, this interacts badly with the final item churn:

- the existing unfinished last item is superseded
- the replacement item survives dedup
- the new unchecked item is appended
- `remaining` stays 1, but the item identity is refreshed

This is enabled by the current semantics where `Superseded` items are excluded from dedup matching so corrected replacements can survive. That behavior is correct in general, but it becomes pathological in late-stage closure mode when the replacement is effectively the same last target again.
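
The churn can be reproduced in a toy model of the pipeline. The sketch below is an assumption-laden simplification: `ItemStatus`, `PlanItem`, `Plan`, and `apply_unchecked_update` are hypothetical, items are matched by target file only, and the checked-first retire step is omitted because the update contains no checked items.

```rust
// Toy model of supersede -> dedup -> append for a single unchecked update.
// All names are hypothetical; matching by target file is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemStatus {
    Pending,
    Done,
    Superseded,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PlanItem {
    pub id: u32,
    pub target_file: String,
    pub status: ItemStatus,
}

#[derive(Debug, Default)]
pub struct Plan {
    pub items: Vec<PlanItem>,
    next_id: u32,
}

impl Plan {
    /// Number of items that still block the final gate.
    pub fn remaining(&self) -> usize {
        self.items
            .iter()
            .filter(|item| item.status == ItemStatus::Pending)
            .count()
    }

    /// Applies one update that proposes an unchecked item for `target`.
    /// Returns the id of the appended item, or None if dedup dropped it.
    pub fn apply_unchecked_update(&mut self, target: &str) -> Option<u32> {
        // Supersede: the stale pending item for the same target is retired.
        for item in self.items.iter_mut() {
            if item.status == ItemStatus::Pending && item.target_file == target {
                item.status = ItemStatus::Superseded;
            }
        }
        // Dedup: Superseded items are skipped, so the replacement survives.
        let duplicate = self
            .items
            .iter()
            .any(|item| item.status != ItemStatus::Superseded && item.target_file == target);
        if duplicate {
            return None;
        }
        // Append: a fresh unchecked item with a new identity.
        let id = self.next_id;
        self.next_id += 1;
        self.items.push(PlanItem {
            id,
            target_file: target.to_string(),
            status: ItemStatus::Pending,
        });
        Some(id)
    }
}
```

Calling `apply_unchecked_update` twice with the same target leaves `remaining()` at 1 while the pending item's id changes each time, which matches the "identity is refreshed" behavior described above.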

### 3. The touched-file rescue path is not closing late-appended final items

`check_plan_final_gate_inner()` calls `sync_from_touched_files()` before evaluating the final gate.
That rescue path exists specifically to mark items done when the file was already touched but the plan state lagged behind.

However, in the failing A1 run:

- `sync_from_touched_files_count=0`
- repeated final suppressions continue

So the late-appended final item is not being structurally reconciled against already-observed file changes.

The exact local cause may be either of the following:

- the last item truly was never mutated, or
- the item was reintroduced after mutation and should have been retired/deduped/reconciled.

In both cases the runtime defect is the same: **late-stage closure allows endless last-item churn without a hard closure rule**.

## Why this belongs to Anvil

`localllm-test` is not creating this loop.
The benchmark runner only:

- launches the suite
- reads telemetry/logs/artifacts
- classifies the run result

The loop is already visible inside the raw Anvil runtime behavior:

- repeated `late-stage closure mode: remaining=1`
- repeated `ANVIL_FINAL` suppression
- repeated plan updates
- no accepted final

So this issue belongs to `Anvil` runtime plan/closure logic, not harness-side wiring.

## Concrete problem areas

Primary hotspots:

- `src/app/execution_plan.rs`
  - `apply_plan_update_pipeline()`
  - `check_plan_final_gate_inner()`
  - `inject_plan_turn_guidance()`
- `src/contracts/mod.rs`
  - `deduplicate_new_items()`
  - `supersede_stale_items()`
  - `sync_from_touched_files()`
- `src/app/agentic.rs`
  - follow-up `ANVIL_PLAN` / `ANVIL_PLAN_UPDATE` handling around replan / update application
  - pre-exit repair closure handling, for comparison with late-stage closure semantics

## Fix direction

### 1. Add a structural closure guard for `remaining==1`

When late-stage closure mode is active, runtime should stop treating unchecked follow-up plan expansion as a normal replan path.

Candidate policy:

- if `remaining==1` and the new unchecked items only restate / refine the current last target, reject them instead of appending
- or, activate the same unchecked-item rejection policy used by `repair_closure_active`
- or, auto-upgrade normal closure mode into a stricter closure state after the first final suppression at `remaining==1`

The important property is: **late-stage closure must become structurally narrowing, not advisory-only**.
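
One way the candidate policies could combine is sketched below. It is illustrative only: `ClosureState`, `UpdateDecision`, and `guard_update` are hypothetical names, the comparison by target path is an assumption about what "restate / refine" means, and the threshold of one final suppression mirrors the third candidate rather than any existing constant.

```rust
// Hypothetical closure guard; not part of the current Anvil runtime.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum UpdateDecision {
    Accept,
    Reject { reason: &'static str },
}

#[derive(Debug, Clone)]
pub struct ClosureState {
    pub remaining: usize,
    pub repair_closure_active: bool,
    /// Final suppressions seen while remaining == 1 (assumed counter).
    pub final_suppressions_at_last_item: u32,
}

/// Decides whether unchecked follow-up items may be appended.
/// `last_target` is the target of the single remaining item, if any.
pub fn guard_update(
    state: &ClosureState,
    last_target: Option<&str>,
    unchecked_targets: &[&str],
) -> UpdateDecision {
    if unchecked_targets.is_empty() {
        return UpdateDecision::Accept;
    }
    // Existing behavior: repair closure rejects unchecked expansion.
    if state.repair_closure_active {
        return UpdateDecision::Reject { reason: "repair closure active" };
    }
    if state.remaining == 1 {
        // Candidate 3: escalate to strict closure after a final suppression.
        if state.final_suppressions_at_last_item >= 1 {
            return UpdateDecision::Reject { reason: "strict late-stage closure" };
        }
        // Candidate 1: reject items that only restate the last target.
        if let Some(last) = last_target {
            if unchecked_targets.iter().all(|target| *target == last) {
                return UpdateDecision::Reject { reason: "restates last target" };
            }
        }
    }
    UpdateDecision::Accept
}
```

With `remaining` above 1 the guard accepts everything, so ordinary replans are unaffected; the narrowing only applies once the plan is down to its last item.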

### 2. Reconcile late-appended items against known file state before append/final-gate churn

If a newly proposed last item matches:

- an already touched file, or
- an already mutated target in the current plan lineage,

it should be retired or deduped instead of appended as a fresh actionable blocker.

Possible implementations:

- before `append_items()`, reconcile deduped items against `working_memory.touched_files`
- treat same-target replacements in closure mode as already satisfied when the target was already mutated
- add a closure-mode-specific dedup rule that does not ignore superseded ancestry for the final remaining target
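
The first option could look roughly like the following. This is a sketch under assumptions: `ProposedItem`, `Reconciled`, and `reconcile_against_touched` are hypothetical, and `touched` stands in for `working_memory.touched_files`, whose real type is not shown in this issue.

```rust
use std::collections::HashSet;

// Hypothetical pre-append reconciliation step; names are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProposedItem {
    pub target_file: String,
}

#[derive(Debug, Default)]
pub struct Reconciled {
    /// Items that still represent real outstanding work.
    pub to_append: Vec<ProposedItem>,
    /// Items whose target was already touched; retire instead of append.
    pub already_satisfied: Vec<ProposedItem>,
}

/// Splits deduped items by whether their target file was already touched.
/// `touched` is assumed to hold normalized paths comparable to targets.
pub fn reconcile_against_touched(
    items: Vec<ProposedItem>,
    touched: &HashSet<String>,
) -> Reconciled {
    let (already_satisfied, to_append): (Vec<ProposedItem>, Vec<ProposedItem>) = items
        .into_iter()
        .partition(|item| touched.contains(&item.target_file));
    Reconciled {
        to_append,
        already_satisfied,
    }
}
```

Only `to_append` would then reach `append_items()`. Whether a touched file is sufficient evidence that an item is satisfied is a policy decision this sketch does not settle.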

### 3. Keep `Superseded`-skip dedup for general replans, but narrow it in closure mode

The current `Superseded` skip exists for a good reason, so removing it globally is risky.

A safer approach is to scope the special handling:

- preserve current semantics during ordinary replans
- tighten semantics only when `remaining==1` or closure mode is active

That avoids regressing the earlier fixes while stopping the last-item churn.
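
A scoped dedup rule might be expressed as below. The sketch assumes items are matched by target file and uses hypothetical names (`DedupStatus`, `ExistingItem`, `is_duplicate`); the actual matching logic in `deduplicate_new_items()` is not shown in this issue and may differ.

```rust
// Hypothetical closure-aware dedup predicate; not the current implementation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DedupStatus {
    Pending,
    Done,
    Superseded,
}

#[derive(Debug, Clone)]
pub struct ExistingItem {
    pub target_file: String,
    pub status: DedupStatus,
}

/// Returns true when a new item for `new_target` should be dropped.
/// Outside closure mode, Superseded items are ignored so corrected
/// replacements survive. In closure mode they count as matches, so a
/// restated last target no longer gets a fresh identity.
pub fn is_duplicate(existing: &[ExistingItem], new_target: &str, closure_active: bool) -> bool {
    existing.iter().any(|item| {
        if item.target_file != new_target {
            return false;
        }
        match item.status {
            DedupStatus::Superseded => closure_active,
            DedupStatus::Pending | DedupStatus::Done => true,
        }
    })
}
```

Passing `closure_active = false` keeps the ordinary-replan semantics unchanged, which is the point of scoping the change rather than removing the skip globally.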

### 4. Add regression coverage for the exact A1 failure shape

Needed regression:

- read-heavy stagnation reaches worker escalation
- one target mutates successfully
- plan reaches `remaining=1`
- follow-up `ANVIL_PLAN` / `ANVIL_PLAN_UPDATE` restates the same final target
- runtime does **not** append indefinite fresh blockers
- session can either:
  - finish cleanly with accepted final, or
  - enter strict repair closure that rejects unchecked expansion and terminates deterministically
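
The shape of such a regression can be sketched with a scripted toy runtime. Everything here is hypothetical (`Turn`, `Completion`, `Report`, `run_script`); it models only the counters relevant to the A1 pattern and is not a substitute for a test against the real plan pipeline.

```rust
// Toy scripted runtime for the A1 failure shape; all names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Turn {
    /// Worker mutates the plan item at this index.
    Mutate(usize),
    /// Model re-proposes the last pending target as a fresh unchecked item.
    RestateLast,
    /// Model emits ANVIL_FINAL.
    Final,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Completion {
    Accepted,
    StrictClosure,
    Partial,
}

#[derive(Debug)]
pub struct Report {
    pub completion: Completion,
    pub appended: u32,
    pub final_suppressed: u32,
}

/// Runs `script` against a plan with `targets` pending items.
/// With `guard` set, a restated last item at remaining == 1 ends the
/// session deterministically instead of being appended.
pub fn run_script(targets: usize, script: &[Turn], guard: bool) -> Report {
    let mut done = vec![false; targets];
    let mut appended = 0;
    let mut final_suppressed = 0;
    for turn in script {
        let remaining = done.iter().filter(|d| !**d).count();
        match *turn {
            Turn::Mutate(index) => {
                if let Some(slot) = done.get_mut(index) {
                    *slot = true;
                }
            }
            Turn::RestateLast => {
                if guard && remaining == 1 {
                    return Report { completion: Completion::StrictClosure, appended, final_suppressed };
                }
                // Identity is refreshed; remaining stays the same.
                appended += 1;
            }
            Turn::Final => {
                if remaining == 0 {
                    return Report { completion: Completion::Accepted, appended, final_suppressed };
                }
                final_suppressed += 1;
            }
        }
    }
    Report { completion: Completion::Partial, appended, final_suppressed }
}
```

A script of one mutation followed by alternating `Final` and `RestateLast` turns ends as `Partial` without the guard and as `StrictClosure` with it, which is the distinction the regression needs to pin down.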

## Candidate acceptance criteria

- [ ] A `remaining==1` late-stage closure path cannot append the same effective final target indefinitely
- [ ] unchecked follow-up `ANVIL_PLAN` / `ANVIL_PLAN_UPDATE` items are rejected or structurally reconciled when closure mode is active
- [ ] same-target final-item replacements are deduped/retired when prior touched/mutated evidence already exists
- [ ] `sync_from_touched_files()` or equivalent closure reconciliation advances the final item in the late-stage churn case
- [ ] a regression test reproduces the A1 pattern and proves deterministic closure
- [ ] Phase2 retest no longer stalls in `completion_kind=partial` after worker-mediated mutation

## Desk-check: if this issue is cleared, does it achieve the original goal?

**Likely yes, with rerun confirmation still required.**

Issue334 already moved the system past the original worker-path blocker:

- worker path is now live-observed
- mutation is now live-observed

The remaining blocker seen in the latest Phase2 run is closure stability.
Therefore, if this issue is fixed as described, the previously missing piece is removed:

- the session should be able to convert worker-mediated progress into a clean completion rather than looping at `remaining=1`

That means clearing this issue should make the original Phase2 objective practically reachable:

- live worker observation
- real mutation evidence
- valid completion instead of `partial`

Strictly speaking, final proof still requires retest evidence. But unlike the earlier issues, this now appears to be the last major runtime blocker on the critical path.

## Issue quality check / brush-up

The main risk in this issue is blaming only prompt/model behavior. That would be too weak.
The stronger and more actionable framing is:

- the model may emit noisy follow-up replans,
- but runtime late-stage closure currently has no structural rule to stop that noise from re-creating the last blocker,
- therefore the bug is in runtime closure semantics, not merely in model quality.

This issue is intentionally framed around that runtime contract so the fix can be validated by regression tests and not by hoping for a luckier model sample.

