bug: stagnation escape hatch breaks before processing the last ANVIL_PLAN_UPDATE, so the checked retire is dropped #323

@Kewton

Summary

The Phase2 retest after #321 still has one failing (red) case, B1 in autonomy_v2_p2_fix_slice_v1_20260410_000938, but the blocker is now more specific than "already-satisfied item retire is missing".

The last closure-hint response contains a correct ANVIL_PLAN_UPDATE that checks the final remaining file-backed item (src/app/api/worktrees/[id]/current-output/route.ts), but the agentic loop hits the stagnation escape hatch and breaks before parsing/applying that response.

As a result, the checked retire for the last item is dropped, remaining=1 persists, and the run ends as partial + runner_nonzero_exit even though the model emitted the intended no-change closure.

Reproduction / observed behavior

Result artifact

B1_result.json shows the external failure shape:

  • valid_run=true
  • exit_class=runner_nonzero_exit
  • command_return_code=2
  • changed_files=0
  • anvil_error_count=1
  • session_completed_count=1

Source: commandindextest/results/autonomy_v2_p2_fix_slice_v1_20260410_000938/B1_result.json:17-35

What the log shows

  1. The first ANVIL_PLAN_UPDATE checks five items but does not include current-output/route.ts.
     • B1.log:458-464
  2. Runtime retires those five items and immediately reports remaining=1 next=src/app/api/worktrees/[id]/current-output/route.ts ... no change needed.
     • B1.log:498-504
       • five plan item retired via checked marker
       • plan update pipeline: all items checked → retired
       • plan-aware final gate: suppressing ANVIL_FINAL (premature) remaining=1 ... current-output/route.ts ... no change needed
  3. After the closure hint, the model emits a second ANVIL_PLAN_UPDATE that explicitly checks the last remaining item:
     - [x] src/app/api/worktrees/[id]/current-output/route.ts: uses detectSessionStatus() which uses buildDetectPromptOptions() - no change needed
     • B1.log:516-522
  4. That same response also emits ANVIL_FINAL saying the audit found no remaining gaps.
     • B1.log:525-559
  5. But immediately after that response, the loop logs:
     • stagnation detected; forced_mode_active=true
     • pre-exit repair turn injected before escape hatch
     • stagnation escape hatch: terminating loop
     • completion_kind=partial plan_items=6 plan_finished=5
     • telemetry plan_update_count=1
     • B1.log:561-568

Crucially:

  • there is no retire log for current-output/route.ts
  • telemetry still says plan_update_count=1 even though the log visibly contains two ANVIL_PLAN_UPDATE blocks

This strongly indicates that the second, closure-hint-driven update was never applied.

Source-backed root cause

1. Escape hatch runs before late-response plan parsing

In the main loop, stagnation accounting and escape-hatch termination happen before the code that parses next_token_buffer for ANVIL_PLAN / ANVIL_PLAN_UPDATE.

Ordering in src/app/agentic.rs:

  1. end-turn stagnation / should_allow_escape_hatch()
  2. optional pre-exit repair injection
  3. break
  4. only after that, try_register_plan(&next_token_buffer) / try_update_plan(&next_token_buffer)

Code:

  • src/app/agentic.rs:987-1045
  • src/app/agentic.rs:1068-1126

So when the latest zero-tool-call response contains a corrective ANVIL_PLAN_UPDATE and escape hatch fires on that same turn, the loop terminates before the update pipeline ever sees the response.
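The ordering problem can be illustrated with a minimal sketch. This is not the code in src/app/agentic.rs: the `Plan` struct, the `end_turn_buggy` function, and the line-based `- [x]` matching are hypothetical simplifications, and only the name `try_update_plan` is borrowed from the issue. It shows how an early return on the escape hatch leaves the checked item unretired.

```rust
/// Hypothetical, simplified plan state (not the real type in src/app/execution_plan.rs).
#[derive(Debug)]
struct Plan {
    remaining: Vec<String>,
    plan_update_count: u32,
}

impl Plan {
    /// Simplified stand-in for try_update_plan(): if the response carries an
    /// ANVIL_PLAN_UPDATE marker, retire every remaining item named on a `- [x] ` line.
    fn try_update_plan(&mut self, response: &str) {
        if !response.contains("ANVIL_PLAN_UPDATE") {
            return;
        }
        let checked: Vec<&str> = response
            .lines()
            .filter_map(|line| line.trim().strip_prefix("- [x] "))
            .collect();
        self.remaining
            .retain(|item| !checked.iter().any(|c| c.starts_with(item.as_str())));
        self.plan_update_count += 1;
    }
}

/// End-of-turn step with the ordering described above.
/// Returns true when the loop terminates on this turn.
fn end_turn_buggy(plan: &mut Plan, response: &str, escape_hatch_fires: bool) -> bool {
    if escape_hatch_fires {
        // The loop breaks here, so the response below is never parsed.
        return true;
    }
    plan.try_update_plan(response);
    false
}
```

With one remaining item and a response that checks it, calling `end_turn_buggy(..., true)` terminates the loop while `remaining` still holds the item and `plan_update_count` is unchanged, which matches the shape seen in B1.log.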

2. Runtime explicitly instructs the model to do the thing that gets dropped

The structured pre-exit repair message tells the model:

  1. mutate the remaining file, or
  2. if no change is needed, emit ANVIL_PLAN_UPDATE with [x], then
  3. emit ANVIL_FINAL

Code:

  • src/app/mod.rs:846-879

So the runtime’s own recovery protocol expects checked retirement on the last turn — but the loop ordering can discard exactly that response.

3. Checked retire machinery itself is not the primary failure here

The checked-first pipeline does exist and works when it actually runs:

  • src/app/execution_plan.rs:121-168
  • tests/plan_item_retire.rs:16-48

This B1 failure is therefore distinct from the original #301 retirement gap. The issue is not merely “cannot retire already-satisfied item”; it is “the final checked update never reaches the retirement pipeline because the loop breaks too early.”

Why this is distinct from existing issues

vs #301 / #315

Those were about retirement semantics once the update path is actually processed.

Here, the last ANVIL_PLAN_UPDATE is present in the model output but is dropped before try_update_plan() runs.

vs #321

#321 focused on reactive reachability from repeated single-file edit failure into agent.fix_slice.

This B1 failure is a read-only / no-change closure path after closure hint, with fixslice_escalation_count=0 and no worker-path evidence. It is a separate late-stage loop-ordering bug.

vs #287

#287 was a broader remaining=1 hang / timeout family.

This issue is narrower and source-backed: the agentic loop has a concrete control-flow bug where escape hatch termination precedes late-response plan update parsing.

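Impact

A correct no-change closure from the model is discarded: the last checked item is never retired, remaining=1 persists, and the run is reported as partial + runner_nonzero_exit. The pre-exit repair protocol cannot succeed when the escape hatch fires on the same turn as the final checked update.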

Fix direction

  1. Process next_token_buffer before any escape-hatch break

    • Apply try_register_plan() / try_update_plan() and ANVIL_FINAL tracking first.
    • Re-evaluate completion / final gate after those updates.
  2. Do not inject a pre-exit repair message and break in the same branch

    • If a repair message is injected, the loop should continue at least one more iteration so the message can actually be sent and its response processed.
    • Otherwise the injected repair message is dead-on-arrival.
  3. Add regression coverage for the exact ordering bug

    • Setup: one unfinished plan item remains.
    • Model response has zero tool calls, emits ANVIL_PLAN_UPDATE with [x] for the last item plus ANVIL_FINAL.
    • should_allow_escape_hatch() is true on that same turn.
    • Expected: the update is applied, the item becomes AlreadySatisfied, final gate allows termination, and completion is not partial.
  4. Add telemetry/assertion coverage

    • When two ANVIL_PLAN_UPDATE blocks are emitted across the run, plan_update_count should reflect both once both responses are processed.
    • For this repro shape, current-output/route.ts should produce a checked-retire log before termination.
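Fix directions 1 to 3 can be sketched together as follows. This is an illustration under stated assumptions, not a patch for src/app/agentic.rs: `LoopState`, `TurnOutcome`, `apply_plan_update`, and `end_turn_fixed` are hypothetical names, and the `- [x]` matching is a simplified stand-in for the real update pipeline. The point is the order: process the response, re-evaluate completion, and only then consider the escape hatch, continuing once if a repair message was just injected.

```rust
#[derive(Debug, PartialEq)]
enum TurnOutcome {
    Continue, // run another loop iteration
    Complete, // plan finished and ANVIL_FINAL accepted
    Partial,  // escape hatch terminated the loop with items remaining
}

/// Hypothetical, simplified loop state.
struct LoopState {
    remaining: Vec<String>,
    plan_update_count: u32,
    repair_injected: bool,
}

/// Simplified stand-in for the plan update pipeline.
fn apply_plan_update(state: &mut LoopState, response: &str) {
    if !response.contains("ANVIL_PLAN_UPDATE") {
        return;
    }
    let checked: Vec<&str> = response
        .lines()
        .filter_map(|line| line.trim().strip_prefix("- [x] "))
        .collect();
    state
        .remaining
        .retain(|item| !checked.iter().any(|c| c.starts_with(item.as_str())));
    state.plan_update_count += 1;
}

fn end_turn_fixed(state: &mut LoopState, response: &str, escape_hatch_fires: bool) -> TurnOutcome {
    // 1. Process the response before any escape-hatch decision.
    apply_plan_update(state, response);
    let saw_final = response.contains("ANVIL_FINAL");

    // 2. Re-evaluate the final gate after the update has been applied.
    if state.remaining.is_empty() && saw_final {
        return TurnOutcome::Complete;
    }

    // 3. Only now consider the escape hatch. If a repair message is injected,
    //    continue once so that it is actually sent and its response processed.
    if escape_hatch_fires {
        if !state.repair_injected {
            state.repair_injected = true;
            return TurnOutcome::Continue;
        }
        return TurnOutcome::Partial;
    }
    TurnOutcome::Continue
}
```

A regression test in this shape would start with one remaining item, feed a zero-tool-call response that checks it and emits ANVIL_FINAL, set the escape hatch to fire on that turn, and assert that the outcome is `Complete` rather than `Partial`.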

Acceptance criteria

  • A late zero-tool-call response containing ANVIL_PLAN_UPDATE is still parsed/applied even if escape hatch would otherwise fire that turn
  • The final checked item can reach AlreadySatisfied before loop termination
  • completion_kind is not partial for the reproduced B1 no-change closure path
  • plan_update_count reflects the late corrective update instead of staying at 1
  • A regression test covers the "escape hatch fires on the same turn as the final checked update" path
