Structured workflow turn interrupted mid-stream recovers as skipped instead of completing with output #1727

@threepointone

Summary

When a structured workflow turn (a ThinkWorkflow step driven by step.prompt({ output })) is interrupted mid-stream (DO eviction / deploy churn / SIGKILL), chat recovery reconciles it as skipped rather than re-running it to completion. The workflow then surfaces a ThinkPromptSkippedError (packages/think/src/workflows.ts:67, :231) instead of completing with the validated structured output.

This was discovered while adding e2e recovery coverage. The new e2e test (packages/think/src/e2e-tests/workflow-recovery.test.ts) deliberately locks in only the guarantee that holds today: the workflow is unblocked (terminal, not hung), and the notification drain replays the submission's terminal status after restart. It documents this gap inline (see the NOTE (deferred) at workflow-recovery.test.ts:317-324).

Expected vs actual

  • Expected: a structured workflow turn interrupted mid-stream is recovered (continued/retried) and the workflow step.prompt resolves with the structured output, same as a non-interrupted run.
  • Actual: recovery resolves the submission as skipped; the workflow step rejects with ThinkPromptSkippedError and the workflow reaches a terminal (non-complete) state.
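
To make the difference concrete, here is a minimal sketch of what a caller of step.prompt({ output }) observes in each case. Everything in it is illustrative: the `Submission` type and `observePrompt` helper are hypothetical, and `ThinkPromptSkippedError` is a local stand-in for the class in packages/think/src/workflows.ts, whose real shape may differ.

```typescript
// Local stand-in for the error named in this issue (assumption: the real class
// in packages/think/src/workflows.ts may carry different fields).
class ThinkPromptSkippedError extends Error {
  constructor(message = "prompt was skipped during recovery") {
    super(message);
    this.name = "ThinkPromptSkippedError";
  }
}

// Hypothetical model of a reconciled submission.
type Submission<T> =
  | { status: "complete"; output: T }
  | { status: "skipped" };

// Hypothetical helper: what the workflow step sees for a reconciled submission.
function observePrompt<T>(submission: Submission<T>): T {
  if (submission.status === "complete") {
    // Expected path: the step resolves with the structured output.
    return submission.output;
  }
  // Actual path today for an interrupted structured turn.
  throw new ThinkPromptSkippedError();
}
```

In this model the expected behavior corresponds to the `"complete"` branch, while today an interrupted structured turn always lands in the `"skipped"` branch.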

Root cause (initial analysis)

Two interacting behaviors in packages/think/src/think.ts:

  1. continueLastTurn skips on an interrupted assistant leaf. When the persisted leaf is a mid-stream (interrupted) assistant message, the chat-recovery continuation treats the turn as already-progressed and skips rather than re-running it. For a free-form chat turn that's reasonable (don't double-emit), but for a structured turn it means the turn never re-runs to produce the schema-shaped output.
  2. _completeRecoveredSubmission carries no structured output (packages/think/src/think.ts:10024, and its call sites ~9290/9572/9805/9899/9918/9944/10156/10181). A recovered submission has no path to reconstruct/re-derive the structured output, so even when reconciled it cannot satisfy the step.prompt({ output }) contract.
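
The two behaviors above, and one possible direction for a fix, can be sketched as follows. This is a simplified model under stated assumptions, not the code in think.ts: `InterruptedTurn`, `currentAction`, `proposedAction`, and `completeRecovered` are hypothetical names, and the real recovery path is asynchronous and considerably more involved.

```typescript
// Hypothetical: a turn whose persisted leaf is an interrupted assistant message.
interface InterruptedTurn {
  // Assumption: recovery can tell whether the turn came from step.prompt({ output }).
  expectsStructuredOutput: boolean;
}

type RecoveryAction = "skip" | "rerun";

// Behavior described in point 1: every interrupted assistant leaf is skipped.
function currentAction(_turn: InterruptedTurn): RecoveryAction {
  return "skip";
}

// One possible fix: re-run structured turns, keep skipping free-form turns
// so chat output is not emitted twice.
function proposedAction(turn: InterruptedTurn): RecoveryAction {
  return turn.expectsStructuredOutput ? "rerun" : "skip";
}

type RecoveredResult<T> =
  | { status: "skipped" }
  | { status: "complete"; output: T };

// Point 2: completion needs a way to carry the structured output.
// `rerun` is a hypothetical callback that re-derives the output.
function completeRecovered<T>(
  action: RecoveryAction,
  rerun: () => T
): RecoveredResult<T> {
  if (action === "skip") return { status: "skipped" };
  return { status: "complete", output: rerun() };
}
```

The point of splitting the decision from the completion is that both gaps have to close together: re-running without threading the output through completion would still leave step.prompt with nothing to resolve to.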

Why it's deferred

A correct fix touches the recovery continuation path fairly broadly (distinguishing structured turns so they re-run to produce output, and threading the structured output through _completeRecoveredSubmission), which is higher-risk than the surrounding e2e work. Filing as a follow-up rather than bundling it into the test-coverage change.

Repro

In packages/think/src/e2e-tests/workflow-recovery.test.ts, the "recovery" case kills a structured workflow turn mid-stream and currently observes a terminal-but-non-complete workflow. Tightening that assertion to require status === "complete" with the validated output would reproduce the gap.
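
A tightened assertion might look roughly like the sketch below. It does not use the test file's actual helpers: `WorkflowSnapshot`, `assertCompleteWithOutput`, and the `isSummary` guard (with its `{ title: string }` shape) are all hypothetical, chosen only to show the check that would fail today.

```typescript
// Hypothetical view of the workflow state the e2e test reads after restart.
interface WorkflowSnapshot {
  status: string;
  output?: unknown;
}

// Hypothetical strict check: requires "complete" plus a validated output.
function assertCompleteWithOutput<T>(
  snapshot: WorkflowSnapshot,
  isValid: (value: unknown) => value is T
): T {
  if (snapshot.status !== "complete") {
    throw new Error(`expected status "complete", got "${snapshot.status}"`);
  }
  const output = snapshot.output;
  if (!isValid(output)) {
    throw new Error("structured output failed validation");
  }
  return output;
}

// Example guard for an illustrative output schema (assumption, not from the repo).
function isSummary(value: unknown): value is { title: string } {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { title?: unknown }).title === "string"
  );
}
```

Against the current behavior, a check of this kind would throw on the status comparison, since the recovered workflow is terminal but not complete.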

Acceptance

  • A structured workflow turn interrupted mid-stream recovers and completes with the validated structured output.
  • workflow-recovery.test.ts is tightened to assert complete + correct output (removing the NOTE (deferred)).
