Summary
When a structured workflow turn (a ThinkWorkflow step driven by step.prompt({ output })) is interrupted mid-stream (DO eviction / deploy churn / SIGKILL), chat recovery reconciles it as skipped rather than re-running it to completion. The workflow then surfaces a ThinkPromptSkippedError (packages/think/src/workflows.ts:67, :231) instead of completing with the validated structured output.
This was discovered while adding e2e recovery coverage. The new e2e (packages/think/src/e2e-tests/workflow-recovery.test.ts) deliberately locks in only the guarantee that holds today — the workflow is unblocked (terminal, not hung) and the notification drain replays the submission's terminal status after restart — and documents this gap inline (see the NOTE (deferred) at workflow-recovery.test.ts:317-324).
Expected vs actual
- Expected: a structured workflow turn interrupted mid-stream is recovered (continued/retried) and the workflow step.prompt resolves with the structured output, same as a non-interrupted run.
- Actual: recovery resolves the submission as skipped; the workflow step rejects with ThinkPromptSkippedError and the workflow reaches a terminal (non-complete) state.
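From the workflow author's side, the difference can be sketched roughly as follows. This is a toy stand-in, not the real API: the local ThinkPromptSkippedError class, makeFakeStep, the `interrupted` flag, and the string passed as `output` are all hypothetical and only mimic the behavior described above.

```typescript
// Stand-in for the real error in packages/think/src/workflows.ts; its shape is assumed.
class ThinkPromptSkippedError extends Error {
  constructor() {
    super("prompt was reconciled as skipped during recovery");
    this.name = "ThinkPromptSkippedError";
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

type Summary = { title: string };

// Hypothetical fake step. `interrupted` simulates a turn that recovery resolved as skipped.
function makeFakeStep(interrupted: boolean) {
  return {
    async prompt(_opts: { output: string }): Promise<Summary> {
      if (interrupted) throw new ThinkPromptSkippedError();
      return { title: "ok" };
    },
  };
}

// What the workflow observes: "complete" on the expected path, "skipped" on the actual one.
async function runStep(interrupted: boolean): Promise<"complete" | "skipped"> {
  try {
    await makeFakeStep(interrupted).prompt({ output: "summary-schema" });
    return "complete";
  } catch (err) {
    if (err instanceof ThinkPromptSkippedError) return "skipped";
    throw err;
  }
}
```

In this model, runStep(false) corresponds to the expected outcome and runStep(true) to what happens today after an interruption.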
Root cause (initial analysis)
Two interacting behaviors in packages/think/src/think.ts:
- continueLastTurn skips on an interrupted assistant leaf. When the persisted leaf is a mid-stream (interrupted) assistant message, the chat-recovery continuation treats the turn as already-progressed and skips rather than re-running it. For a free-form chat turn that's reasonable (don't double-emit), but for a structured turn it means the turn never re-runs to produce the schema-shaped output.
- _completeRecoveredSubmission carries no structured output (packages/think/src/think.ts:10024, and its call sites ~9290/9572/9805/9899/9918/9944/10156/10181). A recovered submission has no path to reconstruct/re-derive the structured output, so even when reconciled it cannot satisfy the step.prompt({ output }) contract.
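The skip decision can be illustrated with a small model. Everything here is an assumption made for illustration: RecoveredTurn, reconcileToday, and reconcileProposed do not exist in think.ts, and the `structured` flag stands in for whatever signal would identify a step.prompt({ output }) turn. The second function shows one possible direction, not a settled design.

```typescript
// Toy model only; none of these names are real think.ts internals.
interface RecoveredTurn {
  // True when the persisted leaf is a mid-stream (interrupted) assistant message.
  interruptedAssistantLeaf: boolean;
  // True when the turn came from step.prompt({ output }) (hypothetical flag).
  structured: boolean;
}

type Decision = "skip" | "rerun";

// Behavior described above: an interrupted assistant leaf is treated as
// already-progressed, so the turn is skipped regardless of its kind.
function reconcileToday(turn: RecoveredTurn): Decision {
  return turn.interruptedAssistantLeaf ? "skip" : "rerun";
}

// One possible fix direction: keep skipping free-form turns (no double-emit),
// but re-run structured turns so they can produce the schema-shaped output.
function reconcileProposed(turn: RecoveredTurn): Decision {
  if (turn.interruptedAssistantLeaf && !turn.structured) return "skip";
  return "rerun";
}
```

Under this model the two functions differ only for an interrupted structured turn, which is the case this issue is about. The second behavior (threading the output through _completeRecoveredSubmission) is not modeled here.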
Why it's deferred
A correct fix touches the recovery continuation path fairly broadly (distinguishing structured turns so they re-run to produce output, and threading the structured output through _completeRecoveredSubmission), which is higher-risk than the surrounding e2e work. Filing as a follow-up rather than bundling it into the test-coverage change.
Repro
packages/think/src/e2e-tests/workflow-recovery.test.ts — the "recovery" case kills a structured workflow turn mid-stream and currently observes a terminal-but-non-complete workflow. Tightening that assertion to require status === "complete" with the validated output would reproduce the gap.
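A tightened assertion could look roughly like the helper below. The WorkflowResult shape, the Report schema, and assertRecoveredComplete are hypothetical; the actual e2e harness and output schema in workflow-recovery.test.ts may differ.

```typescript
// Hypothetical result shape; the real e2e harness may expose something different.
interface WorkflowResult {
  status: string;
  output?: unknown;
}

// Hypothetical structured output used only for this sketch.
interface Report {
  title: string;
}

function isReport(value: unknown): value is Report {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { title?: unknown }).title === "string"
  );
}

// Fails unless the recovered workflow is complete AND carries a valid output.
function assertRecoveredComplete(result: WorkflowResult): Report {
  if (result.status !== "complete") {
    throw new Error(`expected status "complete", got "${result.status}"`);
  }
  if (!isReport(result.output)) {
    throw new Error("recovered output failed validation");
  }
  return result.output;
}
```

Against today's behavior, a check of this kind would fail on the status test, since the recovered workflow ends in a terminal but non-complete state.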
Acceptance
- A structured workflow turn interrupted mid-stream recovers and completes with the validated structured output.
- workflow-recovery.test.ts is tightened to assert complete + correct output (removing the NOTE (deferred)).