Background
PR #374 (feat/2026-04-24_lark-streaming-reply) split the streaming-failure handling on ConversationGAgent into two flags so we no longer reuse a dead NyxID /reply token after the first chunk lands:
Disabled — initial send failed before the reply token was consumed; safe to fall back to a single-shot /reply via RunLlmReplyAsync.
SuppressInterim — first chunk consumed the token; only /reply/update is valid; no fallback to /reply allowed.
The fix is correct, but the runtime state is now an ad-hoc combination of two booleans plus a derived ReplyTokenConsumed (boolean over PlatformMessageId). This will keep accreting flags as more failure modes are added (rate-limit, message recall, edit-unsupported, etc.), and the legality of state transitions lives only in reviewers' heads.
In parallel, "should I skip this callback because the message is already unavailable?" logic is scattered across ConversationGAgent.HandleLlmReplyStreamChunkAsync, TryStreamedLlmReplyFinalizeAsync, and the static-fallback path. New handlers (e.g. tool-call hooks, reasoning hooks if/when added) will have to remember to mirror the same checks.
The OpenClaw Lark plugin (https://github.com/ColinLu50/openclaw-lark-stream) hits the same shape of problem and resolves it with two patterns we should adopt:
- An explicit CardPhase state machine with a PHASE_TRANSITIONS map that rejects illegal transitions and records a terminalReason for observability.
- A single UnavailableGuard that owns the "should this callback short-circuit" decision, so every entrypoint defers to one method instead of repeating the same checks.
Scope
A. Phase state machine for the per-turn streaming runtime state
Replace the Disabled + SuppressInterim + derived ReplyTokenConsumed shape on NyxRelayStreamingState with an explicit phase enum, e.g.
Idle // no chunk attempted yet
PlaceholderSent // first send landed, token consumed
Streaming // interim edits flowing
SuppressingInterim // post-send interim edit failed; final edit still allowed
DisabledPreSend // initial send failed before token consumed; /reply fallback allowed
TerminalSucceeded
TerminalPartial // last flushed text was persisted as the user-visible terminal state
Constraints:
- Must remain in-memory on the actor (per the CLAUDE.md rule "运行态不持久化", i.e. runtime state is not persisted); the dictionary already lives outside State.
- Define a PhaseTransitions table and reject illegal transitions with a warn-level log line (do NOT throw; actor turns must keep making progress).
- Capture a TerminalReason on entering any terminal phase for diagnostics.
- All branches that today read state.Disabled || state.SuppressInterim should be expressed through phase-level helpers (AllowsReplyFallback, AllowsInterimEdit, AllowsFinalEdit).
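A minimal sketch of what the constraints above could look like, to make the discussion concrete. The phase names and the three helper names come from this issue; the type names StreamingPhase and StreamingPhaseMachine, the Action<string> warn callback (a stand-in for the actor's logger), and above all the specific set of legal transitions are assumptions for illustration and would need review against the actual failure modes.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum StreamingPhase
{
    Idle,               // no chunk attempted yet
    PlaceholderSent,    // first send landed, token consumed
    Streaming,          // interim edits flowing
    SuppressingInterim, // post-send interim edit failed; final edit still allowed
    DisabledPreSend,    // initial send failed before token consumed; /reply fallback allowed
    TerminalSucceeded,
    TerminalPartial     // last flushed text persisted as the user-visible terminal state
}

// Hypothetical in-memory holder for one turn's phase; never persisted.
public sealed class StreamingPhaseMachine
{
    // Assumed transition table: the issue asks for such a table but does not list the edges.
    private static readonly IReadOnlyDictionary<StreamingPhase, StreamingPhase[]> PhaseTransitions =
        new Dictionary<StreamingPhase, StreamingPhase[]>
        {
            [StreamingPhase.Idle] = new[] { StreamingPhase.PlaceholderSent, StreamingPhase.DisabledPreSend },
            [StreamingPhase.PlaceholderSent] = new[]
            {
                StreamingPhase.Streaming, StreamingPhase.SuppressingInterim,
                StreamingPhase.TerminalSucceeded, StreamingPhase.TerminalPartial
            },
            [StreamingPhase.Streaming] = new[]
            {
                StreamingPhase.SuppressingInterim,
                StreamingPhase.TerminalSucceeded, StreamingPhase.TerminalPartial
            },
            [StreamingPhase.SuppressingInterim] = new[]
            {
                StreamingPhase.TerminalSucceeded, StreamingPhase.TerminalPartial
            },
            [StreamingPhase.DisabledPreSend] = new[] { StreamingPhase.TerminalSucceeded },
            [StreamingPhase.TerminalSucceeded] = Array.Empty<StreamingPhase>(),
            [StreamingPhase.TerminalPartial] = Array.Empty<StreamingPhase>(),
        };

    private readonly Action<string> _warn; // stand-in for a warn-level logger

    public StreamingPhaseMachine(Action<string> warn) => _warn = warn;

    public StreamingPhase Phase { get; private set; } = StreamingPhase.Idle;
    public string? TerminalReason { get; private set; }

    public bool IsTerminal =>
        Phase is StreamingPhase.TerminalSucceeded or StreamingPhase.TerminalPartial;

    // Phase-level helpers replacing reads of state.Disabled || state.SuppressInterim.
    public bool AllowsReplyFallback => Phase == StreamingPhase.DisabledPreSend;
    public bool AllowsInterimEdit =>
        Phase is StreamingPhase.PlaceholderSent or StreamingPhase.Streaming;
    public bool AllowsFinalEdit =>
        Phase is StreamingPhase.PlaceholderSent or StreamingPhase.Streaming
            or StreamingPhase.SuppressingInterim;

    // Returns false and logs a warning on an illegal transition; never throws,
    // so the actor turn keeps making progress.
    public bool TryTransition(StreamingPhase next, string? reason = null)
    {
        if (Array.IndexOf(PhaseTransitions[Phase], next) < 0)
        {
            _warn($"Illegal streaming phase transition {Phase} -> {next} ignored");
            return false;
        }

        Phase = next;
        if (IsTerminal) TerminalReason = reason ?? "unspecified";
        return true;
    }
}
```

With this shape, a handler would ask machine.AllowsInterimEdit before calling /reply/update and machine.AllowsReplyFallback before falling back to RunLlmReplyAsync, and adding a new failure mode means adding a phase and its edges in one place rather than another boolean.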
B. Centralized unavailable-message guard
Introduce a single guard helper on ConversationGAgent (or a small dependency) that owns:
- "Is the upstream message recalled / deleted / 230099-class error?"
- "Has this turn already been terminated for unavailability?"
- "Should this callback source short-circuit?"
Every public handler that touches the streaming path (HandleLlmReplyStreamChunkAsync, TryStreamedLlmReplyFinalizeAsync, future reasoning/tool hooks) should defer to this guard at the top of the method instead of repeating ad-hoc checks. New handlers added in the future then only have to call if (ShouldSkipForUnavailable("<source>")) return;.
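One possible shape for the guard, as a sketch only. ShouldSkipForUnavailable is the name used above; the class name UnavailableGuard is borrowed from the OpenClaw plugin, while ObserveFailure, IsUnavailableError, TerminatedBy, the integer error-code parameter, and the Action<string> log callback are hypothetical. The code set contains only 230099 because that is the only code this issue names; the real "230099-class" set is an open question.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical per-turn guard; lives in memory next to the streaming state.
public sealed class UnavailableGuard
{
    // Assumption: placeholder set holding only the code named in the issue.
    private static readonly HashSet<int> UnavailableErrorCodes = new() { 230099 };

    private readonly Action<string> _log; // stand-in for the actor's logger

    public UnavailableGuard(Action<string> log) => _log = log;

    // True once any callback has seen the upstream message as unavailable.
    public bool IsTerminated { get; private set; }

    // The callback source that first observed the unavailability, for diagnostics.
    public string? TerminatedBy { get; private set; }

    public static bool IsUnavailableError(int platformErrorCode) =>
        UnavailableErrorCodes.Contains(platformErrorCode);

    // Called by a handler after a failed platform call. Returns true when the
    // error means the message is gone, and latches the turn as terminated.
    public bool ObserveFailure(string source, int platformErrorCode)
    {
        if (!IsUnavailableError(platformErrorCode)) return false;

        if (!IsTerminated)
        {
            IsTerminated = true;
            TerminatedBy = source;
            _log($"Message unavailable (code {platformErrorCode}), first seen by {source}");
        }
        return true;
    }

    // Single entry point for "should this callback short-circuit?".
    public bool ShouldSkipForUnavailable(string source)
    {
        if (!IsTerminated) return false;

        _log($"Skipping {source}: message unavailable (first seen by {TerminatedBy})");
        return true;
    }
}
```

Each handler would then open with if (guard.ShouldSkipForUnavailable("chunk")) return; and route platform failures through ObserveFailure, so the classification of error codes and the latch both live in one place.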
Out of scope
- Switching the outbound path from /reply + /reply/update (edit-message) to Lark CardKit 2.0 streaming cards. That is a separate Lark-only UX track and will be filed separately if/when reasoning/tool visualization is on the roadmap.
- The TurnStreamingReplySink pending-flush timer / reflush-on-conflict work, which is handled in a separate PR (sink-only change; the sink doesn't touch the actor's failure state).
References
- 0d3a91a0 ("Fix streaming failure degradation and first-visible latency")
- src/card/streaming-card-controller.ts (PHASE_TRANSITIONS), src/card/unavailable-guard.ts