Lark streaming reply: explicit phase state machine + centralized unavailable guard #405

## Background

PR #374 (`feat/2026-04-24_lark-streaming-reply`) split the streaming-failure handling on `ConversationGAgent` into two flags so we no longer reuse a dead NyxID `/reply` token after the first chunk lands:

- `Disabled` — initial send failed before the reply token was consumed; safe to fall back to a single-shot `/reply` via `RunLlmReplyAsync`.
- `SuppressInterim` — first chunk consumed the token; only `/reply/update` is valid; no fallback to `/reply` allowed.

The fix is correct, but the runtime state is now an ad-hoc combination of two booleans plus a derived `ReplyTokenConsumed` (boolean over `PlatformMessageId`). More flags will accumulate as further failure modes are added (rate-limit, message recall, edit-unsupported, etc.), and the legality of state transitions is not encoded anywhere: it lives only in reviewers' heads.
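To make the concern concrete, here is a minimal sketch that enumerates the flag combinations the current shape can represent and checks them against the two flag definitions above. It is written in TypeScript (mirroring the OpenClaw reference, not the actor's own code), and the `LegacyFlags` type and helper names are hypothetical.

```typescript
// Hypothetical model of the current runtime shape; field names follow the issue text.
interface LegacyFlags {
  disabled: boolean;           // initial send failed, reply token NOT consumed
  suppressInterim: boolean;    // first chunk landed, reply token consumed
  replyTokenConsumed: boolean; // derived from PlatformMessageId
}

// Contradictions that follow directly from the two flag definitions.
function contradictions(f: LegacyFlags): string[] {
  const out: string[] = [];
  if (f.disabled && f.replyTokenConsumed) {
    out.push("Disabled requires an unconsumed token");
  }
  if (f.suppressInterim && !f.replyTokenConsumed) {
    out.push("SuppressInterim requires a consumed token");
  }
  return out;
}

// Every combination the three booleans can represent.
function allCombinations(): LegacyFlags[] {
  const bools = [false, true];
  const out: LegacyFlags[] = [];
  for (const disabled of bools) {
    for (const suppressInterim of bools) {
      for (const replyTokenConsumed of bools) {
        out.push({ disabled, suppressInterim, replyTokenConsumed });
      }
    }
  }
  return out;
}

// Combinations that the type permits but the definitions forbid.
function incoherentCombinations(): LegacyFlags[] {
  return allCombinations().filter((f) => contradictions(f).length > 0);
}
```

Nothing in the boolean shape prevents the incoherent combinations from being written; an explicit phase enum removes them by construction.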

In parallel, "should I skip this callback because the message is already unavailable?" logic is scattered across `ConversationGAgent.HandleLlmReplyStreamChunkAsync`, `TryStreamedLlmReplyFinalizeAsync`, and the static-fallback path. New handlers (e.g. tool-call hooks, reasoning hooks if/when added) will have to remember to mirror the same checks.

The OpenClaw Lark plugin (https://github.com/ColinLu50/openclaw-lark-stream) hits the same shape of problem and resolves it with two patterns we should adopt:

1. An explicit `CardPhase` state machine with a `PHASE_TRANSITIONS` map that rejects illegal transitions and records a `terminalReason` for observability.
2. A single `UnavailableGuard` that owns the "should this callback short-circuit" decision, so every entrypoint defers to one method instead of repeating the same checks.

## Scope

### A. Phase state machine for the per-turn streaming runtime state

Replace the `Disabled` + `SuppressInterim` + derived `ReplyTokenConsumed` shape on `NyxRelayStreamingState` with an explicit phase enum, e.g.

```
Idle               // no chunk attempted yet
PlaceholderSent    // first send landed, token consumed
Streaming          // interim edits flowing
SuppressingInterim // post-send interim edit failed; final edit still allowed
DisabledPreSend    // initial send failed before token consumed; /reply fallback allowed
TerminalSucceeded
TerminalPartial    // last flushed text was persisted as the user-visible terminal state
```

Constraints:
- Must remain in-memory on the actor (per the CLAUDE.md rule "运行态不持久化", i.e. runtime state is not persisted); the dictionary already lives outside `State`.
- Define a `PhaseTransitions` table and reject illegal transitions with a log line at warn level (do NOT throw — actor turns must keep making progress).
- Capture a `TerminalReason` on entering any terminal phase for diagnostics.
- All branches that today read `state.Disabled || state.SuppressInterim` should be expressed through phase-level helpers (`AllowsReplyFallback`, `AllowsInterimEdit`, `AllowsFinalEdit`).
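The constraints above could be sketched roughly as follows. This is a TypeScript illustration in the style of the OpenClaw reference, not the proposed C# implementation: the class name `StreamingPhaseState`, the `tryTransition` method, and in particular the edges in the transition table are assumptions to be settled in review. Only the phase names and helper names come from this issue.

```typescript
type Phase =
  | "Idle"
  | "PlaceholderSent"
  | "Streaming"
  | "SuppressingInterim"
  | "DisabledPreSend"
  | "TerminalSucceeded"
  | "TerminalPartial";

// Assumed legal edges (hypothetical; the real table is a review decision).
const PHASE_TRANSITIONS: Record<Phase, readonly Phase[]> = {
  Idle: ["PlaceholderSent", "DisabledPreSend"],
  PlaceholderSent: ["Streaming", "SuppressingInterim", "TerminalSucceeded", "TerminalPartial"],
  Streaming: ["SuppressingInterim", "TerminalSucceeded", "TerminalPartial"],
  SuppressingInterim: ["TerminalSucceeded", "TerminalPartial"],
  DisabledPreSend: ["TerminalSucceeded"],
  TerminalSucceeded: [],
  TerminalPartial: [],
};

// A phase with no outgoing edges is terminal.
const isTerminal = (p: Phase): boolean => PHASE_TRANSITIONS[p].length === 0;

class StreamingPhaseState {
  phase: Phase = "Idle";
  terminalReason: string | undefined = undefined;
  private readonly warn: (msg: string) => void;

  constructor(warn: (msg: string) => void = (m) => console.warn(m)) {
    this.warn = warn;
  }

  // Logs at warn level and returns false on an illegal transition; never throws,
  // so the actor turn keeps making progress.
  tryTransition(next: Phase, reason?: string): boolean {
    if (!PHASE_TRANSITIONS[this.phase].includes(next)) {
      this.warn(`illegal phase transition ${this.phase} -> ${next}`);
      return false;
    }
    this.phase = next;
    if (isTerminal(next)) {
      this.terminalReason = reason ?? "unspecified";
    }
    return true;
  }

  // Phase-level helpers replacing reads of the two legacy booleans.
  get allowsReplyFallback(): boolean {
    return this.phase === "DisabledPreSend";
  }

  get allowsInterimEdit(): boolean {
    return this.phase === "PlaceholderSent" || this.phase === "Streaming";
  }

  get allowsFinalEdit(): boolean {
    return (
      this.phase === "PlaceholderSent" ||
      this.phase === "Streaming" ||
      this.phase === "SuppressingInterim"
    );
  }
}
```

Under these assumed edges, a turn that fails its first send moves `Idle -> DisabledPreSend` and only `allowsReplyFallback` is true, while a turn whose interim edit fails after the placeholder landed moves to `SuppressingInterim`, where only `allowsFinalEdit` remains true. The sketch deliberately leaves open how a failed fallback from `DisabledPreSend` should terminate, since the phase list above does not name a failure terminal.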

### B. Centralized unavailable-message guard

Introduce a single guard helper on `ConversationGAgent` (or a small dependency) that owns:
- "Has the upstream message been recalled or deleted, or did the call return a 230099-class error?"
- "Has this turn already been terminated for unavailability?"
- "Should this callback `source` short-circuit?"

Every public handler that touches the streaming path (`HandleLlmReplyStreamChunkAsync`, `TryStreamedLlmReplyFinalizeAsync`, future reasoning/tool hooks) should defer to this guard at the top of the method instead of repeating ad-hoc checks. New handlers added in the future then only have to call `if (ShouldSkipForUnavailable("<source>")) return;`.
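A minimal sketch of such a guard, again in TypeScript for illustration only. The class name `UnavailableGuard` is borrowed from the OpenClaw plugin; `noteError`, the constructor parameters, and the log format are hypothetical. The set of error codes treated as "unavailable" is passed in rather than hard-coded, since this issue only names the 230099 class.

```typescript
type GuardLog = (msg: string) => void;

class UnavailableGuard {
  // Set once, when the first unavailable-class error is observed for this turn.
  private terminatedReason: string | undefined = undefined;
  private readonly unavailableCodes: ReadonlySet<number>;
  private readonly log: GuardLog;

  constructor(unavailableCodes: Iterable<number>, log: GuardLog = (m) => console.warn(m)) {
    this.unavailableCodes = new Set(unavailableCodes);
    this.log = log;
  }

  // Record an upstream error code; returns true if it marks the message unavailable.
  noteError(code: number, source: string): boolean {
    if (!this.unavailableCodes.has(code)) return false;
    if (this.terminatedReason === undefined) {
      this.terminatedReason = `code ${code} seen by ${source}`;
    }
    return true;
  }

  // Single decision point: should the callback identified by `source` short-circuit?
  shouldSkipForUnavailable(source: string): boolean {
    if (this.terminatedReason === undefined) return false;
    this.log(`skip ${source}: message unavailable (${this.terminatedReason})`);
    return true;
  }
}
```

A handler would construct one guard per turn, for example `new UnavailableGuard([230099])`, call `noteError` wherever an upstream call fails, and begin with `if (guard.shouldSkipForUnavailable("chunk")) return;`. Passing `source` keeps the skip observable in logs without each handler writing its own message.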

## Out of scope

- Switching the outbound path from `/reply` + `/reply/update` (edit-message) to Lark CardKit 2.0 streaming cards. That is a separate Lark-only UX track and will be filed separately if/when reasoning/tool visualization is on the roadmap.
- The `TurnStreamingReplySink` pending-flush timer / reflush-on-conflict work, which is handled in a separate PR (a sink-only change; the sink does not touch the actor's failure state).

## References

- PR #374 commit `0d3a91a0` ("Fix streaming failure degradation and first-visible latency")
- OpenClaw plugin: https://github.com/ColinLu50/openclaw-lark-stream — `src/card/streaming-card-controller.ts` (PHASE_TRANSITIONS), `src/card/unavailable-guard.ts`
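- CLAUDE.md sections: 单线程事实源 (single-threaded source of truth), 中间层状态约束 (middle-layer state constraints: runtime state lives inside the actor and is not persisted)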
