Lift Lark identifier-ladder retry to ILarkOutboundDispatcher boundary #415

@eanzhao

Filed as the third architectural follow-up from the PR #412 long-form review. Issues #408 (typed OutboundTarget proto sub-message) and #414 (DRY TrySendWithFallbackAsync between SkillRunner and FeishuCardHumanInteractionPort) cover the other two; this one captures the third.

Problem

The "230002 bot not in chat" → fallback retry logic from PR #412 lives in two actor/port-side call sites: SkillRunnerGAgent.SendOutputAsync and FeishuCardHumanInteractionPort.SendMessageAsync.

Both call sites today know:

  1. What identifier classes Lark accepts (chat_id / union_id / open_id).
  2. The ordering invariant (chat_id is most specific cross-app cross-tenant, union_id second, open_id last).
  3. Which Lark error code triggers a fallback (230002 bot not in chat).
  4. That 99992361 open_id cross app and 99992364 user id cross tenant are NOT retryable and should propagate with actionable hints.

That platform-specific knowledge should not live in actor/port code, per the CLAUDE.md rule "Actor 即业务实体" ("the actor is the business entity"): actors should only know "send this content to this conversation". The identifier ladder is a transport-layer dispatch concern.
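To make the shared knowledge concrete, here is a minimal sketch of the error-code classification that both call sites currently carry. Only the three numeric codes come from this issue; the enum, the class, and the member names are hypothetical, and the issue does not state a policy for any other code, so those are left unclassified.

```csharp
using System;

// Hypothetical names for illustration; only the numeric codes are from the issue.
public enum LarkErrorDisposition
{
    TryNextTarget, // 230002: bot not in chat, move down the identifier ladder
    Terminal,      // 99992361 / 99992364: not retryable, propagate to the caller
    Unclassified   // no policy stated in this issue for other codes
}

public static class LarkErrorPolicy
{
    public const int BotNotInChat = 230002;
    public const int OpenIdCrossApp = 99992361;
    public const int UserIdCrossTenant = 99992364;

    // Maps a Lark error code to what the sender should do next.
    public static LarkErrorDisposition Classify(int larkCode) => larkCode switch
    {
        BotNotInChat => LarkErrorDisposition.TryNextTarget,
        OpenIdCrossApp or UserIdCrossTenant => LarkErrorDisposition.Terminal,
        _ => LarkErrorDisposition.Unclassified,
    };
}
```

Keeping this table in a single type is the point of the proposal: today an equivalent decision is made separately at each call site.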

Proposed shape

Introduce an ILarkOutboundDispatcher boundary owned by ChannelRuntime infrastructure:

internal interface ILarkOutboundDispatcher
{
    Task<LarkSendResult> SendAsync(
        LarkOutboundEnvelope envelope,    // typed targets (primary + ordered fallbacks) + payload
        CancellationToken ct);
}

The dispatcher internally:

  • Walks the persisted target list (primary → fallback → fallback…) with the platform's known retry policy (currently: try next on 230002).
  • Surfaces unrecoverable Lark codes (99992361, 99992364) with the right last_error text built once.
  • Carries the error, Lark code, and Nyx HTTP status back as a typed LarkSendResult, so callers (SkillRunner, FeishuCardHumanInteractionPort) just observe success or failure without re-implementing the ladder.
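The ladder walk described above could look roughly like the sketch below. The record shapes for LarkTarget, LarkOutboundEnvelope, and LarkSendResult are assumptions made so the example compiles on its own (the real types are not defined in this issue), the types are public rather than internal for the same reason, and the sendOnce delegate is a hypothetical stand-in for the actual transport call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Assumed shapes for illustration only; not the real ChannelRuntime types.
public sealed record LarkTarget(string IdType, string Id);
public sealed record LarkOutboundEnvelope(IReadOnlyList<LarkTarget> Targets, string Payload);
public sealed record LarkSendResult(bool Succeeded, int? LarkCode, int? HttpStatus, string Error);

public interface ILarkOutboundDispatcher
{
    Task<LarkSendResult> SendAsync(LarkOutboundEnvelope envelope, CancellationToken ct);
}

public sealed class LadderLarkOutboundDispatcher : ILarkOutboundDispatcher
{
    private const int BotNotInChat = 230002;

    // Hypothetical transport hook: one attempt against one target.
    private readonly Func<LarkTarget, string, CancellationToken, Task<LarkSendResult>> _sendOnce;

    public LadderLarkOutboundDispatcher(
        Func<LarkTarget, string, CancellationToken, Task<LarkSendResult>> sendOnce)
        => _sendOnce = sendOnce;

    public async Task<LarkSendResult> SendAsync(LarkOutboundEnvelope envelope, CancellationToken ct)
    {
        LarkSendResult last = new(false, null, null, "no targets in envelope");

        // Targets are expected in ladder order: primary first, then fallbacks.
        foreach (var target in envelope.Targets)
        {
            ct.ThrowIfCancellationRequested();
            last = await _sendOnce(target, envelope.Payload, ct);

            if (last.Succeeded) return last;

            // Only 230002 moves to the next target. Every other failure,
            // including 99992361 and 99992364, is returned unchanged.
            if (last.LarkCode != BotNotInChat) return last;
        }

        // Ladder exhausted: report the final failure.
        return last;
    }
}
```

Injecting the single-attempt send as a delegate keeps the retry policy testable without a live Lark or Nyx endpoint; the real dispatcher may well depend on a concrete client instead.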

Once the dispatcher exists:

  • SkillRunnerGAgent.SendOutputAsync and FeishuCardHumanInteractionPort.SendMessageAsync reduce to "build envelope + dispatch + map result": no LarkBotErrorCodes.BotNotInChat strings, no TrySendWithFallbackAsync, no per-call-site fallback log lines that drift.
  • Adding a third identifier (open_id as final fallback) is a change in one place, not two.
  • Adding new retryable Lark codes or a new platform (Telegram retryable codes when Telegram outbound lands) is a change in one place, not two.
  • tools/ci/architecture_guards.sh can grep for LarkBotErrorCodes references in non-dispatcher code as a lint.
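As an illustration of "build envelope + dispatch + map result", a reduced call site might look like the sketch below. OutputSender and its method signature are hypothetical and are not the actual SkillRunnerGAgent or FeishuCardHumanInteractionPort code; the record shapes are assumed and declared here only so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Assumed shapes for illustration only; not the real ChannelRuntime types.
public sealed record LarkTarget(string IdType, string Id);
public sealed record LarkOutboundEnvelope(IReadOnlyList<LarkTarget> Targets, string Payload);
public sealed record LarkSendResult(bool Succeeded, int? LarkCode, int? HttpStatus, string Error);

public interface ILarkOutboundDispatcher
{
    Task<LarkSendResult> SendAsync(LarkOutboundEnvelope envelope, CancellationToken ct);
}

// Hypothetical call site: it references no Lark error codes and has no fallback loop.
public sealed class OutputSender
{
    private readonly ILarkOutboundDispatcher _dispatcher;

    public OutputSender(ILarkOutboundDispatcher dispatcher) => _dispatcher = dispatcher;

    // Returns whether the send succeeded plus the text to persist as last_error.
    public async Task<(bool Ok, string LastError)> SendOutputAsync(
        IReadOnlyList<LarkTarget> persistedTargets,
        string content,
        CancellationToken ct)
    {
        var envelope = new LarkOutboundEnvelope(persistedTargets, content); // build envelope
        var result = await _dispatcher.SendAsync(envelope, ct);            // dispatch
        return result.Succeeded                                             // map result
            ? (true, string.Empty)
            : (false, result.Error);
    }
}
```

With call sites shaped like this, a grep-based guard for LarkBotErrorCodes outside the dispatcher has nothing legitimate to match in actor or port code.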

Dependencies

Out of scope

  • No behavior change. Dispatcher-side retry policy and error mapping must match today's call-site logic exactly. Pin with the existing SkillRunnerGAgent + FeishuCardHumanInteractionPort test suites.

References
