
enhance(daily): richer report content + progressive delivery (streaming-edit or batched) #423

@eanzhao


Summary

Day-one enhancement plan for /daily. The current report is functional after #421 closed the GitHub-403 root cause, but the content is thin and the delivery is one-shot — the user gets a single Lark message after the agent finishes, with no progress indication and no per-source breakdown. Two related areas to fix:

  1. Content depth — the skill prompt at agents/Aevatar.GAgents.ChannelRuntime/AgentBuilderTemplates.cs:48-64 only suggests three GitHub queries (commits authored, issues authored, issues commented) and asks for "3-6 concise bullet points + one blocker line". This produces something like a TL;DR, not a daily.
  2. Delivery UX: SkillRunnerGAgent.SendOutputAsync (agents/Aevatar.GAgents.ChannelRuntime/SkillRunnerGAgent.cs:254-300) sends the entire LLM output in one POST /open-apis/im/v1/messages call. There is no streaming-edit equivalent of the normal chat path (ChannelLlmReplyInbox → TurnStreamingReplySink → /channel-relay/reply/update). The user sees nothing for ~30s, then a wall of text.

Why current streaming doesn't transfer to SkillRunner

TurnStreamingReplySink works against the reply token baked into the inbound webhook payload — that token has a ~14-minute TTL and is bound to one specific user-initiated turn. Scheduled SkillRunner runs (the daily 9am cron, the manual /run-agent, retries) don't have a reply token; they're ambient. Trying to reuse /channel-relay/reply/update would land as reply_token_missing_or_expired.

The closest equivalent for SkillRunner is edit-own-message via Lark's PATCH /open-apis/im/v1/messages/{message_id} (text and card update endpoints already exist — the channel-runtime adapter uses them today for chat streaming). That gives SkillRunner the same "watch the message grow" UX without depending on a reply-token grant.

Proposed scope

A. Content depth (skill prompt + suggested data sources)

Rewrite TryBuildDailyReportSpec so the daily covers more surface area, with explicit structured sections and a length budget per section. Treat the new prompt as a specification of what to fetch + how to summarize, not a freeform creative brief.

Suggested sections (each with a hard ≤N-line budget, omitted entirely if empty rather than padded):

  • Shipped — PRs merged in last 24h (title + repo + #PR), commits to default branch
  • In flight — open PRs authored, with stale-flag (>24h since last activity)
  • Reviews — PRs reviewed (approve/request-changes/comment counts), review comments left
  • Issues — issues opened, closed, commented on
  • CI status — recent failing builds on default branch (if any)
  • Trend vs yesterday — comparing today's counts against the prior 24h window
  • Blockers — auto-detected: PRs >24h waiting for review, CI red >2h, GitHub blocked/needs-info labels
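The blocker heuristics above reduce to simple predicates over the fetched activity. A minimal Python sketch, assuming simplified dict shapes (the field names here are illustrative, not the real GitHub API payloads); the thresholds come straight from the list:

```python
from datetime import datetime, timedelta, timezone

# Thresholds from the Blockers bullet above.
REVIEW_WAIT = timedelta(hours=24)
CI_RED = timedelta(hours=2)
BLOCKER_LABELS = {"blocked", "needs-info"}  # label names are assumptions

def detect_blockers(open_prs, ci_runs, now=None):
    """open_prs / ci_runs are simplified dicts, not real GitHub API payloads."""
    now = now or datetime.now(timezone.utc)
    blockers = []
    for pr in open_prs:
        if now - pr["last_activity"] > REVIEW_WAIT and not pr["reviewed"]:
            blockers.append(f"PR #{pr['number']} waiting for review >24h")
        hits = BLOCKER_LABELS & set(pr.get("labels", []))
        if hits:
            blockers.append(f"PR #{pr['number']} labeled {sorted(hits)}")
    # "CI red >2h": latest run on the default branch failed more than 2h ago.
    runs = sorted(ci_runs, key=lambda r: r["finished_at"])
    if runs and runs[-1]["conclusion"] == "failure" and now - runs[-1]["finished_at"] > CI_RED:
        blockers.append("CI red on default branch >2h")
    return blockers
```

The point of keeping these as deterministic post-processing (rather than asking the LLM to "notice" blockers) is that the Blockers section stays trustworthy even when the model is tempted to pad.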

GitHub queries to suggest (replacing the current 3):

GET /search/issues?q=author:{u}+is:pr+is:merged+merged:>={iso}      # shipped PRs
GET /search/issues?q=author:{u}+is:pr+is:open                       # in flight
GET /search/issues?q=reviewed-by:{u}+updated:>={iso}                # reviews
GET /search/issues?q=author:{u}+is:issue+created:>={iso}            # issues opened
GET /search/issues?q=author:{u}+is:issue+is:closed+closed:>={iso}   # issues closed
GET /repos/{owner}/{repo}/actions/runs?branch=main&per_page=10      # CI on each tracked repo

When the user provides repositories=…, the prompt must instruct the LLM to make these calls once per repo rather than collapsing into a global search — the global /search/* endpoints don't filter to a repo allowlist cleanly.
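The per-repo fan-out can be sketched by appending a repo: qualifier to each suggested query instead of issuing one global search. build_daily_queries is a hypothetical helper for illustration, not code from the repo; the templates mirror the list above:

```python
# Query templates copied from the suggested list above.
QUERY_TEMPLATES = [
    "author:{u}+is:pr+is:merged+merged:>={iso}",    # shipped PRs
    "author:{u}+is:pr+is:open",                     # in flight
    "reviewed-by:{u}+updated:>={iso}",              # reviews
    "author:{u}+is:issue+created:>={iso}",          # issues opened
    "author:{u}+is:issue+is:closed+closed:>={iso}", # issues closed
]

def build_daily_queries(user, since_iso, repos=None):
    """No repos: one global search per template. With repos: fan out
    per repo via a repo:{owner}/{name} qualifier."""
    base = [t.format(u=user, iso=since_iso) for t in QUERY_TEMPLATES]
    if not repos:
        return [f"/search/issues?q={q}" for q in base]
    return [f"/search/issues?q={q}+repo:{r}" for r in repos for q in base]
```

With N repos this is 5N search calls (plus one Actions call per repo), which is worth stating in the prompt so the LLM budgets its tool calls accordingly.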

Future scope (out for the first cut, but the prompt should leave room for):

  • Calendar provider via NyxID (api-google-calendar?) — meetings attended/upcoming
  • Notion / Linear / Jira if the user has them connected — pages edited, tickets moved
  • Slack/Lark message highlights via NyxID-bridge

B. Progressive delivery

Two real options, both implementable in this codebase:

Option 1 — batched (cheap, ships fast)
SkillRunner sends one Lark message per section as sections are produced. Implementation: change SendOutputAsync to consume multiple outputs from the LLM turn (the LLM emits one structured envelope {section_id, header, body} per tool_call boundary, or use stop-sequence sectioning). For each section, send a fresh POST /im/v1/messages.

Pros: simplest; each section is an atomic artifact in the chat history.
Cons: chat clutter; user sees N notifications.
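One possible realization of the Option 1 envelope, sketched in Python for illustration: the LLM emits one JSON object per line with the {section_id, header, body} shape from the issue text, and each non-empty envelope becomes its own POST /im/v1/messages. The JSON-lines framing is an assumption; stop-sequence sectioning would parse differently but yield the same stream of sections:

```python
import json

def iter_sections(llm_output: str):
    """Yield (section_id, message_text) pairs from a JSON-lines envelope stream.
    Empty-body sections are dropped, matching the omit-empty-sections rule."""
    for line in llm_output.splitlines():
        line = line.strip()
        if not line:
            continue
        env = json.loads(line)
        if env.get("body"):
            yield env["section_id"], f"{env['header']}\n{env['body']}"
```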

Option 2 — streaming-edit (richer UX, more work)
SkillRunner sends an initial placeholder Lark message and progressively edits it via PATCH /open-apis/im/v1/messages/{message_id}. Reuses the same TurnStreamingReplySink pattern but bound to the freshly-sent message_id instead of a reply token.

Concrete steps:

  1. New SkillRunnerStreamingReplySink in agents/Aevatar.GAgents.ChannelRuntime/ modeled after TurnStreamingReplySink but driven by NyxIdApiClient.UpdateChannelRelayTextReplyAsync-equivalent calls against s/api-lark-bot/open-apis/im/v1/messages/{id} (existing edit endpoint, already wrapped by ChannelConversationTurnRunner for normal chat — confirm it works without reply token).
  2. SkillRunnerGAgent.HandleTriggerAsync plumbs the sink through to LLMService so the OnDelta callbacks land on edit-the-message instead of buffer-and-send-once.
  3. Throttle edits the same way TurnStreamingReplySink does (Lark has its own rate limit on edits, default 5 edits/sec; the existing throttle is conservative enough).
  4. FinalizeAsync does one final edit with the complete text — same shape as the existing sink.
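The steps above reduce to a small sink shape: buffer deltas, edit the already-sent message at most once per throttle interval, and always finish with one complete edit. A Python sketch of that shape (the real implementation would be a C# sink next to TurnStreamingReplySink; StreamingEditSink and edit_message are hypothetical names, with edit_message standing in for the Lark PATCH /im/v1/messages/{message_id} call):

```python
import time

class StreamingEditSink:
    """Buffer streamed deltas and throttle edits to one per min_interval seconds."""
    def __init__(self, edit_message, message_id, min_interval=1.0):
        self.edit_message = edit_message      # callable(message_id, full_text)
        self.message_id = message_id
        self.min_interval = min_interval
        self.buffer = []
        self.last_edit = float("-inf")        # first delta always edits

    def on_delta(self, chunk: str):
        self.buffer.append(chunk)
        now = time.monotonic()
        if now - self.last_edit >= self.min_interval:
            self.edit_message(self.message_id, "".join(self.buffer))
            self.last_edit = now

    def finalize(self):
        # Mirror of step 4 / FinalizeAsync: one last edit with the complete text.
        self.edit_message(self.message_id, "".join(self.buffer))
```

Note each edit sends the full accumulated text, not a delta: message-edit APIs replace the body, so the sink must always hold the whole message so far.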

Pros: matches normal chat UX (one message that grows); no chat clutter.
Cons: more code; need to verify Lark message-edit doesn't have a "max edits" limit per message (rate limit yes, total count probably unlimited for text but worth confirming for cards).

Recommendation: ship Option 2. The infrastructure for edit-message is already in NyxIdApiClient (used by normal chat streaming), and the UX is what users actually expect when they trigger a multi-source report. Option 1 is a fallback if something blocks Option 2 mid-implementation.

C. Cross-cutting / safety net

  • Failure-notification path is currently broken under cross-tenant Lark setups (SkillRunnerGAgent:407 TrySendFailureAsync goes through the same s/api-lark-bot proxy that just rejected with 99992364, so the user never sees the failure either). Either reuse the inbound-webhook reply token for failure notification when it's still in TTL, or store a recent-channel-bot fallback.
  • Content boundary — the LLM should not invent activity when sources are empty. Current prompt says "say so plainly", but with 7+ sections it's tempting for the model to pad. Add a per-section "if zero results, omit the section entirely; if everything is empty, send 'No measurable activity in the last 24h.'" instruction.
  • Length cap — Lark text messages have a body size limit (around 30KB). With richer content + multi-repo + multi-source, we can blow past it. Implement chunked delivery for the text path (split on section boundaries, send N messages) before any of the above ships, or route through cards (which have their own block-count limit but no body limit).
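The section-boundary chunking in the last bullet can be sketched as greedy packing: fit whole sections into a message until the next one would exceed the byte limit, then start a new message. The 30KB figure and UTF-8 byte accounting are assumptions from the bullet above, and chunk_on_sections is a hypothetical helper:

```python
MAX_BYTES = 30_000  # approximate Lark text-body limit, per the bullet above

def chunk_on_sections(sections, max_bytes=MAX_BYTES):
    """Greedily pack whole sections into messages so no body exceeds max_bytes
    (UTF-8). A single oversized section is hard-split as a last resort."""
    chunks, current = [], ""
    for sec in sections:
        candidate = f"{current}\n\n{sec}" if current else sec
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        while len(sec.encode("utf-8")) > max_bytes:
            chunks.append(sec[:max_bytes])  # crude character split; fine for ASCII
            sec = sec[max_bytes:]
        current = sec
    if current:
        chunks.append(current)
    return chunks
```

Splitting on section boundaries rather than raw byte offsets keeps each delivered message self-contained, which also makes the batched Option 1 path a free by-product.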

Acceptance

  • Daily prompt rewritten with structured sections, repo-aware query suggestions, and "omit empty sections" guidance
  • At least one of Option 1 or Option 2 lands; Option 2 preferred
  • Failure-notification path no longer dies silently when the outbound proxy returns a 4xx
  • A new test under AgentBuilderToolTests (or a new file) pins the structured prompt's "omit empty section" instruction so future copy edits don't regress it
  • Smoke test on a real GitHub user where commits authored is empty but PRs reviewed is non-empty — the report skips the Shipped section and renders Reviews

Out of scope for this issue

  • Calendar / Notion / Linear / Jira integrations (need NyxID provider work upstream)
  • Multi-language daily reports
  • Per-user customization (which sections to include, length preferences) — would belong on the agent config, not the template
