fix(messaging): harden proactive broadcast parity #510

Open
vsumner wants to merge 4 commits into codex/cron-runtime-observability from codex/cron-adapter-parity

vsumner (Collaborator) commented Mar 29, 2026

Why

broadcast_proactive() retries the entire OutboundResponse, not individual platform sends. That means chunking adapters have to distinguish between "nothing was delivered yet" and "part of this message already landed". If they do not, a transient failure on chunk 2 can cause cron to replay chunk 1 on retry and duplicate already-delivered output.

This PR keeps the scope adapter-focused. It does not redesign the messaging trait or retry owner; it makes the existing proactive delivery path truthful and non-duplicating for the reviewed adapters.
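
To make the failure mode concrete, here is a minimal sketch of how a whole-payload retry duplicates output. Everything in it is hypothetical: `FakePlatform`, `send_all`, and `broadcast_with_retry` are illustrative stand-ins, not code from this repository, and the retry loop only approximates what `broadcast_proactive()` is described as doing.

```rust
/// Hypothetical stand-in for a chat platform. It records every chunk it
/// accepts and fails exactly once, on the call numbered `fail_on_call`.
struct FakePlatform {
    delivered: Vec<String>,
    calls: usize,
    fail_on_call: usize,
}

impl FakePlatform {
    fn send(&mut self, chunk: &str) -> Result<(), String> {
        self.calls += 1;
        if self.calls == self.fail_on_call {
            return Err("transient: connection reset".to_string());
        }
        self.delivered.push(chunk.to_string());
        Ok(())
    }
}

/// Naive adapter: every failure looks the same to the caller, no matter
/// how many chunks were already accepted.
fn send_all(platform: &mut FakePlatform, chunks: &[&str]) -> Result<(), String> {
    for &chunk in chunks {
        platform.send(chunk)?;
    }
    Ok(())
}

/// Caller that retries the whole payload (assumed retry shape).
fn broadcast_with_retry(
    platform: &mut FakePlatform,
    chunks: &[&str],
    attempts: usize,
) -> Vec<String> {
    for _ in 0..attempts {
        if send_all(platform, chunks).is_ok() {
            break;
        }
    }
    platform.delivered.clone()
}
```

With two chunks, a failure on the second call, and two attempts, the fake platform ends up holding `part 1`, `part 1`, `part 2`: the first chunk lands twice because the retry restarts from the beginning.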

What Changed

  • add a shared messaging helper for chunked proactive sends so adapters can preserve normal retry classification before any send succeeds and upgrade failures to terminal after partial delivery
  • apply that rule to the chunked proactive broadcast paths in Slack, Discord, Mattermost, and Twitch so retries do not replay already-delivered chunks
  • fix the portal adapter rename drift after the rebase by removing stale webchat helper/test references and making portal fallback/error reporting consistent
  • validate Mattermost DM/direct target IDs before entering the classified send path so obviously invalid targets fail permanently instead of looking retryable
  • keep the existing proactive support/fallback model intact for adapters like portal, telegram, email, and signal; this slice hardens delivery semantics without changing schema, API, or UI behavior
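
The first bullet can be sketched roughly as follows. This is not the helper added in this PR: `BroadcastError`, `send_chunked`, and the closure-based signature are assumptions made for illustration, since the real types in `messaging::traits` are not shown in this description.

```rust
/// Hypothetical retry classification used only for this sketch.
#[derive(Debug, PartialEq)]
enum BroadcastError {
    Retryable(String),
    Permanent(String),
}

/// Sends chunks in order. `classify` is the adapter's normal error
/// classification and is only consulted while nothing has been delivered.
/// Once any chunk has been accepted, a later failure becomes terminal so
/// the caller does not replay the chunks that already landed.
fn send_chunked<S, C>(
    chunks: &[&str],
    mut send: S,
    classify: C,
) -> Result<(), BroadcastError>
where
    S: FnMut(&str) -> Result<(), String>,
    C: Fn(String) -> BroadcastError,
{
    let mut sent_any = false;
    for &chunk in chunks {
        match send(chunk) {
            Ok(()) => sent_any = true,
            Err(error) if sent_any => {
                return Err(BroadcastError::Permanent(format!(
                    "partial delivery: {error}"
                )));
            }
            Err(error) => return Err(classify(error)),
        }
    }
    Ok(())
}
```

Passing the send step and the classifier as closures is just one way to share the rule across adapters; it keeps the `sent_any` bookkeeping in a single place instead of repeating it in each platform loop.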

Testing

  • cargo test messaging::traits --lib
  • cargo test messaging::manager --lib
  • cargo test messaging::portal --lib
  • cargo test messaging::slack --lib
  • cargo test messaging::discord --lib
  • cargo test messaging::mattermost --lib
  • cargo test messaging::twitch --lib
  • just preflight
  • just gate-pr

Notes

  • This PR is stacked on fix(cron): persist scheduler state and delivery outcomes #509 and should be reviewed after the cron runtime/observability base branch.
  • The important invariant is: once any proactive chunk has already been accepted by the platform, later failure must be terminal for that attempt because the retry unit is the whole response payload.
  • This PR intentionally does not redesign MessagingManager::broadcast_proactive() or the Messaging trait.

Follow-up hardening related to #502.

coderabbitai bot (Contributor) commented Mar 29, 2026

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 54.17% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix(messaging): harden proactive broadcast parity' accurately captures the main objective of making broadcast failure reporting more truthful and consistent across adapters. |
| Description check | ✅ Passed | The PR description clearly explains the motivation (preventing duplicate delivery on transient failures), what changed (adding shared helpers and applying to multiple adapters), and the scope (adapter-focused without redesigning traits). |

vsumner requested a review from jamiepine March 29, 2026 17:58
vsumner marked this pull request as ready for review March 29, 2026 17:58

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/messaging/mattermost.rs (1)

1008-1018: ⚠️ Potential issue | 🟠 Major

Validate Mattermost target IDs before they enter the classified-send path.

dm:{user_id} targets are sent straight into get_or_create_dm_channel(), and direct channel targets are only validated later inside create_post(). That means obviously malformed IDs can still surface through mark_classified_broadcast, so the cron retry loop may retry permanently bad targets as if they were send failures. Call Self::validate_id() on both target shapes here and map those failures with mark_permanent_broadcast.
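
The validate-first ordering suggested here could look roughly like the sketch below. It is an illustration only: `PermanentBroadcastError`, `Target`, and `parse_target` are hypothetical names, and the ID rule (26 ASCII alphanumeric characters) is an assumption about Mattermost IDs, not a copy of what `Self::validate_id()` checks.

```rust
/// Hypothetical marker for a failure that must never be retried.
#[derive(Debug, PartialEq)]
struct PermanentBroadcastError(String);

/// Hypothetical parsed form of the two target shapes mentioned above.
#[derive(Debug, PartialEq)]
enum Target {
    Dm(String),
    Channel(String),
}

/// Assumed ID shape: exactly 26 ASCII alphanumeric characters.
fn validate_id(id: &str) -> Result<(), String> {
    if id.len() == 26 && id.bytes().all(|b| b.is_ascii_alphanumeric()) {
        Ok(())
    } else {
        Err("invalid mattermost id".to_string())
    }
}

/// Validates the target before any network call is made, so a malformed
/// ID is reported as permanent instead of reaching the classified path.
fn parse_target(target: &str) -> Result<Target, PermanentBroadcastError> {
    match target.strip_prefix("dm:") {
        Some(user_id) => {
            validate_id(user_id).map_err(PermanentBroadcastError)?;
            Ok(Target::Dm(user_id.to_string()))
        }
        None => {
            validate_id(target).map_err(PermanentBroadcastError)?;
            Ok(Target::Channel(target.to_string()))
        }
    }
}
```

Only a successfully parsed `Target` would then be handed to the DM lookup or post call, where transient network errors can keep their normal classification.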

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/messaging/mattermost.rs` around lines 1008 - 1018, Validate Mattermost
target IDs before resolving DMs or passing them down the classified-send path:
call Self::validate_id(user_id) and map any Err to
crate::messaging::traits::mark_permanent_broadcast before calling
get_or_create_dm_channel(user_id) in the dm:{user_id} branch, and likewise call
Self::validate_id(target) (mapping errors to mark_permanent_broadcast) for
direct channel targets instead of deferring validation to create_post(); keep
the rest of the logic (get_or_create_dm_channel, resolved_target usage) intact.
src/messaging/slack.rs (1)

1148-1160: ⚠️ Potential issue | 🟠 Major

Don't retry a partially delivered chunked Slack broadcast.

After the first chat_post_message succeeds, this call has already delivered part of the broadcast. If a later chunk gets classified as transient here, cron will retry from chunk 1 and duplicate the chunks Slack already accepted. After the first successful post, this path should return a non-retryable/partial-delivery error instead.

Possible fix
             OutboundResponse::Text(text) => {
+                let mut sent_any = false;
                 for chunk in split_message(&text, 12_000) {
                     let mut req = SlackApiChatPostMessageRequest::new(
                         channel_id.clone(),
                         markdown_content(chunk),
                     );
                     req = req.opt_thread_ts(thread_ts.clone());
-                    session
-                        .chat_post_message(&req)
-                        .await
-                        .context("failed to broadcast slack message")
-                        .map_err(crate::messaging::traits::mark_classified_broadcast)?;
+                    match session
+                        .chat_post_message(&req)
+                        .await
+                        .context("failed to broadcast slack message")
+                    {
+                        Ok(_) => sent_any = true,
+                        Err(error) => {
+                            return Err(if sent_any {
+                                crate::messaging::traits::mark_permanent_broadcast(error)
+                            } else {
+                                crate::messaging::traits::mark_classified_broadcast(error)
+                            });
+                        }
+                    }
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/messaging/slack.rs` around lines 1148 - 1160, The loop in
OutboundResponse::Text uses split_message and calls session.chat_post_message
for each chunk; if an earlier chunk succeeds but a later chunk fails and is
classified transient via crate::messaging::traits::mark_classified_broadcast,
the whole broadcast will be retried from chunk 1 and duplicate delivered chunks.
Fix by tracking whether any chunk has already been successfully posted (e.g., a
boolean like sent_any_chunk) and, when session.chat_post_message returns an
error for a later chunk, convert transient-classified errors into a
non-retryable partial-delivery error (create or reuse a variant meaning "partial
delivery — do not retry") instead of letting mark_classified_broadcast cause a
full retry; update the mapping around session.chat_post_message (the
SlackApiChatPostMessageRequest + session.chat_post_message call) to check
sent_any_chunk and return the non-retryable/partial-delivery error when
appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/messaging/email.rs`:
- Around line 520-524: The predicate ensure_supported_broadcast_response
currently treats any RichMessage as supported even though the email sender only
uses response.text; update the check or the send path so RichMessage is only
accepted when it has a usable plaintext fallback: either (A) tighten
supports_email_broadcast_response/ensure_supported_broadcast_response to require
that a RichMessage contains a non-empty text/plain fallback (e.g.,
response.text.is_some_and(|t| !t.is_empty())), or (B) when sending in the
broadcast arm that handles RichMessage, derive a plaintext body from the rich
payload (extract card titles, bodies, choice labels, or poll question into a
single text fallback) and use that for the email body; change the predicate
and/or the broadcast send logic (references:
ensure_supported_broadcast_response, RichMessage, response,
supports_email_broadcast_response) so a card-only message cannot pass without
producing non-empty text.
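
A rough sketch of option (B), combined with the tightened predicate from option (A), is shown below. The `Card` and `RichMessage` shapes are simplified assumptions and `plaintext_fallback` is a hypothetical helper; the real `RichMessage` type has more fields than this.

```rust
/// Simplified, assumed shapes for illustration only.
struct Card {
    title: Option<String>,
    body: Option<String>,
}

struct RichMessage {
    text: Option<String>,
    cards: Vec<Card>,
}

/// Prefers the message's own text; otherwise joins card titles and bodies
/// into one plaintext body. Returns None when nothing usable exists.
fn plaintext_fallback(message: &RichMessage) -> Option<String> {
    if let Some(text) = message
        .text
        .as_deref()
        .map(str::trim)
        .filter(|t| !t.is_empty())
    {
        return Some(text.to_string());
    }
    let lines: Vec<&str> = message
        .cards
        .iter()
        .flat_map(|card| [card.title.as_deref(), card.body.as_deref()])
        .flatten()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect();
    if lines.is_empty() {
        None
    } else {
        Some(lines.join("\n"))
    }
}

/// The support predicate reuses the same function as the send path, so a
/// card-only message cannot pass the check and then produce an empty body.
fn supports_email_broadcast(message: &RichMessage) -> bool {
    plaintext_fallback(message).is_some()
}
```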

In `@src/messaging/signal.rs`:
- Around line 953-955: The error currently includes the raw Signal target in the
permanent-failure message; change the call to
crate::messaging::traits::mark_permanent_broadcast inside the same branch to
avoid leaking the target by removing the `{target}` interpolation and instead
use a generic message (e.g. "invalid signal broadcast target format" or "invalid
signal broadcast target: redacted") or explicitly redact the value before
including it; update the anyhow::anyhow! invocation accordingly so
mark_permanent_broadcast and any logs no longer contain the raw phone/UUID/group
identifier.
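
As a small illustration of the non-leaking error, assuming a hypothetical `parse_signal_target` that only understands `+`-prefixed phone numbers (the real adapter also handles UUID and group targets, which this sketch ignores):

```rust
/// Hypothetical validator: accepts "+<digits>" and rejects everything else
/// with a fixed message that never echoes the caller-supplied target.
fn parse_signal_target(target: &str) -> Result<&str, String> {
    let valid = target.starts_with('+')
        && target.len() > 1
        && target[1..].bytes().all(|b| b.is_ascii_digit());
    if valid {
        Ok(target)
    } else {
        // Constant message: no `{target}` interpolation, so phone numbers
        // and identifiers cannot end up in errors or logs through it.
        Err("invalid signal broadcast target format".to_string())
    }
}
```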

In `@src/messaging/telegram.rs`:
- Around line 547-556: The predicate passed to
crate::messaging::traits::ensure_supported_broadcast_response currently accepts
every OutboundResponse::RichMessage even though the Telegram send path (the
message-sending logic that only emits text and an optional poll) drops cards and
interactive_elements; update the check in ensure_supported_broadcast_response to
either (a) only accept RichMessage variants that contain a plain text fallback
(e.g., require a non-empty body/text field) and no unsupported fields, or (b)
explicitly restrict acceptance to the subset Telegram can degrade to (e.g.,
RichMessage with text +/- poll) and reject RichMessage values containing cards
or interactive_elements unless you synthesize/derive a text fallback first;
locate and update the predicate around ensure_supported_broadcast_response and
the code that builds the outgoing Telegram payload so both agree on the
supported subset (OutboundResponse::RichMessage with text-only or
text-plus-poll).
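
One way to keep the predicate and the send path in agreement, following option (b), is to derive both from a single degrade function. The sketch below assumes simplified field names (`poll_question`, `card_count`, `has_interactive_elements`) that are not taken from the real `RichMessage` type, and `degrade_for_telegram` is a hypothetical helper.

```rust
/// Simplified, assumed shape for illustration only.
struct RichMessage {
    text: Option<String>,
    poll_question: Option<String>,
    card_count: usize,
    has_interactive_elements: bool,
}

/// What this sketch assumes Telegram can actually emit: text plus an
/// optional poll.
#[derive(Debug, PartialEq)]
struct TelegramPayload {
    text: String,
    poll_question: Option<String>,
}

/// Returns the payload that would be sent, or None when the message uses
/// features that would be silently dropped.
fn degrade_for_telegram(message: &RichMessage) -> Option<TelegramPayload> {
    if message.card_count > 0 || message.has_interactive_elements {
        return None;
    }
    let text = message
        .text
        .as_deref()
        .map(str::trim)
        .filter(|t| !t.is_empty())?;
    Some(TelegramPayload {
        text: text.to_string(),
        poll_question: message.poll_question.clone(),
    })
}

/// The support check is defined in terms of the send path's own output.
fn supports_telegram_broadcast(message: &RichMessage) -> bool {
    degrade_for_telegram(message).is_some()
}
```

Synthesizing a text fallback from cards first, as option (a) allows, would relax the early return instead of rejecting those messages outright.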

In `@src/messaging/twitch.rs`:
- Around line 473-490: The Text and RichMessage branches call client.say() in a
loop over split_message(...) and currently map all failures through
crate::messaging::traits::mark_classified_broadcast which treats them as
retryable broadcast errors; change the logic to track whether any prior chunk
succeeded (e.g., a boolean flag) and if a later chunk fails after at least one
success, return a non-retryable/partial-delivery error instead of a full
retryable broadcast error — update the error mapping on the client.say() calls
in the OutboundResponse::Text and OutboundResponse::RichMessage handling (the
loops that iterate over split_message(&text, MAX_MESSAGE_LENGTH)) so that: if no
chunks have succeeded, keep the original error mapping, but if some chunks
already succeeded map/convert the error to a non-retryable/partial-delivery
classification (use your project’s existing partial/non-retryable error helper)
and return immediately to avoid re-sending earlier chunks.

In `@src/messaging/webchat.rs`:
- Line 77: The code uses extract_webchat_broadcast_text(response) which returns
RichMessage.text and ignores card-derived fallback, letting card-only broadcasts
be treated as success with empty payload; update the places that build
ApiEvent::OutboundMessage (where text is sourced from
extract_webchat_broadcast_text) to derive a plaintext fallback using
OutboundResponse::text_from_cards(&response) when RichMessage.text is empty (or
otherwise reject/return an error for card-only RichMessage), and similarly
adjust other call sites that construct ApiEvent::OutboundMessage so they use
OutboundResponse::text_from_cards or validate RichMessage card-only payloads
before reporting delivery; reference functions/types:
extract_webchat_broadcast_text, OutboundResponse::text_from_cards,
ApiEvent::OutboundMessage, RichMessage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 847a9f49-7a39-4b47-96f2-48f5645b8a9b

📥 Commits

Reviewing files that changed from the base of the PR and between c6b953b and 3cb0fda.

📒 Files selected for processing (8)
  • src/messaging/discord.rs
  • src/messaging/email.rs
  • src/messaging/mattermost.rs
  • src/messaging/signal.rs
  • src/messaging/slack.rs
  • src/messaging/telegram.rs
  • src/messaging/twitch.rs
  • src/messaging/webchat.rs

vsumner force-pushed the codex/cron-adapter-parity branch from 249a250 to 9690040 (March 29, 2026 23:22)

vsumner (Collaborator, Author) commented Mar 30, 2026

Review Comment Resolution (Post-Rebase)

The following CodeRabbit/Tembo comments have been addressed in the rebased branch:

✅ Fixed

| Comment | File | Resolution |
| --- | --- | --- |
| Signal target leak | signal.rs:955 | No interpolation: "invalid signal broadcast target format" |
| Discord retryable | discord.rs:418 | Already uses mark_retryable_broadcast |
| Twitch partial delivery | twitch.rs:478-487 | sent_any tracking + classify_twitch_broadcast_error |
| Email RichMessage | email.rs:696-698 | Predicate checks rich_message_plaintext_fallback(...).is_some() |
| Telegram RichMessage | telegram.rs:627-634 | telegram_rich_message_text calls text_from_cards |
| Webchat card-only | webchat.rs:178-187 | rich_message_plaintext_fallback extracts card text |

All 6 comments reference outdated code from before the rebase onto codex/cron-runtime-observability@fae2bf3. The issues were resolved in commits 00eafdf through fae2bf3 (race condition fixes, P2 findings, and formatting) that are now included in this branch.

No action required - comments can be resolved/dismissed.

vsumner force-pushed the codex/cron-runtime-observability branch from fae2bf3 to 2c3e953 (March 31, 2026 21:22)
vsumner force-pushed the codex/cron-adapter-parity branch from 9690040 to e690e5e (March 31, 2026 22:32)