Recover pending commit intents on startup#1251
Conversation
|
Warning Review limit reached
More reviews will be available in 53 minutes and 31 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (9)
📝 WalkthroughWalkthroughThis PR adds infrastructure to recover and reschedule pending scheduled commit intents that were persisted before completing. The recovery path loads persisted commit status rows, reconstructs them into Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@magicblock-accounts/src/scheduled_commits_processor.rs`:
- Around line 200-209: remove_intent_metas currently calls
intents_meta_map.lock().expect(...), which can panic on a poisoned mutex; change
it to handle PoisonError instead and avoid panicking. Replace the expect with a
match (or map_err/unwrap_or_else) on intents_meta_map.lock() to obtain the guard
either from Ok(guard) or from Err(poison) => poison.into_inner(), then proceed
to remove entries from the guard; ensure the function does not panic and logs or
quietly continues on poison to keep error-path callers safe (refer to function
remove_intent_metas and variable intents_meta_map).
- Around line 179-197: The current all-or-nothing cleanup after
committor.schedule_recovered_intent_bundles risks removing all metadata even if
only some bundles failed; either assert batch-or-nothing semantics or make
cleanup granular: change schedule_recovered_intent_bundles to return per-bundle
results (e.g., Result<Vec<BundleResult>, Error> or a struct containing a Vec of
succeeded bundle IDs), then in the caller inspect the returned successes and
call Self::remove_intent_metas(&intents_meta_map, &succeeded_intent_ids) to only
remove metadata for successfully scheduled bundles (update any call sites
accordingly); if you intend batch semantics, add an explicit comment and a test
asserting atomic behavior.
- Around line 156-171: The code currently calls
intents_meta_map.lock().expect(POISONED_MUTEX_MSG) which panics on a poisoned
mutex; replace this .expect() with proper error handling: match on
intents_meta_map.lock(), on Ok(mut intent_metas) proceed to insert
ScheduledBaseIntentMeta::new(intent) and collect undelegate pubkeys as before,
but on Err(poison_err) log the poisoning (including poison_err) and return an
appropriate error from the surrounding function or return early from startup
recovery (do not panic); use the existing symbols intent_bundles,
intents_meta_map, ScheduledBaseIntentMeta::new, pubkeys_being_undelegated and
POISONED_MUTEX_MSG to locate and update the block accordingly.
In `@magicblock-committor-service/src/persist/commit_persister.rs`:
- Around line 249-256: The method get_pending_commit_statuses currently calls
self.commits_db.lock().expect(...) which must be removed; instead handle a
poisoned mutex by mapping the lock() error into the existing CommitPersistResult
error type and returning it (e.g., translate the PoisonError into a
CommitPersistError variant such as MutexPoisoned or Internal), then call
.get_pending_commit_statuses() on the locked guard; update the code in
commit_persister.rs around the get_pending_commit_statuses function to replace
expect() with lock().map_err(|e| /* map to CommitPersistError */)? and propagate
that error so no unwrap/expect is used (reference symbols:
get_pending_commit_statuses and self.commits_db).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b3b15829-349c-400a-910c-96a13c30ea35
📒 Files selected for processing (5)
magicblock-accounts/src/scheduled_commits_processor.rsmagicblock-committor-service/src/committor_processor.rsmagicblock-committor-service/src/persist/commit_persister.rsmagicblock-committor-service/src/persist/db.rsmagicblock-committor-service/src/service.rs
52834e3 to
d006bda
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 445552a74c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: df8d4d6c51
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
magicblock-committor-service/src/persist/commit_persister.rs (1)
251-258:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReplace panic-on-lock with error propagation in pending-status retrieval.
Line 256 uses
.expect(POISONED_MUTEX_MSG), which can panic the process on mutex poisoning during a production recovery flow. Return aCommitPersistResulterror instead of panicking.As per coding guidelines,
{magicblock-*,programs,storage-proto}/**: Treat any usage of.unwrap()or.expect()in production Rust code as a MAJOR issue.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@magicblock-committor-service/src/persist/commit_persister.rs` around lines 251 - 258, The get_pending_commit_statuses implementation currently calls self.commits_db.lock().expect(POISONED_MUTEX_MSG) which can panic on mutex poisoning; change this to handle the lock error and return a CommitPersistResult error instead of panicking. Replace the expect call with matching or map_err on the Mutex::lock() result, convert the PoisonError into an appropriate CommitPersistResult failure (e.g., construct/return a CommitPersistError with context like "mutex poisoned getting pending commit statuses"), then call .get_pending_commit_statuses() on the guarded value; keep references to commits_db.lock(), POISONED_MUTEX_MSG and the function get_pending_commit_statuses to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@magicblock-chainlink/src/chainlink/mod.rs`:
- Around line 427-442: The code is silently truncating when using
pubkeys.iter().zip(remote_accounts) which can produce a shorter boolean Vec if
lengths differ; before the Ok(pubkeys.iter().zip(remote_accounts)...) block,
validate that pubkeys.len() == remote_accounts.len() and return an appropriate
Err (or an InvalidInput/LengthMismatch error) on mismatch; if lengths match,
proceed with the existing mapping that calls
remote_account.is_owned_by_delegation_program(),
self.accounts_bank.get_account(pubkey).is_some_and(|account| account.delegated()
|| account.owner().eq(&dlp_api::id())), and collect the boolean vector as
before.
In `@magicblock-committor-service/src/committor_processor.rs`:
- Around line 271-274: When reconstructing a recovered bundle from rows (in the
block that starts with let first = rows.first()? and creates
commit_finalize_accounts), validate that every row for the same message_id has
identical slot and ephemeral_blockhash before using
first.slot/first.ephemeral_blockhash to build the bundle; implement a check like
rows.iter().all(|r| r.slot == slot && r.ephemeral_blockhash == blockhash) and if
it returns false, log a warning/error referencing the message_id and skip that
group (do not populate commit_finalize_accounts or proceed with reconstruction),
applying the same validation for the subsequent code block around lines 277–288
that also assumes consistent metadata.
In `@magicblock-rpc-client/src/utils.rs`:
- Around line 209-212: The match currently treats all
MagicBlockRpcClientError::SimulatedTransactionError(_) as terminal which
prevents retrying recoverable simulation failures; update the error handling so
MagicBlockRpcClientError::SimulatedTransactionError(inner) is inspected and if
inner is a TransactionError::BlockhashNotFound (or equivalent variant) it is
classified as retryable (i.e., do not include it in the terminal arm), otherwise
keep non-retryable behavior for other simulated errors; specifically change the
match that groups MagicBlockRpcClientError::GetSlot, ::LookupTableDeserialize,
::SentTransactionError, ::SimulatedTransactionError(_) to instead pattern-match
::SimulatedTransactionError(inner) and branch on inner for BlockhashNotFound.
---
Duplicate comments:
In `@magicblock-committor-service/src/persist/commit_persister.rs`:
- Around line 251-258: The get_pending_commit_statuses implementation currently
calls self.commits_db.lock().expect(POISONED_MUTEX_MSG) which can panic on mutex
poisoning; change this to handle the lock error and return a CommitPersistResult
error instead of panicking. Replace the expect call with matching or map_err on
the Mutex::lock() result, convert the PoisonError into an appropriate
CommitPersistResult failure (e.g., construct/return a CommitPersistError with
context like "mutex poisoned getting pending commit statuses"), then call
.get_pending_commit_statuses() on the guarded value; keep references to
commits_db.lock(), POISONED_MUTEX_MSG and the function
get_pending_commit_statuses to locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 97707491-1d08-4fc7-bae3-cd0e76ff1465
📒 Files selected for processing (11)
magicblock-accounts/src/scheduled_commits_processor.rsmagicblock-api/src/magic_validator.rsmagicblock-chainlink/src/chainlink/fetch_cloner/mod.rsmagicblock-chainlink/src/chainlink/mod.rsmagicblock-committor-service/src/committor_processor.rsmagicblock-committor-service/src/intent_executor/intent_execution_client.rsmagicblock-committor-service/src/persist/commit_persister.rsmagicblock-committor-service/src/persist/db.rsmagicblock-committor-service/src/service.rsmagicblock-rpc-client/src/lib.rsmagicblock-rpc-client/src/utils.rs
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 076a83884e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
1f20bcc to
f73b081
Compare
- Batch recovery scheduling instead of one actor round-trip per intent - Drop redundant status filter (SQL WHERE already filters by Pending) - Rename schedule_recovered_intent_bundle -> _bundles (Vec param)
- Move pending-intent recovery spawn out of ScheduledCommitsProcessorImpl::new into spawn_pending_intents_recovery, invoked from MagicValidator::start after ledger replay + reset_accounts_bank so the local bank reflects post-replay delegated state. - Validate that grouped commit_status rows agree on slot/ephemeral_blockhash before reconstructing a recovered bundle; skip the message_id on mismatch. - Reject silent length mismatches in accounts_delegated_on_base_and_er by returning ChainlinkError::UnexpectedAccountCount when fetch returns a different count than requested. - Remove the universal pre-send simulate_transaction call along with its client method, error variants, and util mappings. The on-chain dlp ownership gate in accounts_delegated_on_base_and_er already prevents committing stale account snapshots.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f73b0817fa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
f73b081 to
8bbee67
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8bbee67008
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| let intent_bundles = | ||
| self.recoverable_intent_bundles(intent_bundles).await; | ||
| if intent_bundles.is_empty() { |
There was a problem hiding this comment.
All the code below is replacable by existing functionality
self.transaction_scheduler.accept_scheduled_base_intent(intent_bundles);
self.process().await;
There was a problem hiding this comment.
The process historically doesn't have the best api, but to avoid insertion there could be process_inner()
Thust the function could would be
let intents = self.load().await??;
self.process_inner(intents).await?;There was a problem hiding this comment.
The normal path re-persists intents via start_base_intents before scheduling them. Recovered intents were loaded from existing pending DB rows, and the current code deliberately calls magicblock-accounts/src/scheduled_commits_processor.rs:191, whose committor path schedules without re-persisting.
taco-paco
left a comment
There was a problem hiding this comment.
Thanks for taking this on! There's couple of things that would be good to do: reduce code repetition, simplify error dispatch and logging via propagation to lowest point and via custom Error variants if needed.
There are also potential race conditions between detached recover_pending_intents and ongoing requests, but we can go ahead without addressing this for now.
| } | ||
| let intent_bundles = | ||
| self.recoverable_intent_bundles(intent_bundles).await; | ||
| if intent_bundles.is_empty() { |
There was a problem hiding this comment.
The process historically doesn't have the best api, but to avoid insertion there could be process_inner()
Thust the function could would be
let intents = self.load().await??;
self.process_inner(intents).await?;
Summary
commit_statusrows and schedule them without inserting duplicate persistence rows.ScheduledCommitSentfallback signals.NOTE:
Why
If a validator restarts after a base-chain commit/finalize attempt remains pending in persistence, the node currently does not resubmit that persisted work. This leaves already-scheduled account commits dependent on manual intervention.
Notes
commit_statusrows.start_base_intentsto avoid duplicate row insertion.Summary by CodeRabbit
New Features
Bug Fixes