fix: wait for pubsub listeners before reconnect #1253
* master: fix: various correctness related enhancements (#1248)
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs`:
- Around line 224-233: The calls that use .expect(...) when locking
subscriptions and program_subs (the code that initializes account_subs and
program_subs by calling subscriptions.lock() and program_subs.lock()) must not
panic on PoisonError; replace those .expect(...) usages with recoverable
handling that maps the PoisonError into a RemoteAccountProviderError and then
propagate or send that error from the actor instead of panicking. Concretely,
change subscriptions.lock().expect(...) and program_subs.lock().expect(...) to
subscriptions.lock().map_err(|e|
RemoteAccountProviderError::MutexPoisoned(format!("{}", e)))? (or similar) and
return or send that RemoteAccountProviderError to the caller/actor mailbox; make
the same change for the other occurrences around the 418-423 region so all mutex
poison paths propagate RemoteAccountProviderError rather than calling expect.
- Around line 235-253: The current loops serially call and await
Self::cancel_and_wait_for_stream_drop for entries in account_subs and
program_subs, causing N×timeout reconnect latency; change the logic to first
invoke cancellation for all subs without awaiting (collecting the returned
futures) and then await them concurrently (e.g. via futures::future::join_all or
FuturesUnordered) while still capturing the first error into first_error;
operate on the same identifiers (account_subs, program_subs, client_id,
cancel_and_wait_for_stream_drop, first_error) so cancellations run in parallel
and overall wait is bounded by a single timeout rather than multiplied by N.
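The first finding above (mapping a poisoned lock to an error instead of calling `.expect(...)`) can be sketched as follows. This is a minimal illustration, not the PR's code: the `RemoteAccountProviderError` enum here is a stand-in whose `MutexPoisoned(String)` payload is assumed from the review comment, and `SubscriptionMap` and `take_subscriptions` are hypothetical names.

```rust
use std::collections::HashMap;
use std::fmt;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the crate's error type. The `MutexPoisoned`
/// variant name comes from the review comment; its `String` payload is assumed.
#[derive(Debug)]
enum RemoteAccountProviderError {
    MutexPoisoned(String),
}

impl fmt::Display for RemoteAccountProviderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MutexPoisoned(msg) => write!(f, "mutex poisoned: {msg}"),
        }
    }
}

impl std::error::Error for RemoteAccountProviderError {}

/// Hypothetical shape of a subscription map: id -> label of the subscription.
type SubscriptionMap = Arc<Mutex<HashMap<u64, String>>>;

/// Takes every entry out of the map, returning an error rather than
/// panicking if another thread panicked while holding the lock.
fn take_subscriptions(
    subs: &SubscriptionMap,
) -> Result<Vec<(u64, String)>, RemoteAccountProviderError> {
    let mut guard = subs
        .lock()
        // `PoisonError` implements `Display`, so it can be turned into a message.
        .map_err(|e| RemoteAccountProviderError::MutexPoisoned(e.to_string()))?;
    Ok(guard.drain().collect())
}
```

With this shape the caller decides how to report the failure (return it, or send it to whoever is waiting on the actor) rather than the actor task aborting on a panic.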
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e6267bb9-b27d-46e7-acf4-559ddef8ddc6
📒 Files selected for processing (3)
- magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs
- magicblock-chainlink/src/remote_account_provider/pubsub_common.rs
- magicblock-chainlink/src/remote_account_provider/pubsub_connection_pool.rs
* master: fix: disable Chainlink for replicas (#1238)
Summary
Fix pubsub reconnect lifetime safety by ensuring reconnect waits for subscription listener tasks to stop and drop their streams before pooled pubsub clients are replaced.
Details
magicblock-chainlink
This changes the reconnect path from best-effort cancellation to a two-phase drain:
The pubsub pool reconnect docs now call out the required precondition that old listener streams must be finished before pooled clients are dropped.
Additional tests cover successful account/program listener drain, completion timeout behavior, and the fast explicit-unsubscribe path preserving the map entry for reconnect drain.
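As a rough illustration of a drain that signals every listener first and only then waits, bounded by one shared deadline, here is a thread-based sketch. It assumes the two phases are "request cancellation" and "wait for the listener to drop its stream"; it uses OS threads, an `AtomicBool`, and `std::sync::mpsc` in place of the async tasks the actor actually runs, and `Listener`, `spawn_listener`, and `drain_all` are hypothetical names that do not appear in the PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical handle for one listener: a cancel flag plus a channel whose
/// sender is dropped once the listener has released its (simulated) stream.
struct Listener {
    cancel: Arc<AtomicBool>,
    done: mpsc::Receiver<()>,
}

/// Spawns a listener thread that polls `cancel` and then takes
/// `teardown` to release its simulated stream.
fn spawn_listener(teardown: Duration) -> Listener {
    let cancel = Arc::new(AtomicBool::new(false));
    let (done_tx, done) = mpsc::channel::<()>();
    let flag = Arc::clone(&cancel);
    thread::spawn(move || {
        while !flag.load(Ordering::Acquire) {
            thread::sleep(Duration::from_millis(1));
        }
        thread::sleep(teardown); // simulated stream teardown
        drop(done_tx); // dropping the sender signals "stream dropped"
    });
    Listener { cancel, done }
}

/// Signals all listeners, then waits for each against one shared deadline.
/// Returns how many listeners had not finished when the deadline passed.
fn drain_all(listeners: Vec<Listener>, timeout: Duration) -> usize {
    // Phase 1: request cancellation everywhere without waiting.
    for listener in &listeners {
        listener.cancel.store(true, Ordering::Release);
    }
    // Phase 2: wait for completion signals until the shared deadline.
    let deadline = Instant::now() + timeout;
    let mut timed_out = 0;
    for listener in listeners {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match listener.done.recv_timeout(remaining) {
            // A disconnected channel means the listener dropped its sender.
            Ok(()) | Err(mpsc::RecvTimeoutError::Disconnected) => {}
            Err(mpsc::RecvTimeoutError::Timeout) => timed_out += 1,
        }
    }
    timed_out
}
```

Because every listener is signalled before any wait begins, the total wait is bounded by `timeout` rather than by `timeout` multiplied by the number of listeners, which is the property the review asks for in the serial-await finding.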