Conversation

@MartinquaXD (Contributor) commented Oct 31, 2025

Description

Currently we post-process settlements (i.e. associate the tx with a solution proposed by a solver) serially. IIRC it was done that way because it was simpler and, at the time, that step simply ran in a background task.
Since then this logic has moved onto the hot path, so all the time we spend there delays the creation of the next auction.
Additionally, with the introduction of combinatorial auctions it's now possible, and relatively common, to have multiple settlement transactions in the same block. Whenever we have multiple settlements in one block, this PR should result in a significant performance uplift because we process them concurrently.

Changes

Replaced the loop that post-processed settlements serially with logic that first fetches ALL unprocessed settlements and then works on up to 10 of them concurrently. This is fine for the post-processing logic (as opposed to the raw event indexing) because post-processing may happen out of order: if we successfully post-process settlement n+1 but not settlement n (e.g. due to a crash), the DB query would still just return all unprocessed events instead of only the NEW unprocessed events.
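For illustration, a minimal sketch of that pattern using futures::StreamExt::for_each_concurrent; the names (Settlement, fetch_unprocessed_settlements, post_process_settlement) are hypothetical stand-ins, not the PR's actual API:

```rust
use futures::stream::{self, StreamExt};

const MAX_CONCURRENT: usize = 10;

struct Settlement {
    transaction: String,
}

// stand-in for the DB query that returns ALL unprocessed settlements
async fn fetch_unprocessed_settlements() -> Vec<Settlement> {
    Vec::new()
}

async fn post_process_settlement(settlement: &Settlement) -> Result<(), String> {
    let _ = settlement;
    Ok(())
}

async fn post_process_all() {
    let settlements = fetch_unprocessed_settlements().await;
    stream::iter(settlements)
        .for_each_concurrent(MAX_CONCURRENT, |settlement| async move {
            // out-of-order completion is fine: a settlement that fails here
            // stays unprocessed in the DB and is returned again next run
            if post_process_settlement(&settlement).await.is_err() {
                eprintln!("failed to post-process {}", settlement.transaction);
            }
        })
        .await;
}
```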

I also adjusted how retrying works in this code. Instead of returning a result enum (ok, invalid, nothing_to_do) to make the serial loop retry something, we now have a function retry_with_sleep that simply retries a passed-in future n times and returns an Option to indicate whether it succeeded.
This is used in 2 places:

  1. fetching all outstanding settlements
  2. post-processing each individual outstanding settlement

With that, the new logic should be as robust as the old one while being (IMO) easier to reason about.
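A minimal sketch of what such a helper could look like; the attempt count and back-off are assumed values here, not necessarily what the PR uses:

```rust
use std::time::Duration;

const MAX_ATTEMPTS: usize = 3; // assumed value
const BACK_OFF: Duration = Duration::from_millis(100); // assumed value

/// Polls the passed-in future factory up to MAX_ATTEMPTS times, sleeping
/// between attempts; Some(ok) on the first success, None if all attempts fail.
async fn retry_with_sleep<F, OK, ERR>(future: impl Fn() -> F) -> Option<OK>
where
    F: std::future::Future<Output = Result<OK, ERR>>,
{
    for attempt in 0..MAX_ATTEMPTS {
        if let Ok(ok) = future().await {
            return Some(ok);
        }
        if attempt + 1 < MAX_ATTEMPTS {
            tokio::time::sleep(BACK_OFF).await;
        }
    }
    None
}
```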

How to test

Not sure how to test the improvement specifically. Since this is a performance optimization and I don't really want to test the internals of the implementation, a new test that makes sure a large number of settlements can be post-processed would be enough to cover correctness; for the performance aspect we'd have to deploy this on the cluster.

Regarding the performance, I temporarily deployed it to staging, and it produced the expected effect of reducing the spikiness that comes from multiple settlements needing to be post-processed in the same block.
[Screenshot 2025-11-03 at 22:12:26]

@MartinquaXD force-pushed the post-process-settlements-in-parallel branch from 19e9c5a to d1df7a8 on November 3, 2025 08:56
@MartinquaXD marked this pull request as ready for review November 3, 2025 21:05
@MartinquaXD requested a review from a team as a code owner November 3, 2025 21:05
Comment on lines +180 to +181
const TEMP_ERROR_BACK_OFF: Duration = Duration::from_millis(100);
tokio::time::sleep(TEMP_ERROR_BACK_OFF).await;
Contributor

Q: should it have some form of jitter to avoid retrying a bunch of things at the same time?
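For illustration, one way jitter could be layered on (assuming the rand crate; back_off_with_jitter is a hypothetical helper, not code from the PR):

```rust
use rand::Rng;
use std::time::Duration;

const TEMP_ERROR_BACK_OFF: Duration = Duration::from_millis(100);

// sleep for the base back-off plus up to 50% random jitter so that
// concurrently failing tasks don't all retry in lockstep
async fn back_off_with_jitter() {
    let max_jitter = TEMP_ERROR_BACK_OFF.as_millis() as u64 / 2;
    let jitter = Duration::from_millis(rand::thread_rng().gen_range(0..=max_jitter));
    tokio::time::sleep(TEMP_ERROR_BACK_OFF + jitter).await;
}
```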

Comment on lines +72 to +87
match Self::retry_with_sleep(|| self.post_process_settlement(settlement)).await {
Some(_) => tracing::debug!(tx = ?settlement.transaction, "successfully post-processed settlement"),
None => tracing::warn!(tx = ?settlement.transaction, "gave up on post-processing settlement"),
}
})
.await;
}

  async fn post_process_settlement(&self, settlement: eth::SettlementEvent) -> Result<()> {
      let settlement_data = self
-         .fetch_auction_data_for_transaction(event.transaction)
+         .fetch_auction_data_for_transaction(settlement.transaction)
          .await?;
      self.persistence
-         .save_settlement(event, settlement_data.as_ref())
+         .save_settlement(settlement, settlement_data.as_ref())
          .await
-         .context("failed to update settlement")?;
-
-     match settlement_data {
-         None => Ok(IndexSuccess::SkippedInvalidTransaction),
-         Some(_) => Ok(IndexSuccess::IndexedSettlement),
-     }
+         .context("failed to update settlement")
Contributor

Given that we're running multiple ones in parallel, won't this apply extra write pressure on the DB?

Should we rather aggregate concurrent reads and write a single batch?
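To sketch what that suggestion could look like (all names hypothetical, including save_batch; the PR itself writes per settlement):

```rust
use futures::future;

struct Tx;
struct AuctionData;

async fn fetch_auction_data(_tx: &Tx) -> Option<AuctionData> {
    None
}

// single DB round-trip, e.g. one INSERT ... VALUES (...), (...) statement
async fn save_batch(_rows: &[(Tx, Option<AuctionData>)]) {}

// fan out the reads concurrently, then persist everything in one write
async fn post_process_batch(txs: Vec<Tx>) {
    let data = future::join_all(txs.iter().map(fetch_auction_data)).await;
    let rows: Vec<_> = txs.into_iter().zip(data).collect();
    save_batch(&rows).await;
}
```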

@squadgazzz (Contributor) left a comment

The change makes sense to me, and I also don't see any obvious pitfalls.

Comment on lines +58 to +61
      .unwrap_or_default();
-
- // everything worked fine -> reset our attempts for the next settlement
- attempts = 0;
  if settlements.is_empty() {
      tracing::debug!("no unprocessed settlements found");
Contributor

When retry_with_sleep returns None, that means it gave up after repeated errors, but the log won't reflect that. Should we clarify that there is a problem with the fetching logic?
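For illustration, a hypothetical call site that separates the two cases, reusing the retry_with_sleep sketch above (fetch_settlements and the log messages are stand-ins, not the PR's code):

```rust
async fn fetch_settlements() -> Result<Vec<u64>, String> {
    Ok(Vec::new())
}

// distinguish "fetching kept failing" from "fetching worked, nothing to do"
async fn run_once() {
    match retry_with_sleep(fetch_settlements).await {
        None => tracing::error!("repeatedly failed to fetch unprocessed settlements"),
        Some(s) if s.is_empty() => tracing::debug!("no unprocessed settlements found"),
        Some(s) => tracing::debug!(count = s.len(), "fetched settlements to post-process"),
    }
}
```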

}
}

async fn retry_with_sleep<F, OK, ERR>(future: impl Fn() -> F) -> Option<OK>
Contributor

Maybe it would help with debugging if we instead collected the encountered errors and returned Result<OK, Vec<ERR>>, then logged them at the call site? This would allow us to understand the error.

Alternatively, one could assume the passed-in future does its own logging.
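A minimal sketch of the suggested error-collecting variant (attempt count and back-off assumed, not taken from the PR):

```rust
use std::time::Duration;

async fn retry_collecting_errors<F, OK, ERR>(future: impl Fn() -> F) -> Result<OK, Vec<ERR>>
where
    F: std::future::Future<Output = Result<OK, ERR>>,
{
    let mut errors = Vec::new();
    for _ in 0..3 {
        match future().await {
            // success: the error history can be discarded
            Ok(ok) => return Ok(ok),
            // keep every error so the caller can log the full history
            Err(err) => errors.push(err),
        }
        tokio::time::sleep(Duration::from_millis(100)).await;
    }
    Err(errors)
}
```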

@github-actions

This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed.
