feat(core): add simulateBundle for testable Jito bundle execution#678
feat(core): add simulateBundle for testable Jito bundle execution#678LeonFedotov wants to merge 6 commits into
Conversation
Closes solana-foundation#670. Adds the read-only counterpart to sendBundle. Where sendBundle is all-or-nothing (every tx must succeed; on full success the sandbox is committed onto the live VM), simulateBundle is fail-fast and the sandbox is always discarded -- successful or not -- so the live VM is byte-identical to its pre-call state regardless of bundle outcome. Implementation reuses the existing BundleSandbox infrastructure: 1. Decode + validate the bundle (size, encoding, pre/post-config length parity, replace_blockhash safety check). 2. clone_for_bundle_sandbox() -- same primitive sendBundle uses. Storages are overlay-wrapped, subscription registries empty, event channels redirected to receivers we hold locally. 3. For each tx: pre-snapshot requested accounts via the sandbox locker -> fetch_all_tx_accounts_then_process_tx_returning_profile_res() to stage tx writes onto the overlay (so the next tx sees them) -> post-snapshot. The internal entry point gives us a KeyedProfileResult directly, which carries logs, units, and error_message -- no need to parse status events. 4. On error_message, fail-fast: stop simulating, mark summary Failed { error: TransactionFailure(sig, msg), tx_signature }, leave subsequent results in the empty/skipped state. 5. Drop the sandbox. Overlay storages, cloned LiteSVM, and buffered Geyser/Simnet events are all reclaimed automatically. Wire types live in a new surfpool-types::jito_bundles module -- vendored byte-for-byte from jito-foundation/jito-solana's rpc-client-api/src/bundles.rs rather than imported, since mainline solana-rpc-client-api (3.1.x) does not export the bundle types. Vendoring keeps the crate self-contained and avoids a Jito-fork-specific dependency. The wire format matches Jito's reference implementation exactly so existing Jito-shaped clients target Surfpool with no JSON adjustments. replace_recent_blockhash is supported (resigning is not -- flag requires skip_sig_verify: true, matching Jito's contract). simulation_bank is accepted for API parity but ignored -- Surfpool always simulates against the working SVM. The per-tx response leaves pre_token_balances, post_token_balances, loaded_addresses, loaded_accounts_data_size, fee, pre_balances, and post_balances as None. This matches Surfpool's existing single-tx simulateTransaction behavior; populating them requires SPL Token layout decoding that hasn't been done on either path. Tracked for a follow-up. Tests cover empty/oversized rejection, missing-context unhealthy, the dangerous-flag combination, pre/post-config length parity, the success-doesn't-mutate-live-VM safety contract, and fail-fast semantics (skip remaining txs after first failure). (fetch_all_tx_accounts_then_process_tx_returning_profile_res on SurfnetSvmLocker is now pub(crate) so this code can drive it directly without spinning up a status channel only to read back the metadata path it already returns.)
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds support for Jito’s simulateBundle JSON-RPC method, including wire types, server-side implementation, and tests to validate bundle simulation behavior (size limits, config validation, isolation from live VM).
Changes:
- Added
surfpool_typeswire structs/enums for Jito bundle simulation (simulateBundle) and re-exported them from the types crate. - Implemented
simulateBundlein the Jito RPC module using an isolated sandbox SVM clone, plus helpers for address parsing and account snapshot encoding. - Extended test suite to cover validation, success (no live-state mutation), and fail-fast semantics.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/types/src/lib.rs | Exposes the new jito_bundles module and re-exports its public types. |
| crates/types/src/jito_bundles.rs | Introduces serde-friendly wire types/enums for simulateBundle requests/results. |
| crates/types/Cargo.toml | Adds dependencies required by the new wire types (commitment-config, rpc-client-api, thiserror). |
| crates/core/src/surfnet/locker.rs | Widens visibility of an internal processing helper to support bundle simulation. |
| crates/core/src/rpc/jito.rs | Adds simulateBundle RPC implementation, helper functions, and comprehensive tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Greptile SummaryThis PR implements the
Confidence Score: 5/5Safe to merge. The sandbox is always discarded regardless of outcome, so the live VM is never mutated, and the previously flagged issues have all been corrected. All three previously identified issues are fixed: VerificationFailure now maps to SignatureFailure rather than SanitizeFailure, None config entries correctly produce None (not Some([])) in the response, and status channels are created per-iteration with no cross-iteration accumulation. The implementation reuses the proven BundleSandbox infrastructure from sendBundle, the 11 new tests cover the key behavioral contracts (isolation, fail-fast, typed errors, null vs empty accounts), and the vendored wire types match Jito's reference schema. No new issues were found in this review pass. No files require special attention. Important Files Changed
Reviews (6): Last reviewed commit: "review: address Copilot's third-pass fin..." | Re-trigger Greptile |
Three load-bearing fixes raised by both reviewers, plus a few quality
improvements they each flagged. Each fix is pinned by a new test so the
regression cannot reappear silently.
P1: typed TransactionError instead of SanitizeFailure catch-all
----------------------------------------------------------------
Both greptile and copilot caught that we were assigning
TransactionError::SanitizeFailure to err on every failure path, which
specifically means "the tx failed sanitization before execution" and is
NOT a generic carrier. Clients that switch on the typed err variant
would route every InsufficientFunds, InvalidAccountIndex, etc. into
the SanitizeFailure branch.
The locker emits the typed error via TransactionStatusEvent (Simulation
Failure / ExecutionFailure / VerificationFailure) when do_propagate is
true. The bundle sandbox's signature/logs subscriber registries were
already emptied by clone_for_bundle_sandbox so propagating to the
status channel fires zero subscriber notifications. Switched the loop
to do_propagate=true with a fresh status_rx per iteration; the receiver
is dropped at end-of-iteration so events never accumulate.
VerificationFailure is the one variant that legitimately maps to
SanitizeFailure (no typed error available). Documented inline.
Pinned by `test_simulate_bundle_propagates_typed_execution_error`.
P1: pre/post_execution_accounts None vs Some(empty) wire shape
---------------------------------------------------------------
greptile: when pre_execution_accounts_configs[i] is null, the response
field must serialize as null (Jito's reference). We were returning
Some(vec![]) — ambiguous with "asked for accounts, got an empty list".
Track whether the config entry was Some up front and gate the wrapper
accordingly. Empty-list-explicit (Some with empty addresses vec) still
returns Some([]); only the null-config case becomes None.
Pinned by `test_simulate_bundle_null_account_configs_yield_none_not_empty_array`
+ `test_simulate_bundle_empty_addresses_yield_some_empty_vec` (the
adjacent positive case).
P1: replacement_blockhash.last_valid_block_height was a slot
-------------------------------------------------------------
copilot: we populated last_valid_block_height with
get_latest_absolute_slot(), which is a slot, not a block height. The
existing single-tx simulateTransaction path uses
latest_epoch_info().block_height — switched simulate_bundle to match.
P2: tx_signature reports zero-sig when tx had no signatures
------------------------------------------------------------
copilot: VersionedTransaction.signatures.first() can be None.
unwrap_or_default() emits the all-zero Signature, then the response
sets tx_signature: Some("11111...111"), which is bogus and contradicts
the Option<String> contract ("if known"). Track signature as
Option<Signature>; tx_signature: signature.map(|s| s.to_string()).
P2: snapshot_accounts uses canonical missing-account placeholder
-----------------------------------------------------------------
copilot: solana_account::Account::default() emits an all-zero owner
pubkey, not system_program::id() — the comment was wrong and the value
was confusing. Construct the placeholder explicitly with owner =
system_program::id(); reword the comment to match. Existing simulate
Transaction path encodes missing accounts the same way.
P2: jito_bundles wire doc says base64-only, impl now enforces it
-----------------------------------------------------------------
copilot: doc claimed "Only Base64 is supported" but the impl accepted
any encoding into_binary_encoding() handles. Tightened the impl to
reject non-base64 with `invalid_params` (matches Jito's reference) and
clarified the wire-type doc to explain the enforcement.
Pinned by `test_simulate_bundle_rejects_non_base64_encoding`.
P3: per-iteration status channel + factored result builder
-----------------------------------------------------------
greptile: a single shared (status_tx, _status_rx) across all tx
iterations accumulates buffered events that nobody consumes. Replaced
with a fresh channel per iteration (also necessary to recover the typed
error per tx — the P1 fix depends on this).
copilot: RpcSimulateBundleTransactionResult was constructed inline in
3+ branches with duplicated field sets. Factored a `build_tx_result`
helper that accepts only the fields we populate; the perpetually-None
fields (token balances, loaded addresses, etc. — same gap the existing
single-tx simulate has) live in one place now.
Test count: 22 -> 26 (4 new regression pins, all passing). The new
tests for null-vs-empty configs, base64 enforcement, and typed-err
propagation would each have caught the corresponding regression.
|
Thanks both — pushed
Each P1 fix is pinned by a new test: Total: 22 → 26 jito tests, all passing. The full diff is +393/-91 across the two files I already touched (no new files). Let me know if anything else needs attention. |
simulateBundle JITO RPC methodGreptile flagged that the VerificationFailure → SanitizeFailure mapping in the typed-error recovery is semantically wrong. SanitizeFailure means the tx was structurally malformed; for a sig-verify failure the right variant is SignatureFailure (matches svm::sigverify and the single-tx simulate path in full.rs). Two changes: 1. The typed-error recovery match in the Ok(keyed) arm now maps VerificationFailure → SignatureFailure. Defensive — in our flow sigverify failures bubble up via Err(e) before any event is emitted, but if a future locker change starts emitting on this path the typed err is now correct. 2. The Err(e) arm previously had no way to recover the typed err on a sig-verify failure (the locker doesn't push to the status channel when sigverify fails — it returns Err directly via ?). Pre-pass sigverify in the bundle loop ourselves when sigverify is on, capture the typed FailedTransactionMetadata.err directly, and plant it into transaction_results[idx].err. The inner call is then run with sigverify=false to avoid duplicate work on the hot path. Pinned by a new regression test test_simulate_bundle_propagates_typed_signature_failure: builds a valid v0 transfer tx, corrupts the first signature byte (post-sign), and asserts the typed err is exactly SignatureFailure. 26 → 27 simulate_bundle tests, all passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks @greptile-apps — you were right. Pushed The issue. Two changes (one for the variant, one for actually getting it surfaced):
Pinned by a new test: Builds a valid v0 transfer, corrupts the first signature byte (post-sign), and asserts the typed err is exactly Diff: |
|
Both changes are exactly right. The pre-pass sigverify approach in the The Test coverage is solid: corrupting a post-sign byte and asserting the exact variant is the right way to pin this — it would catch any regression where the error gets swallowed back to LGTM on Tip: You can customize Greptile's behavior for this repo with |
Three actionable findings from Copilot's re-review of 2b51cc0: 1. **Sig-less tx synthesizes zero-sig in TransactionFailure** (jito.rs:849) When a tx had no signatures we set tx_signature: None correctly, but still injected a zero-byte default Signature into RpcBundleExecutionError::TransactionFailure(Signature, _) — clients keying off the sig inside `error` would be misled. Fix: reject sig-less transactions up front at decode time with invalid_params ("Bundle transaction N has no signatures"). The validator rejects sig-less txs at ingest, so this matches Solana's wire contract. Once rejected upstream, the per-tx signature is no longer Option — we simplify to direct Signature throughout the loop and drop unwrap_or_default() / .map() noise. Pinned by test_simulate_bundle_rejects_sigless_tx. 2. **RpcSimulateBundleConfig couldn't be partial** (jito_bundles.rs:75) The struct required pre_execution_accounts_configs and post_execution_accounts_configs to be present in JSON whenever a config object was provided. Docstring suggested partial configs (just skip_sig_verify, etc.) should be allowed. Fix: add #[serde(default)] to both vec fields. Server treats an omitted/empty vec as 'no snapshots requested for any tx' — equivalent to vec![None; bundle_len]. Mismatched non-empty lengths are still rejected. Pinned by test_simulate_bundle_accepts_partial_config_omitting_account_configs. 3. **return_data 'when present' doc lied** (jito.rs:1083) The wire-type docstring listed unpopulated fields but missed return_data, fee, and pre/post_balances — implementation always sets them None. Updated both the wire-type docstring on RpcSimulateBundleTransactionResult and the build_tx_result comment to list every uniformly-None field, with a note that closing the gap requires piping richer metadata through ProfileResult and is tracked for a follow-up PR. Wire shape unchanged. 27 → 29 jito tests, all passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Round 2: Copilot's second-pass review (
27 → 29 jito tests, all passing. |
Critic flagged the synthetic SurfpoolError trampoline as the one bit of overengineering in c1e5aa1. Flatten it. Before: pre-pass sigverify captured typed err into Option, then constructed Err(SurfpoolError::from(typed.clone())) to drive the Err(e) arm, which then re-checked sigverify_err.is_some() to plant the typed err. Two layers of indirection and a verbose 'Internal error: ...' message in the bundle summary. After: sigverify failure → direct early-out. Snapshot post-state, build result with the typed err planted directly, set the summary, break. The Err(e) arm goes back to its single recovery path (status_rx only). Net -7 lines, simpler control flow, cleaner summary error message (typed.to_string() = 'signature verification failed' instead of 'Internal error: ...'). Also tightened two stale comments per critic S2/S3: - The Ok(keyed) arm's VerificationFailure mapping comment now explicitly states it's unreachable (sigverify=false on the inner call means the locker's sigverify gate cannot fire), kept for exhaustiveness. - The Err(e) arm's preamble dropped the 'sigverify case' branch description since the case is no longer reachable here. 29/29 jito tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
crates/core/src/rpc/jito.rs:1
- The
typed_errrecovered from the status channel is ignored unlessprofile.error_messageisSome. If the locker emits a failure event butprofile.error_messageis unexpectedlyNone, this code will report the tx as successful (err: None) and continue, which is incorrect. Consider treating the tx as failed if eitherprofile.error_message.is_some()ORtyped_err.is_some(), and ensure summary/message are populated accordingly (e.g., fall back totyped_err.to_string()when the string message is missing).
use std::sync::Arc;
crates/core/src/rpc/jito.rs:1
tx.clone()in the hot simulation loop can be avoided sincedecoded_txsis not used after the loop. Iterating withinto_iter()(or otherwise moving eachVersionedTransaction) would eliminate per-tx cloning and reduce allocation/copy overhead for larger transactions.
use std::sync::Arc;
| #[derive(Error, Debug, Clone, Serialize, Deserialize, PartialEq, Eq)] | ||
| pub enum RpcBundleExecutionError { | ||
| #[error("The bank has hit the max allotted time for processing transactions")] | ||
| BankProcessingTimeLimitReached, | ||
| #[error("Error locking bundle because a transaction is malformed")] | ||
| BundleLockError, | ||
| #[error("Bundle execution timed out")] | ||
| BundleExecutionTimeout, | ||
| #[error("The bundle exceeds the cost model")] | ||
| ExceedsCostModel, | ||
| #[error("Invalid pre or post accounts")] | ||
| InvalidPreOrPostAccounts, | ||
| #[error("PoH record error: {0}")] | ||
| PohRecordError(String), | ||
| #[error("Tip payment error: {0}")] | ||
| TipError(String), | ||
| #[error("A transaction in the bundle failed to execute: [signature={0}, error={1}]")] | ||
| TransactionFailure(Signature, String), | ||
| } |
| // notice rather than seeing a SignatureFailure mid-bundle. | ||
| if replace_recent_blockhash && !skip_sig_verify { | ||
| return Err(Error::invalid_params( | ||
| "replace_recent_blockhash cannot be used with !skip_sig_verify", |
| use jsonrpc_core::{BoxFuture, Error, Result}; | ||
| use jsonrpc_derive::rpc; | ||
| use sha2::{Digest, Sha256}; | ||
| use solana_account_decoder::{UiAccount, UiAccountEncoding, encode_ui_account}; |
Two actionable findings from Copilot's re-review of 1358063: 1. **Rust-y error messages with snake_case field names** (jito.rs:638) The 'replace_recent_blockhash cannot be used with !skip_sig_verify' error mixed Rust boolean syntax (`!skip_sig_verify`) and snake_case field names — confusing for JSON-RPC callers who think in camelCase. Fix: 'replaceRecentBlockhash requires skipSigVerify=true (replacing the blockhash invalidates pre-existing signatures)'. Same camelCase treatment applied to the pre/post_execution_accounts_configs length-mismatch message. Test assertion updated to match the new wording. 2. **tx.clone() in the simulation hot loop** (jito.rs:812) decoded_txs isn't used after the loop, so iterating with into_iter() removes the per-tx VersionedTransaction clone. Trivial savings on short bundles, more meaningful on 5-tx bundles with large messages. The sigverify call still takes &tx (borrow before move). Two findings dismissed (with rationale): - **RpcBundleExecutionError missing #[serde(rename_all = camelCase)]**: Verified against Jito's reference (jito-foundation/jito-solana rpc-client-api/src/bundles.rs) — RpcBundleExecutionError there ALSO lacks the rename_all directive. Variants like BankProcessingTimeLimitReached are PascalCase in Jito's wire format too. Adding rename_all would diverge from Jito's reference, not converge with it. - **UiAccount/UiAccountEncoding from solana_account_decoder vs solana_account_decoder_client_types**: Verified that every other rpc/* module in surfpool (full.rs, ws.rs, accounts_data.rs, surfnet_cheatcodes.rs) imports from solana_account_decoder. Switching jito.rs to _client_types would break the established convention for hypothetical-future divergence; types crate already imports from _client_types because it cannot pull in the heavier solana_account_decoder. The two are kept in sync upstream. Two suppressed low-confidence comments dismissed: - typed_err vs error_message ordering: locker emits SimulationFailure / ExecutionFailure events ALONGSIDE setting error_message, atomically in the same handle_*_failure function. The branch 'error_message: None && typed_err: Some' is impossible. - See above for tx.clone() — that one was correct and applied. 29/29 jito tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Round 3: Copilot's third-pass review ( Applied:
Dismissed with rationale:
Two suppressed low-confidence comments:
29/29 jito tests still pass. |
What this adds
Implements the
simulateBundleJito RPC method — the read-only counterpart tosendBundleand the missing third leg of Surfpool's Jito RPC surface (sendBundleshipped in #634,getBundleStatusesin #660). With this, anyone whose product accepts Jito bundles in production can now exercise that code path against Surfpool in tests without committing state to the live VM.Why it matters
Bundle execution is a separate code path in any product that supports it (MEV-aware backends, atomic-multi-tx wallets, swap aggregators, monitoring/risk tooling). Without
simulateBundleon Surfpool, that code path has no local-test option:solana-test-validatordoesn't implement it.sendBundleworks locally but commits, so tests can't be re-run from a clean baseline.simulateTransactionis single-tx, which can't model bundles where TX 2 depends on TX 1's writes.simulateBundlecloses that gap: every dependent-tx scenario you'd run via Jito in production is now reproducible in tests with byte-identical request/response shape.How it works
Reuses the existing
BundleSandboxinfrastructure (built forsendBundlein #634):replace_recent_blockhashrequiresskip_sig_verify.clone_for_bundle_sandbox()produces aBundleSandboxwith overlay-wrapped storages and emptied subscription registries; event channels redirect to receivers held locally.pre_execution_accounts_configs[i], drive the tx throughfetch_all_tx_accounts_then_process_tx_returning_profile_res()(same internal entrysendBundle/profile_transactionuse, so writes stage on the overlay and TX N+1 sees TX N's effects), post-snapshot, recover the typedTransactionErrorfrom the per-iteration status channel.sendBundlesimulateBundleWire types — vendored, not imported
The Jito bundle types (
RpcSimulateBundleConfig,RpcSimulateBundleResult,RpcSimulateBundleTransactionResult,RpcBundleSimulationSummary,RpcBundleExecutionError,SimulationSlotConfig,RpcBundleRequest) live only in jito-foundation/jito-solana's fork ofsolana-rpc-client-api, not in mainlinesolana-rpc-client-api(3.1.x).Vendoring them in a new
surfpool-types::jito_bundlesmodule keeps the workspace self-contained, avoids a Jito-fork-specific dependency, and pins the wire format byte-for-byte to Jito's reference. Existing Jito-shaped clients target Surfpool with no JSON adjustments. If Jito ever publishes mainline crates for these types, the vendored definitions can swap to imports later.Behavior parity with Jito's reference
MAX_BUNDLE_SIZE(5).invalid_params.replace_recent_blockhash + !skip_sig_verifyrejected withinvalid_params(resigned blockhash invalidates pre-existing signatures).simulation_bankaccepted for API parity (Surfpool always simulates against the working SVM).replacement_blockhashpopulated whenreplace_recent_blockhash: true, with the bank'slast_valid_block_height(not slot).errcarries the typedTransactionErrorrecovered from the SVM's status events.pre/post_execution_accountsreturnsnullwhen the corresponding config entry isnull,Some([])only when the addresses list was explicitly empty.Known gap (out of scope)
The per-tx response leaves these as
None:pre_token_balances,post_token_balancesloaded_addressesloaded_accounts_data_sizefee,pre_balances,post_balancesThis matches Surfpool's existing single-tx
simulateTransactionbehavior —get_simulate_transaction_resultinfull.rsalso hardcodes those toNone. Populating them requires SPL Token Account byte-layout decoding that hasn't been done on either path yet. Worth a follow-up that touches both code paths together.Tests
11 tests total in
crates/core/src/rpc/jito.rs::testscovering simulateBundle (the suite went from 15 jito tests → 26 with this PR):test_simulate_bundle_empty_rejectedinvalid_paramstest_simulate_bundle_exceeds_max_size_rejectedMAX_BUNDLE_SIZE→invalid_paramstest_simulate_bundle_no_context_returns_unhealthyNodeUnhealthytest_simulate_bundle_replace_blockhash_requires_skip_sig_verifytest_simulate_bundle_pre_post_lengths_must_matchtest_simulate_bundle_rejects_non_base64_encodingtest_simulate_bundle_succeeds_does_not_mutate_live_vmtest_simulate_bundle_failure_marks_summary_and_skips_remainingFailed, signature populatedtest_simulate_bundle_propagates_typed_execution_errorerris the SVM's typedTransactionError, not aSanitizeFailurecatch-alltest_simulate_bundle_null_account_configs_yield_none_not_empty_arrayNonein response (notSome([]))test_simulate_bundle_empty_addresses_yield_some_empty_vecSome([])(distinguishes from null)Local:
cargo test -p surfpool-core --lib 'rpc::jito::tests'→ 26 passed, 0 failed.Minor adjacent change
SurfnetSvmLocker::fetch_all_tx_accounts_then_process_tx_returning_profile_reswas bumped from private topub(crate). The internal API was already used byprocess_transaction(line 1133) andprofile_transaction(line 1192);simulate_bundleis the third caller. Lets us drive simulation directly without spinning up a status channel only to read back the metadata path it already returns. No external API surface change.Real-world test of the loop
End-to-end exercised against a downstream product that calls
simulateBundlein production: backend → the same Helius-shape simulation pipeline that hits this method on prod-Helius → Surfpool with this branch built as a Docker image. All bundle-tx test scenarios pass identically in the Surfpool-backed test rig. Happy to share the e2e fixtures if useful.🤖 Generated with Claude Code