v2: rewrite by stegaBOB · Pull Request #125 · otter-sec/solana-verified-programs-api

stegaBOB · 2026-05-22T21:59:29Z

v2 rewrite. Same api/src/... directory layout as master so per-file diffs read in place; #126 stacks on top to flatten, #127 adds /resolve-hash, #128 adds integration tests.

Schema

solana_program_builds (job) + verified_programs (result) merge into one builds table per attempt, with executable_hash, error_message, and a build_status ENUM. verified_programs.on_chain_hash + program_authority merge into one program_state row per program; is_verified is now a derived expression. program_state also gains a pending_reverify flag with no v1 equivalent: the sweep sets it when a program's on-chain hash drifts and drains it by reverifying (capped per cycle). build_logs.program_address renamed program_id; id retyped VARCHAR -> UUID.

Migration runs inline at startup via db.migrate(), idempotent. It copies the v1 data into the new tables but retains them -- the drop is deferred to a follow-up so the cutover stays reversible (a bad deploy can be rolled back by redeploying the v1 binary instead of restoring from backup). See the deploy note at the end of 0001_init.sql.

Runtime

/status/{program_id} was Redis-cached (5min TTL) with a per-request RPC call to fetch program authority on cache miss. v2 caches in-process via moka (invalidated on every write) and is DB-only at request time. One RpcClient, no rotation; rpc_manager.rs gone.

State mutation moves off the read path entirely. v1's check_is_verified spawned reverify_program in the background on hash mismatch — a full solana-verify Docker build kicked off from a read endpoint. v2's check_is_verified is strictly read-only; rebuilds fire from write paths and the background sweep, never from a read:

/pda webhook: refetch hash, unverify if changed, rebuild against fresh PDA params.
/unverify webhook: record new hash, no rebuild (PDA still points at old commit).
Sweep (default 5min): refreshes program_state via batched getMultipleAccounts, and reverifies programs whose on-chain hash drifted (capped per cycle, skipping params already built) -- the backstop for upgrades a webhook missed.
/verify-with-signer: manual fallback when a webhook is missed and the developer notices stale /status.

Trade-off: between a missed webhook and the next sweep tick, /status can show is_verified=true against stale metadata, bounded by sweep_interval_seconds -- the sweep then reverifies the drifted program and self-heals. Previous behavior had the inverse trade-off: every drifted-program cache miss triggered a Docker build whether anyone needed it or not.

Other

sqlx in place of Diesel; solana-sdk umbrella -> split crates.
Config + RPC injected via AppState instead of global Lazys.
Rust 1.93 -> 1.94, axum 0.6 -> 0.8.
Existing deploys migrate by deploying v2 -- no db-init container. Migrations run at startup, so take a DB backup and deploy in a maintenance window; the v1 tables are kept for rollback and dropped via a manual follow-up once v2 is healthy.

Perf

Local bench, 336 seeded programs, 4-cores, rate limits dropped.

/status/{program_id} warm path:

concurrency	legacy (Redis hit)	v2 (moka hit)
100	~39k rps, p50=2.4ms	~57k rps, p50=1.6ms
500	~36k rps, p50=13ms	~49k rps, p50=10ms
1000	~27k rps, p50=36ms	~47k rps, p50=20ms

/verified-programs-status: v1 timed out at 60s+ (per-program fan-out); v2 ~5ms.
/status cold path: v2 ~218ms; v1 worst case seconds.

meumar-osec · 2026-05-26T09:15:27Z

0001_init migrates the v1 data and then drops the old tables. The problem is that if something breaks, we’d be restoring from backup instead of just rolling back the binary, and migrations now run automatically at app startup. Can we add a short deploy note (backup first, maintenance window)? Confirm this has been tested against a prod snapshot?

meumar-osec · 2026-05-26T09:24:08Z

It’d be good to have at least some tests for the migration path (v1 -> v2) and a few /status + helius /unverify cases, since the current CI is mostly cargo test on a dummy DB and may not catch regressions from this rewrite.

Dropping verified_programs / program_authority / solana_program_builds in 0001 -- which runs automatically at app startup -- made a bad v2 deploy unrecoverable without restoring from backup. Keep the v1 tables so the data survives the cutover (a rollback to the v1 binary still has it), and document the deploy steps and the deferred manual drop in the migration file itself. Addresses review feedback on otter-sec#125. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

stegaBOB · 2026-06-02T16:03:41Z

@meumar-osec updated the PR descriptions and opened the remaining PRs in the stack (#126, #127 , #128). They're still targeting main in order to get the CI to run properly.

I updated the migration to retain the old tables instead of deleting, so worst case we should be able to rollback still. I also added a bunch of integration tests in #128 , including for the migration itself. Let me know if theres anything else you'd like to change! I'd be happy to hop in a call and walk you through the changes if you'd like too.

Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

Leftover from v1's global Lazy pattern; v2 injects config via AppState. Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

The handler used to do two sequential queries — `get_program_state` for the cached on-chain hash + frozen/closed flags, then `best_build` for the matching completed build — and stitch the result through `VerificationResponse::from_state_and_build`. Replace both with one `LEFT JOIN LATERAL` that returns just the seven columns the response needs, render the `ExtendedStatusResponse` inline, and serialize once. Drops `best_build` (its only caller is gone) and shapes the handler to send the pre-encoded body via `(StatusCode, [(CONTENT_TYPE, "application/json")], json)` so axum doesn't re-encode anything. One round-trip per `/status` request, no struct→JSON shuffle, same external response shape. Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

Adds a moka cache (`Cache<Address, String>`) of pre-serialized `/status` bodies. Hits skip the LATERAL join and the JSON encode; misses run the existing path and write back the rendered body. Every mutating method invalidates the affected program's entry: - `upsert_program_state` (sweep + post-build snapshot) - `unverify_program` (upgrade webhook) - `mark_closed` (close webhook + sweep) - `mark_build_completed` (build finished — added `program_id` arg so we have the key without an extra lookup) TTL is bound to the sweep interval. Every sweep cycle upserts every `program_state` row, which evicts the matching cache entry — so a longer TTL would never fire, and a shorter one just adds DB load between sweeps. Capacity capped at 10k entries (LRU eviction beyond that, which the verified-program set will not reach in any realistic timeframe). Restart self-heals: the cache is empty on boot, the sweep job fires immediately at startup and refreshes `program_state` from chain before the first cached entry is written. Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

Replaces the manual get → DB query → insert pattern with `try_get_with`. moka's recommended path for "fetch or compute" coalesces concurrent calls on the same cold key into one computation — without it, 100 concurrent /status requests for an uncached program would each run the LATERAL join independently before any could write back. Same external behaviour, fewer DB hits under contention. Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

`check_is_verified` (/status) filters builds to trusted signers, but `get_verification_status_all` (/verified-programs-status) didn't -- it returned any completed build's repo/commit, so an untrusted signer could surface its metadata there. v1 routed this endpoint through check_is_verified per program, so it was trust-filtered; this restores that. Extracts `onchain::trusted_signers()` (system_program::ID + SIGNER_KEYS) now that two callers need it, and uses it in both. The live upgrade authority stays matched in SQL via program_state.authority. Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

NewBuild::from(&OtterBuildParams) left signer None, so every caller (both verify handlers, pda_worker) had to set it afterward -- a footgun where forgetting yields a null signer. The PDA is derived from its signer and the on-chain program stores that same key in the params, so it's authoritative: From now sets it directly. Drops the redundant overrides and the separately-threaded signer: setup_verification returns just the NewBuild (no more VerificationSetup struct), and process_verification / process_verification_sync lose their signer parameter. Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

The sweep now backstops missed /pda webhooks. When a refresh observes a program's on-chain hash change, upsert_program_state flags pending_reverify; the sweep then drains up to max_reverifies_per_sweep flagged programs (config, default 3), fetches each one's current Otter Verify PDA, and kicks a build through the same execute path as the verify endpoints -- unless an identical build already exists (any status, so failures aren't retried). The flag persists across cycles, so a capped burst drains over several sweeps rather than being dropped, and is cleared once a program is handled so stuck programs aren't re-examined until they drift again. unverify_program also sets the flag, since it advances the stored hash itself and the sweep's drift check would otherwise miss it. Adds the pending_reverify column to program_state, a has_build_for_params guard (any-status sibling of find_duplicate, sharing one query via an include_failed toggle), and the AppState plumbing to give the sweep loop what execute needs. Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

Two small correctness fixes carried over from the v2 rewrite: - Hash the program-data bytecode when the account is exactly the header size (`>=` not `>`), matching `solana-verify`'s `data.get(45..)`. Only affects a zero-bytecode account, but it was an off-by-one. - Trim the `/verified-programs` search term before building the ILIKE pattern, so a space-padded (but valid) query still matches. Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

Two v1-parity fixes for behavior the rewrite changed silently: - /unverify: skip programs with no completed build. Helius watches every program upgrade, so the handler fires for programs we never verified; the upsert in unverify_program would otherwise create a junk program_state row for each (and the sweep would then chase it). v1 bailed here implicitly -- its get_verified_build errored for unknown programs and its UPDATE was a no-op -- so restore that gate explicitly. - /verified-programs and /verified-programs-status: stop excluding frozen programs (keep excluding closed). /status already reports a frozen-but-matching program as verified, so the lists now agree. Immutable (legacy-loader) programs are frozen and shouldn't be hidden from the directory. Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

Return None for a program with no program_state row (or NULL hash) rather than an empty string. The "" sentinel made "untracked" and "tracked, hash drifted" indistinguishable, which different callers want to treat differently. Each caller now handles None explicitly: - pda_worker / unverify: None (no cached hash) compares unequal to a real fetched hash, same as before -- behavior unchanged. - job_status: falls back to "" for the response field. Pure refactor; no behavior change. Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

Dropping verified_programs / program_authority / solana_program_builds in 0001 -- which runs automatically at app startup -- made a bad v2 deploy unrecoverable without restoring from backup. Keep the v1 tables so the data survives the cutover (a rollback to the v1 binary still has it), and document the deploy steps and the deferred manual drop in the migration file itself. Addresses review feedback on otter-sec#125. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

stegaBOB force-pushed the stegaBOB/feat/v2-migration branch 27 times, most recently from 013145d to 6a9c2c6 Compare May 24, 2026 05:16

stegaBOB changed the title ~~(WIP) v2 rewrite~~ v2 rewrite + resolve_hash endpoint May 24, 2026

stegaBOB force-pushed the stegaBOB/feat/v2-migration branch 2 times, most recently from c4d335d to a623a58 Compare May 24, 2026 19:04

stegaBOB changed the title ~~v2 rewrite + resolve_hash endpoint~~ v2 rewrite May 24, 2026

stegaBOB mentioned this pull request May 24, 2026

(stacked) v2: flatten api/ to repo root #126

Draft

stegaBOB changed the title ~~v2 rewrite~~ v2: rewrite May 24, 2026

stegaBOB force-pushed the stegaBOB/feat/v2-migration branch from a623a58 to c055e29 Compare May 25, 2026 19:41

meumar-osec reviewed May 26, 2026

View reviewed changes

Comment thread api/src/db.rs

stegaBOB force-pushed the stegaBOB/feat/v2-migration branch from 7b70bcb to 50c0131 Compare May 29, 2026 00:04

stegaBOB force-pushed the stegaBOB/feat/v2-migration branch from 7aac517 to ed86bbd Compare June 1, 2026 23:20

stegaBOB force-pushed the stegaBOB/feat/v2-migration branch from ed86bbd to c7f92e7 Compare June 1, 2026 23:23

meumar-osec approved these changes Jun 3, 2026

View reviewed changes

claude and others added 13 commits June 3, 2026 12:08

v2: rewrite api in upstream directory layout

32d7f42

Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

ci: drop unused once_cell dependency

a649591

Leftover from v1's global Lazy pattern; v2 injects config via AppState. Co-Authored-By: stegaBOB <41593264+stegaBOB@users.noreply.github.com>

Use trusted signers for check is verified

e444367

stegaBOB force-pushed the stegaBOB/feat/v2-migration branch from c7f92e7 to 8e75e22 Compare June 3, 2026 17:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v2: rewrite#125

v2: rewrite#125
stegaBOB wants to merge 13 commits into
otter-sec:masterfrom
solana-mobile:stegaBOB/feat/v2-migration

stegaBOB commented May 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

meumar-osec commented May 26, 2026

Uh oh!

meumar-osec commented May 26, 2026

Uh oh!

stegaBOB commented Jun 2, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

stegaBOB commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Schema

Runtime

Other

Perf

Uh oh!

Uh oh!

meumar-osec commented May 26, 2026

Uh oh!

meumar-osec commented May 26, 2026

Uh oh!

stegaBOB commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

stegaBOB commented May 22, 2026 •

edited

Loading

stegaBOB commented Jun 2, 2026 •

edited

Loading