
Goldsky dynamic_table_check race drops events registered shortly before #12

Description

@willemneal

Summary

Events flowing through the v1 Goldsky pipeline can be dropped by the dynamic_table_check filter in transform_4_events_with_name when their emitter contract was sub_reg'd only a few ledgers earlier. The cause is a race between two parallel transform paths in goldsky/v1/index.yaml rather than a bug in the filter logic, so the drops are non-deterministic.

Where it happens

  • transform_3_subregistry_events extracts subregistry contract_ids from sub_reg events and writes them to the Postgres-backed v1.registries_dynamic_table.
  • transform_4_events_with_name reads from that same dynamic table via dynamic_table_check('registries_dynamic_table', emitter_contract_id) to keep only events emitted by known registries.
  • These two paths are not synchronized. If transform_3's Postgres write hasn't committed by the time transform_4's check runs, the event is incorrectly dropped.
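A toy model of the race, purely illustrative: it assumes ~5 s ledger closes, treats transform_4's check as happening at the event's ledger close, and folds Goldsky's internal Postgres commit delay into a single hypothetical `commit_latency_s` parameter. None of these names come from the pipeline itself.

```python
from dataclasses import dataclass
from typing import Dict

# Assumption: Stellar ledgers close roughly every 5 seconds.
LEDGER_SECONDS = 5.0


@dataclass(frozen=True)
class Event:
    emitter: str
    ledger: int


def visible_at(registered_ledger: int, commit_latency_s: float) -> float:
    """Time (s) at which transform_3's row for a sub_reg'd contract becomes readable."""
    return registered_ledger * LEDGER_SECONDS + commit_latency_s


def passes_filter(event: Event, registered: Dict[str, int], commit_latency_s: float) -> bool:
    """Toy dynamic_table_check: the event survives only if its emitter's row is
    visible by the time transform_4 checks it (modeled as the event's ledger close)."""
    reg_ledger = registered.get(event.emitter)
    if reg_ledger is None:
        return False  # emitter never registered: correctly filtered out
    check_time = event.ledger * LEDGER_SECONDS
    return check_time >= visible_at(reg_ledger, commit_latency_s)


# Same Δ=2 ledgers, different commit latency: one passes, one is dropped.
registered = {"blend": 100, "defindex": 100}
print(passes_filter(Event("blend", 102), registered, commit_latency_s=5.0))      # True
print(passes_filter(Event("defindex", 102), registered, commit_latency_s=15.0))  # False
```

In this model, a `--clear-state` replay corresponds to every `visible_at` already lying in the past, so the check reduces to plain membership and the race disappears.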

Observed

In one of our test deployments (source start_at = 2206305):

emitter                         role            register events in raw   in registered_contracts
root (CAG5V…)                   root            7                        5 (2 dropped at Δ=0 ledgers)
defindex (CCI6LJW…)             subregistry     5                        0 (dropped at Δ=2 ledgers)
blend (CDDS…)                   subregistry     5                        5 (passed at Δ=2 ledgers)
soroswap, circle, unverified    subregistries   4 total                  4 total (passed)

Plus the root's publish at ledger 2206307 (Δ=2 from its sub_reg at 2206305) was also dropped.

At an identical Δ of 2 ledgers, blend's events got through and defindex's did not. The race window varies: drops have been observed at gaps of ~10 s, while events pass reliably at gaps of ~70 s. The exact threshold depends on Goldsky's internal Postgres commit cadence.
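Converting the observed window into ledgers, assuming ~5 s per ledger close (consistent with Δ=2 ≈ 10 s above), gives roughly 2 to 14 ledgers. The hypothetical helper below buckets a sub_reg-to-event gap against those two observed bounds; it says nothing about the true threshold, which is internal to Goldsky.

```python
# Assumption: ~5 s per ledger, inferred from Δ=2 ledgers ≈ 10 s above.
LEDGER_SECONDS = 5.0
DROPS_OBSERVED_S = 10.0   # drops have been seen at gaps this small
PASSES_RELIABLY_S = 70.0  # events have passed reliably at gaps this large


def classify_gap(delta_ledgers: int) -> str:
    """Bucket a sub_reg-to-event gap against the two observed bounds."""
    gap_s = delta_ledgers * LEDGER_SECONDS
    if gap_s <= DROPS_OBSERVED_S:
        return "at_risk"
    if gap_s < PASSES_RELIABLY_S:
        return "uncertain"
    return "reliable"


for delta in (0, 2, 5, 14):
    print(delta, delta * LEDGER_SECONDS, classify_gap(delta))
```

Both Δ=0 (root) and Δ=2 (defindex, blend) land in the at-risk bucket, which matches the mixed pass/drop results in the table.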

Mitigations in place

  1. goldsky/scripts/refresh.sh — issues turbo restart <pipeline> --clear-state, which resets the source checkpoint but preserves the Postgres-backed dynamic table. The replay sees a fully-seeded membership set and the race no longer filters events out.
  2. goldsky/scripts/audit-race.sql — detects events that should have passed the filter but didn't, by comparing v1.raw_events_backup (upstream of the filter) against the downstream sinks, scoped to emitters present in v1.registries.
  3. goldsky/scripts/redeploy.sh --number-of-initial-subregistries N — wires the above together: polls v1.registries_dynamic_table until it has N rows, runs the audit, runs the refresh if drops are found, and re-audits.
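The redeploy flow can be sketched roughly as follows. This is not the contents of redeploy.sh or audit-race.sql: `count_rows`, `audit`, and `refresh` are hypothetical callables standing in for the row count on v1.registries_dynamic_table, the audit query, and refresh.sh, and the audit is modeled as an in-memory anti-join of upstream events against downstream sink ids. A real re-audit would also need to wait for the replay to catch up, which is omitted here.

```python
import time
from typing import Callable, Iterable, List, Set, Tuple


def audit_drops(raw_events: Iterable[Tuple[str, str]],
                sink_event_ids: Set[str],
                registries: Set[str]) -> List[str]:
    """In-memory anti-join: (event_id, emitter) pairs seen upstream of the filter
    whose emitter is a known registry but whose id never reached the sink."""
    return sorted(
        event_id
        for event_id, emitter in raw_events
        if emitter in registries and event_id not in sink_event_ids
    )


def redeploy(expected_rows: int,
             count_rows: Callable[[], int],
             audit: Callable[[], List[str]],
             refresh: Callable[[], None],
             poll_interval_s: float = 5.0,
             timeout_s: float = 600.0,
             sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Wait for the dynamic table to hold expected_rows, audit, refresh once if
    drops are found, then re-audit. Returns drops left after the last audit."""
    waited = 0.0
    while count_rows() < expected_rows:
        if waited >= timeout_s:
            raise TimeoutError(f"dynamic table still has < {expected_rows} rows")
        sleep(poll_interval_s)
        waited += poll_interval_s

    drops = audit()
    if not drops:
        return []
    refresh()       # stands in for refresh.sh (restart with --clear-state)
    return audit()  # a real flow would first wait for the replay to catch up


# Fake backends: the table fills over two polls and the first audit finds a drop.
counts = iter([2, 4, 5])
audits = iter([["e2"], []])
remaining = redeploy(5, lambda: next(counts), lambda: next(audits),
                     refresh=lambda: print("refreshing"), sleep=lambda s: None)
print(remaining)  # []
```

Injecting `sleep` and the backend callables keeps the control flow testable without a live pipeline or database.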

Potential longer-term fixes

  • Hardcode root in transform_4's WHERE clause so the root contract's own events don't depend on the dynamic table racing itself at pipeline start.
  • Move the registry-membership filter out of the streaming path into the Postgres view layer. post_init.sql already defines *_with_channel views that inner-join v1.registries; extending that pattern would let sinks carry all candidate events and have the membership filter applied at query time, eliminating the race entirely.
  • Goldsky-side: if Goldsky exposes synchronous (or near-synchronous) write semantics for Postgres dynamic tables, such as a flush or barrier, use it. Otherwise, continue to rely on the refresh-based recovery.
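The first two options can be contrasted in a few lines. This is an in-memory sketch, not pipeline code: `ROOT_CONTRACT_ID` is a placeholder (the real root id is truncated above as CAG5V…), `streaming_filter` is a hypothetical stand-in for transform_4's WHERE clause, and `events_with_registry` is a hypothetical analogue of a *_with_channel-style view that inner-joins v1.registries.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Placeholder: the real root contract id is truncated in this issue (CAG5V…).
ROOT_CONTRACT_ID = "ROOT_PLACEHOLDER"


@dataclass(frozen=True)
class Event:
    event_id: str
    emitter: str


def streaming_filter(event: Event, dynamic_table: Set[str]) -> bool:
    """Option 1: hardcode root next to the dynamic-table check. Root events no
    longer race, but subregistry events still can."""
    return event.emitter == ROOT_CONTRACT_ID or event.emitter in dynamic_table


def events_with_registry(sink: Iterable[Event], registries: Set[str]) -> List[Event]:
    """Option 2: the sink keeps every candidate event and membership is applied at
    query time, analogous to an INNER JOIN against v1.registries in a view."""
    return [e for e in sink if e.emitter in registries]


sink = [Event("a", ROOT_CONTRACT_ID), Event("b", "DEFINDEX"), Event("c", "UNRELATED")]

# At stream time the dynamic table has not caught up yet (the race).
kept_by_stream = [e.event_id for e in sink if streaming_filter(e, dynamic_table=set())]
print(kept_by_stream)  # ['a']

# At query time v1.registries is populated, so nothing is lost.
print([e.event_id for e in events_with_registry(sink, {ROOT_CONTRACT_ID, "DEFINDEX"})])  # ['a', 'b']
```

The trade-off is visible in the sink: under option 2 it also stores events from non-registry emitters ("UNRELATED"), exchanging extra storage for a filter that can no longer lose data to commit timing.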

To track

  • Decide between hardcode-root, view-layer filter, or accepting the refresh-based recovery as the long-term posture.
  • If we keep the current architecture, wire --number-of-initial-subregistries into whatever runs the deployment flow so the audit+refresh happens automatically.
