Detect cases where app resumes ingestion at an invalid offset after backup restore #932

Open
@canton-network-da

Description

[ This issue was auto-migrated from DA's internal repo (DACH-NY/canton-network-node#14953). Original author: @rautenrieth-da ]

What is this about?

After restoring a participant from a backup, the app databases also need to be restored to an earlier point, because participant offsets are not deterministic: after the restore, the same offset may refer to a different update.

If the app operator fails to do so, the app will resubscribe at the last ingested offset, but the participant will return unexpected updates for that offset.

The DbMultiDomainAcsStore.ingestionSink could attempt to detect this:

  • Upon resuming the ingestion, it could re-download the update for the last ingested offset and compare it with the persisted data
  • It could keep track of the last record time for each domain (see getRecordTimeRange in UpdateHistory), and complain if it encounters an update with an earlier time
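The two checks above can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not the actual `DbMultiDomainAcsStore` code: the `Update` record, the integer offsets and record times, and the names `check_resume_offset`, `RecordTimeGuard`, and `IngestionMismatch` are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Update:
    """Hypothetical, simplified view of a ledger update (assumption, not the real type)."""
    offset: int
    update_id: str
    domain_id: str
    record_time: int  # assumed to be a comparable timestamp, e.g. microseconds


class IngestionMismatch(Exception):
    """Raised when the participant's data disagrees with what the app persisted."""


def check_resume_offset(persisted: Optional[Update], refetched: Optional[Update]) -> None:
    """Check 1: compare the re-downloaded update at the last ingested offset
    with the update the app persisted for that offset."""
    if persisted is None:
        return  # nothing ingested yet, so there is nothing to compare
    if refetched is None:
        raise IngestionMismatch(
            f"participant has no update at last ingested offset {persisted.offset}"
        )
    if refetched != persisted:
        raise IngestionMismatch(
            f"update at offset {persisted.offset} changed: "
            f"persisted {persisted.update_id}, participant returned {refetched.update_id}"
        )


class RecordTimeGuard:
    """Check 2: complain if an update's record time is earlier than the last
    record time already seen for the same domain."""

    def __init__(self, last_record_times: Dict[str, int]) -> None:
        # Seeded from persisted state, e.g. the upper bounds of a record time range.
        self._last = dict(last_record_times)

    def observe(self, update: Update) -> None:
        last = self._last.get(update.domain_id)
        if last is not None and update.record_time < last:
            raise IngestionMismatch(
                f"record time went backwards on domain {update.domain_id}: "
                f"{update.record_time} < {last}"
            )
        self._last[update.domain_id] = update.record_time
```

In this sketch, `check_resume_offset` would run once when ingestion resumes, and `RecordTimeGuard.observe` would run for every ingested update. Neither check fires if the restored participant happens to serve an identical update at the same offset with a later record time.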

Both checks would still allow for false negatives, and would not prevent wrong restores, but might prevent corrupt databases and speed up debugging.

How important is this and why?

Misconfigured backups happen to the best of us:

  • DACH-NY/canton-network-node#14951
  • On December 2nd, 2024, GSF seemingly messed up a node reset on devnet, resulting in a corrupted scan app database that wasn't discovered until two weeks later
