API, Core: Add ValidateRewriteTablePath action interface #16967
Open
Priyadarshini-Mitra wants to merge 1 commit into
Proposal #16868
This PR adds the `ValidateRewriteTablePath` action interface, which verifies that every metadata, data, and delete file the source table references is present at the destination. The most common pairing is with `RewriteTablePath`, but the action is designed to operate against any source/destination pair where the destination is supposed to be a copy of the source.

The source table is the source of truth: implementations walk source metadata to enumerate expected files, apply any configured prefix rewrite, and check each path at the destination. Files referenced by the source but missing at the destination are reported regardless of whether the destination's own metadata is internally consistent; the validator does not rely on the destination metadata to enumerate expected files.
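The check described above can be sketched in plain Java. This is only an illustration of the idea, not the PR's implementation (which is deferred to a later PR): the class `SourceOfTruthCheckSketch`, the methods `rewrite` and `findMissing`, and the use of in-memory collections in place of a real metadata walk and real file-existence checks are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names come from the PR.
final class SourceOfTruthCheckSketch {

  // Maps a source path to the path expected at the destination.
  // Paths that do not start with sourcePrefix are returned unchanged (an assumption).
  static String rewrite(String sourcePath, String sourcePrefix, String destPrefix) {
    if (sourcePath.startsWith(sourcePrefix)) {
      return destPrefix + sourcePath.substring(sourcePrefix.length());
    }
    return sourcePath;
  }

  // Enumerates expected files from the source side only, then reports every
  // expected destination path that is absent. Destination metadata is never consulted.
  static List<String> findMissing(
      List<String> sourceFiles,
      Set<String> destinationFiles,
      String sourcePrefix,
      String destPrefix) {
    List<String> missing = new ArrayList<>();
    for (String sourceFile : sourceFiles) {
      String expected = rewrite(sourceFile, sourcePrefix, destPrefix);
      if (!destinationFiles.contains(expected)) {
        missing.add(expected);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Stand-in for the files referenced by source metadata.
    List<String> source = List.of(
        "s3://src/db/t/metadata/v2.metadata.json",
        "s3://src/db/t/data/a.parquet",
        "s3://src/db/t/data/b.parquet");
    // Stand-in for what actually exists at the destination.
    Set<String> dest = Set.of(
        "s3://dst/db/t/metadata/v2.metadata.json",
        "s3://dst/db/t/data/a.parquet");
    // prints [s3://dst/db/t/data/b.parquet]
    System.out.println(findMissing(source, dest, "s3://src/", "s3://dst/"));
  }
}
```

Because the expected set is derived from the source alone, a destination whose own metadata happens to be self-consistent but incomplete would still be flagged in this sketch.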
This is kept as a separate action, not folded into `RewriteTablePath`, so validation can run independently of any copy producer: against any source/destination pair, regardless of how the destination was produced.

We are splitting implementation into three sequential PRs:
This PR establishes the API contract only.
Two common uses (target use cases for the implementation):

- Verify a copy before registering it: after producing a destination via `RewriteTablePath`, run this action to confirm the destination contains every expected file before registering the new metadata.json with a catalog. A failed validation prevents an incomplete copy from becoming a live table.
- Audit any source/destination pair for sync: the destination does not have to have been produced by `RewriteTablePath`. This works against DR replicas, migration targets, backups, DistCp output, manual file copies, or any other out-of-band copy. Run this action to confirm the destination still references every file the source does, catching missing or stale copies before they cause data loss on failover, wrong query results from a migrated table, or failed restores. Requires loadable source and destination Tables plus `rewriteLocationPrefix(...)` to map source paths to destination paths. Choose the mode based on what's being audited:
  - `destinationSnapshotId` (the destination's snapshot id from the previous successful copy/validation) for an incremental check that validates only the delta accumulated since then.
  - `validateFullTable(true)` for an audit-from-scratch that re-validates every source file at the destination regardless of prior state.

What this PR adds
- `ValidateRewriteTablePath` (`org.apache.iceberg.actions`): fluent builder for source/destination metadata version, source/destination snapshot id, prefix rewrite (`rewriteLocationPrefix(String, String)` or `rewriteLocationPrefix(Map<String, String>)`; entries accumulate and are applied by longest-prefix match), `validateScope` (`ALL` / `LATEST`), and the `validateFullTable` override.
- `BaseValidateRewriteTablePath`: Immutables-based `Result` implementation with derived `isValid()`, `missingFileCount()`, and `validationSummary()` fields.
- `ActionsProvider#validateRewriteTablePath(Table)` factory method.

The action operates in three modes selected by which parameters are configured: backfill (source only), incremental (source plus destination), and forced full validation via `validateFullTable(true)`. Implementation lands with the next PR.

Constraints
- `HasTableOperations` is required to resolve metadata file locations.
- `validateScope` (`ALL` vs `LATEST`) affects backfill mode only; incremental mode always computes the source-vs-destination snapshot diff.

Out of scope (intentional)
- The `rewriteLocationPrefix(Map<String, String>)` setter will accept multiple source→destination mappings (resolved via longest-prefix match) once the implementation lands, but `RewriteTablePath` still rewrites with a single source/target pair. Extending the copy-side action to accept a `Map` is a planned follow-up.
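For reviewers unfamiliar with the term, longest-prefix-match resolution over several mappings could look roughly like the sketch below. It is not taken from the PR: the class `LongestPrefixRewriteSketch`, the method `rewrite`, and the choice to leave unmatched paths unchanged are assumptions made for illustration, and the real behavior is defined by the implementation PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the PR.
final class LongestPrefixRewriteSketch {

  // Picks the longest source prefix that matches the path and swaps it for
  // its mapped destination prefix.
  static String rewrite(String path, Map<String, String> prefixes) {
    String best = null;
    for (String candidate : prefixes.keySet()) {
      boolean longer = best == null || candidate.length() > best.length();
      if (path.startsWith(candidate) && longer) {
        best = candidate;
      }
    }
    if (best == null) {
      return path; // assumption: paths with no matching prefix are left as-is
    }
    return prefixes.get(best) + path.substring(best.length());
  }

  public static void main(String[] args) {
    Map<String, String> prefixes = new LinkedHashMap<>();
    prefixes.put("s3://src/", "s3://dst/");
    prefixes.put("s3://src/db/t/data/", "s3://dst-data/t/");

    // The more specific data prefix wins over the general one.
    // prints s3://dst-data/t/a.parquet
    System.out.println(rewrite("s3://src/db/t/data/a.parquet", prefixes));
    // Only the general prefix matches here.
    // prints s3://dst/db/t/metadata/v1.metadata.json
    System.out.println(rewrite("s3://src/db/t/metadata/v1.metadata.json", prefixes));
  }
}
```

Selecting by prefix length rather than insertion order keeps the result independent of the order in which mappings were accumulated.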