
Core: Fail scans when position deletes/DVs don't match data file partition#16957

Open
grantatspothero wants to merge 1 commit into apache:main from grantatspothero:gn/failTableScansIfCorruptedMetadataFound


@grantatspothero

Contributor

Position deletes and deletion vectors (DVs) whose partition values differ from those of the data files they reference are invalid, i.e. corrupt, per the Iceberg spec:
https://iceberg.apache.org/spec/#scan-planning

A position delete file must be applied to a data file when all of the following are true (excerpt):
- The data file's file_path is equal to the delete file's referenced_data_file if it is non-null
- The data file's partition (both spec and partition values) is equal to the delete file's partition
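The two quoted conditions can be read as a single applicability predicate. The sketch below is a minimal illustration, assuming hypothetical, simplified `DataFileInfo`/`DeleteFileInfo` records rather than Iceberg's real `DataFile`/`DeleteFile` interfaces, and it checks only the two quoted conditions (the full spec adds others, such as sequence-number ordering). Under this predicate alone, a delete with a mismatched partition simply does not apply, which is how corrupt metadata can be silently ignored.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-ins for Iceberg metadata; NOT the Iceberg API.
public class DeleteApplicability {

  // A data file: its path, partition spec ID, and partition tuple.
  record DataFileInfo(String path, int specId, List<?> partition) {}

  // A position delete file or DV; referencedDataFile may be null
  // (a partition-scoped delete that does not name a single data file).
  record DeleteFileInfo(String referencedDataFile, int specId, List<?> partition) {}

  // True when both quoted spec conditions hold. Other conditions from the
  // full spec (e.g. sequence-number ordering) are deliberately omitted.
  static boolean appliesTo(DataFileInfo data, DeleteFileInfo delete) {
    // Condition 1: file_path equals referenced_data_file, if it is non-null.
    boolean pathMatches =
        delete.referencedDataFile() == null
            || delete.referencedDataFile().equals(data.path());
    // Condition 2: same partition spec and same partition values.
    boolean partitionMatches =
        data.specId() == delete.specId()
            && Objects.equals(data.partition(), delete.partition());
    return pathMatches && partitionMatches;
  }
}
```

Note that a delete naming `a.parquet` but carrying the wrong partition tuple yields `false` here, exactly like a delete for an unrelated file; the predicate by itself cannot tell "does not apply" apart from "corrupt".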

This PR makes table scans fail when such corrupt metadata is detected, rather than silently ignoring the mismatched delete files.

See the linked PR for discussion: #16939

@RussellSpitzer

Member

@amogh-jahagirdar This is a validation to make sure that if we see delete files that are not in the same partition as their target data files, we fail rather than silently ignore them. WDYT? Is it worth the complexity?

…ition

Position deletes and deletion vectors that reference a data file by
path must carry the same partition spec and partition tuple as that
data file. When they differ, the metadata is corrupt.
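The check described in the commit message can be sketched as follows. This is a hypothetical illustration, not the code in this PR: the record types, `CorruptMetadataException`, and the path-keyed index used during planning are all assumptions. A delete that references a data file by path but carries a different partition spec ID or partition tuple fails planning instead of being dropped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; the types below are simplified stand-ins, NOT the Iceberg API.
public class DeletePartitionValidation {

  record DataFileInfo(String path, int specId, List<?> partition) {}

  // location: where the delete file/DV lives; referencedDataFile may be null.
  record DeleteFileInfo(
      String location, String referencedDataFile, int specId, List<?> partition) {}

  // Hypothetical exception type used to fail the scan.
  static class CorruptMetadataException extends RuntimeException {
    CorruptMetadataException(String message) {
      super(message);
    }
  }

  // Fails if a delete that names this data file by path carries a different
  // partition spec ID or partition tuple than the data file.
  static void validate(DataFileInfo data, DeleteFileInfo delete) {
    if (!data.path().equals(delete.referencedDataFile())) {
      return; // not path-scoped to this data file; nothing to check here
    }
    boolean sameSpec = data.specId() == delete.specId();
    boolean samePartition = Objects.equals(data.partition(), delete.partition());
    if (!sameSpec || !samePartition) {
      throw new CorruptMetadataException(
          String.format(
              "Delete file %s references %s but has spec %d, partition %s;"
                  + " data file has spec %d, partition %s",
              delete.location(), data.path(),
              delete.specId(), delete.partition(),
              data.specId(), data.partition()));
    }
  }

  // Planning sketch: index path-scoped deletes by referenced data file, then
  // validate each data file against the deletes that name it.
  static void validateAll(List<DataFileInfo> dataFiles, List<DeleteFileInfo> deletes) {
    Map<String, List<DeleteFileInfo>> byReferencedPath = new HashMap<>();
    for (DeleteFileInfo delete : deletes) {
      if (delete.referencedDataFile() != null) {
        byReferencedPath
            .computeIfAbsent(delete.referencedDataFile(), path -> new ArrayList<>())
            .add(delete);
      }
    }
    for (DataFileInfo data : dataFiles) {
      for (DeleteFileInfo delete : byReferencedPath.getOrDefault(data.path(), List.of())) {
        validate(data, delete);
      }
    }
  }
}
```

In this sketch, keying path-scoped deletes by their referenced data file means a mismatched delete is still paired with the file it names, so the mismatch surfaces as an error rather than the delete being skipped by a partition-based lookup.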
@grantatspothero force-pushed the gn/failTableScansIfCorruptedMetadataFound branch from ba2f990 to df32994 on June 24, 2026 20:23
