HIVE-28987: Iceberg: A faulty query predicate can compromise transaction isolation #5839

Merged 3 commits on Jun 18, 2025

deniskuzZ
Member

@deniskuzZ deniskuzZ commented Jun 2, 2025

What changes were proposed in this pull request?

Enable the conflict detection filter only when the plan contains exactly one TableScan operator for a table.

Why are the changes needed?

BugFix

Does this PR introduce any user-facing change?

No

How was this patch tested?

Cluster + locally
https://issues.apache.org/jira/secure/attachment/13076866/iceberg_atomic_merge_update.q

@deniskuzZ
Member Author

deniskuzZ commented Jun 6, 2025

@simhadri-g, do you recall why you combined, using AND, the predicates from all the TableScan operators in order to construct the final iceberg commit filter?

@deniskuzZ
Member Author

@kasakrisz, @zabetak, isn't it possible to find the top-level TableScan operator for a specific table based on the DAG plan?

@simhadri-g
Member

predicates from all the TableScan operators in order to construct the final iceberg commit filter?

Yes, this was the same logic Spark used to build the conflict detection filter as well.
https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L399-L410
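
To make the concern concrete, here is a toy sketch of how ANDing the predicates of several scans could yield a filter that is too narrow. It is not Hive's or Iceberg's actual code: the class `AndedScanFilters`, the helper `andAll`, and the two example predicates on a single integer column are all hypothetical, and it assumes the scans of the same table carry different predicates.

```java
import java.util.List;
import java.util.function.IntPredicate;

class AndedScanFilters {
    // Hypothetical helper: combine every scan predicate with AND,
    // mirroring the combination described in this thread.
    static IntPredicate andAll(List<IntPredicate> scanPredicates) {
        IntPredicate combined = v -> true;
        for (IntPredicate p : scanPredicates) {
            combined = combined.and(p);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Assumption: two scans of the same table read different slices of column a.
        IntPredicate scan1 = a -> a == 1;
        IntPredicate scan2 = a -> a == 2;
        IntPredicate commitFilter = andAll(List.of(scan1, scan2));

        // A concurrent write to a row with a = 1 overlaps what scan1 read.
        System.out.println("scan1 matches a=1: " + scan1.test(1));
        // The ANDed filter (a == 1 AND a == 2) matches no row, so it would not flag that write.
        System.out.println("commit filter matches a=1: " + commitFilter.test(1));
    }
}
```

In this model a filter built from a single scan still covers the rows that scan read, which may be why restricting the filter to single-scan plans is the conservative choice.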

@deniskuzZ
Member Author

deniskuzZ commented Jun 9, 2025

predicates from all the TableScan operators in order to construct the final iceberg commit filter?

Yes, this was the same logic Spark used to build the conflict detection filter as well. https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L399-L410

Thanks @simhadri-g, I have no idea how Spark manages or constructs the filterExpressions, but based on the code, it seems Spark has a single TS operator (the root TS?).
Maybe you can check with @vrozov and help us do the same in Hive.

scan.filterExpressions()

In the current PR, since I don't have a better idea of how to identify the root TS and construct the final filterExpressions, I am disabling the conflict detection filter when the DAG has multiple TS operators for the write table.
cc @okumin
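
The gating rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Scan` class, the `conflictFilter` method, and the use of plain strings for predicates are hypothetical stand-ins for Hive's TableScan operators and Iceberg expressions.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class ConflictFilterGate {
    // Hypothetical stand-in for a TableScan operator: the table it reads
    // and the predicate pushed down to it.
    static final class Scan {
        final String table;
        final String predicate;

        Scan(String table, String predicate) {
            this.table = table;
            this.predicate = predicate;
        }
    }

    // Returns the predicate to use as the conflict detection filter, or empty
    // when the plan does not contain exactly one scan of the write table.
    static Optional<String> conflictFilter(List<Scan> plan, String writeTable) {
        List<Scan> scans = plan.stream()
            .filter(s -> s.table.equals(writeTable))
            .collect(Collectors.toList());
        if (scans.size() != 1) {
            return Optional.empty(); // filter disabled: zero or multiple scans
        }
        return Optional.ofNullable(scans.get(0).predicate);
    }

    public static void main(String[] args) {
        List<Scan> single = List.of(new Scan("t", "a = 1"), new Scan("s", "b = 2"));
        List<Scan> multiple = List.of(new Scan("t", "a = 1"), new Scan("t", "a = 2"));
        System.out.println(conflictFilter(single, "t"));
        System.out.println(conflictFilter(multiple, "t"));
    }
}
```

Returning an empty result here stands for "no filter", on the assumption that validation without a filter is the more conservative behavior.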

Contributor

@kasakrisz kasakrisz left a comment

+1 pending tests

@deniskuzZ deniskuzZ merged commit b3038e7 into apache:master Jun 18, 2025
4 checks passed
@deniskuzZ deniskuzZ deleted the HIVE-28987 branch June 18, 2025 14:12
4 participants