Spark 4.1: Implement SupportsReportOrdering DSv2 API#14948
anuragmantri wants to merge 1 commit into apache:main
Conversation
Moved the changes to Spark 4.1 since it is now the latest version. Marked this PR as WIP since a prerequisite PR, #14683, is also in review.
My concern from a Spark PoV is that unnecessary partition grouping can cause performance degradation. SPARK-55092 is a ticket about the problem, and the apache/spark#53859 / apache/spark#54330 PRs try to fix it. If this PR disables bin packing, then the above PRs won't be able to fix the issue.
So I would suggest keeping bin packing and reporting sort order for those packed partitions (i.e. the partitions might not be unique by key, but they are locally sorted); when partition grouping is actually needed, Spark should merge the sorted partitions with the same key using a k-way merge.
As we discussed offline, a long-term solution (after apache/spark#54330) could be to improve the new …
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
This PR is not stale. We are waiting for the prerequisite PR to be merged. I will update this PR after that one is merged.
I rebased the PR after #15150. This is ready for review now. @RussellSpitzer @aokolnychyi @szehon-ho - Could you take a look please?
Thanks @peter-toth, this makes sense. I think this PR is still needed and would still be valuable for tables with decently sized partitions. Since it's gated by a flag, I think it's safe to implement.
Absolutely.
This PR implements the Spark DSv2 SupportsReportOrdering API to enable Spark's sort elimination optimization for partitioned tables when reading sorted Iceberg tables that have a defined sort order and files are written respecting that order.
Sort order reporting can be enabled with a new flag (default: `false`):

```sql
SET spark.sql.iceberg.planning.preserve-data-ordering = true;
```

Implementation summary:
Ordering Validation: `SortOrderAnalyzer` validates two conditions before `SparkPartitioningAwareScan.outputOrdering()` reports ordering to Spark. If either condition fails, no ordering is reported.
Merging Sorted Files: Since sorted files within a partition may have overlapping ranges, `MergingSortedRowDataReader` merges rows from multiple sorted files using a k-way merge with a min-heap. The read schema is augmented with any sort key columns absent from Spark's projection so that the comparator can always access sort key fields, even when they are pruned by Spark's column-pruning optimizer.
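The k-way merge described above can be sketched as follows. This is a minimal illustration using `java.util.PriorityQueue`, not the PR's actual reader; `Entry` and `mergeSorted` are hypothetical names, and plain iterators stand in for sorted file readers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {

  // Pairs the current head value of an input with the iterator it came from.
  private record Entry<T>(T value, Iterator<T> source) {}

  // Merges k already-sorted inputs into one sorted output; each poll/offer on
  // the min-heap is O(log k), so the merge is O(n log k) overall.
  public static <T> List<T> mergeSorted(List<Iterator<T>> inputs, Comparator<T> comparator) {
    PriorityQueue<Entry<T>> heap =
        new PriorityQueue<>((a, b) -> comparator.compare(a.value(), b.value()));
    for (Iterator<T> it : inputs) {
      if (it.hasNext()) {
        heap.add(new Entry<>(it.next(), it)); // seed the heap with each input's first row
      }
    }
    List<T> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      Entry<T> smallest = heap.poll();        // emit the globally smallest head
      merged.add(smallest.value());
      if (smallest.source().hasNext()) {      // refill from the same input
        heap.add(new Entry<>(smallest.source().next(), smallest.source()));
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Iterator<Integer>> files = List.of(
        List.of(1, 4, 7).iterator(),
        List.of(2, 5, 8).iterator(),
        List.of(3, 6, 9).iterator());
    // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
    System.out.println(mergeSorted(files, Comparator.naturalOrder()));
  }
}
```

Because each input is already sorted, maintaining only the current head of each input in the heap is enough to produce a globally sorted stream without buffering whole files.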
Row Comparison: Uses the `SortOrderComparators.forSchema()` Iceberg API with `InternalRowWrapper` to bridge Spark `InternalRow` to Iceberg `StructLike`. This correctly handles all transform types (identity, bucket, truncate), ASC/DESC directions, and null ordering.

Constraints
When `preserve-data-ordering` is enabled, bin-packing of large partitions is disabled and all files within a partition are placed into a single Spark task. This is a known limitation of the current KeyGroupedPartitioning-based approach and is expected to be addressed in a future improvement, SPARK-56241.

Sort elimination examples
Without sort order reporting:
With sort order reporting:
Without sort order reporting:
With sort order reporting:
AI Usage: I used Claude Sonnet 4.6 for code generation and writing tests. I manually reviewed the generated code.