[SPARK-54554][SQL] Enable Dynamic Partition Pruning with CommandResult #53263

dwsmith1983 · 2025-11-29T06:26:17Z

What changes were proposed in this pull request?

This PR enables Dynamic Partition Pruning (DPP) optimization when joining with CommandResult nodes (e.g., results from SHOW PARTITIONS).

Changes made to sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala:

Modified hasSelectivePredicate() to recognize CommandResult as selective (line 212)
Modified calculatePlanOverhead() to return 0.0 overhead for CommandResult since data is already materialized (lines 187, 199)
Modified hasPartitionPruningFilter() to exclude plans containing CommandResult from being used as DPP filter sources (line 221)

Added test coverage in sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala to verify DPP works correctly with CommandResult.

Built and tested against tag v4.0.1 locally to verify the results and Spark plan as well.

https://issues.apache.org/jira/browse/SPARK-54554

Why are the changes needed?

Previously, when using SHOW PARTITIONS results in a broadcast join, Spark would perform full table scans instead of applying Dynamic Partition Pruning.

Example scenario where this matters:
val partitions = spark.sql("SHOW PARTITIONS fact_table")
.selectExpr("cast(split(partition, '=')[1] as int) as partition_id")
.agg(max("partition_id"))

spark.table("fact_table")
.join(partitions, col("partition_id") === col("max(partition_id)"))

Before this fix: Full table scan of all partitions
After this fix: DPP prunes to only the relevant partition(s)

Does this PR introduce any user-facing change?

Yes. Queries that join partitioned tables with SHOW PARTITIONS results (or other commands returning CommandResult) will now benefit from Dynamic Partition Pruning, potentially improving performance by scanning fewer partitions.

The behavior change is transparent to users - existing queries will simply run faster without any code changes required.

How was this patch tested?

Added new test case "DPP with CommandResult from SHOW PARTITIONS in broadcast join" in DynamicPartitionPruningSuite that verifies:
- DPP is applied when joining with CommandResult
- Correct query results are returned
- Plan contains DynamicPruningSubquery operator

Ran full DynamicPartitionPruning test suite (73 tests total) - all passed

Tested manually with local Spark build using various CommandResult scenarios

Was this patch authored or co-authored using generative AI tooling?

No

CommandResult (from commands like SHOW PARTITIONS) should qualify for Dynamic Partition Pruning optimization in broadcast joins. Previously, CommandResult was not recognized as a selective predicate, causing full table scans even when the partition list was small. This change: - Treats CommandResult as selective in hasSelectivePredicate() - Sets CommandResult overhead to 0.0 in calculatePlanOverhead() (data already materialized, no I/O cost) - Adds test coverage in DynamicPartitionPruningSuite - Removed unnecessary catch all - Updated test name to reflect Jira ticket - Corrected wrong df values in DynamicPartitionPruningSuite - Updated Jira ticket in test name - Removed test script Co-authored-by: Tri Tam Hoang <[email protected]>

github-actions bot added the SQL label Nov 29, 2025

dwsmith1983 changed the title ~~Spark 54554 dpp command result~~ [SPARK-54554][SQL] Enable Dynamic Partition Pruning with CommandResult Nov 29, 2025

dwsmith1983 force-pushed the SPARK-54554-dpp-command-result branch 3 times, most recently from e3d2c69 to f7ac51c Compare November 29, 2025 09:50

dwsmith1983 marked this pull request as draft November 29, 2025 13:44

dwsmith1983 force-pushed the SPARK-54554-dpp-command-result branch from d8857de to f8bd1e7 Compare November 29, 2025 16:23

dwsmith1983 marked this pull request as ready for review November 29, 2025 16:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-54554][SQL] Enable Dynamic Partition Pruning with CommandResult #53263

[SPARK-54554][SQL] Enable Dynamic Partition Pruning with CommandResult #53263

dwsmith1983 commented Nov 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

[SPARK-54554][SQL] Enable Dynamic Partition Pruning with CommandResult #53263

Are you sure you want to change the base?

[SPARK-54554][SQL] Enable Dynamic Partition Pruning with CommandResult #53263

Conversation

dwsmith1983 commented Nov 29, 2025

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant