
@dwsmith1983
Contributor

What changes were proposed in this pull request?

This PR enables Dynamic Partition Pruning (DPP) optimization when joining with CommandResult nodes (e.g., results from SHOW PARTITIONS).

Changes made to sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala:

  1. Modified hasSelectivePredicate() to recognize CommandResult as selective (line 212)
  2. Modified calculatePlanOverhead() to return 0.0 overhead for CommandResult since data is already materialized (lines 187, 199)
  3. Modified hasPartitionPruningFilter() to exclude plans containing CommandResult from being used as DPP filter sources (line 221)
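The intent of the three changes can be sketched with a self-contained toy model (illustrative Scala only — `CommandResultLike`, `Scan`, and `Filter` are stand-ins, not the actual Spark `LogicalPlan` classes or the real method bodies in PartitionPruning.scala):

```scala
// Toy model of the three DPP decisions; not the real Spark planner code.
sealed trait PlanNode
case object CommandResultLike extends PlanNode            // already-materialized command output
final case class Scan(sizeInBytes: Long) extends PlanNode
final case class Filter(child: PlanNode) extends PlanNode

object DppDecisions {
  // 1. A CommandResult counts as selective: its row set is small and concrete.
  def hasSelectivePredicate(p: PlanNode): Boolean = p match {
    case CommandResultLike => true
    case Filter(_)         => true
    case Scan(_)           => false
  }

  // 2. Re-evaluating a CommandResult adds no overhead: the data is materialized.
  def calculatePlanOverhead(p: PlanNode): Double = p match {
    case CommandResultLike => 0.0
    case Scan(size)        => size.toDouble
    case Filter(child)     => calculatePlanOverhead(child)
  }

  // 3. Plans containing a CommandResult are excluded as DPP filter targets.
  def containsCommandResult(p: PlanNode): Boolean = p match {
    case CommandResultLike => true
    case Filter(child)     => containsCommandResult(child)
    case Scan(_)           => false
  }
}
```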

Added test coverage in sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala to verify DPP works correctly with CommandResult.

Built and tested locally against tag v4.0.1 to verify both the query results and the Spark plan.

https://issues.apache.org/jira/browse/SPARK-54554

Why are the changes needed?

Previously, when using SHOW PARTITIONS results in a broadcast join, Spark would perform full table scans instead of applying Dynamic Partition Pruning.

Example scenario where this matters:
import org.apache.spark.sql.functions.{col, max}

val partitions = spark.sql("SHOW PARTITIONS fact_table")
  .selectExpr("cast(split(partition, '=')[1] as int) as partition_id")
  .agg(max("partition_id"))

spark.table("fact_table")
  .join(partitions, col("partition_id") === col("max(partition_id)"))

Before this fix: Full table scan of all partitions
After this fix: DPP prunes to only the relevant partition(s)
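The before/after difference can be modeled in plain Scala (a sketch of the effect only, not Spark code — `Partition`, `scanAll`, and `scanPruned` are hypothetical names):

```scala
// Toy illustration of DPP's effect: the broadcast side is evaluated first,
// and its result decides which partitions get scanned at all.
final case class Partition(id: Int, rows: Seq[String])

object PruningEffect {
  // Before the fix: every partition is scanned.
  def scanAll(parts: Seq[Partition]): Seq[String] =
    parts.flatMap(_.rows)

  // After the fix: only partitions matching the broadcast value are scanned.
  def scanPruned(parts: Seq[Partition], keep: Int => Boolean): Seq[String] =
    parts.filter(p => keep(p.id)).flatMap(_.rows)
}
```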

Does this PR introduce any user-facing change?

Yes. Queries that join partitioned tables with SHOW PARTITIONS results (or other commands returning CommandResult) will now benefit from Dynamic Partition Pruning, potentially improving performance by scanning fewer partitions.

The behavior change is transparent to users: existing queries simply run faster, with no code changes required.

How was this patch tested?

Added new test case "DPP with CommandResult from SHOW PARTITIONS in broadcast join" in DynamicPartitionPruningSuite that verifies:
- DPP is applied when joining with CommandResult
- Correct query results are returned
- Plan contains DynamicPruningSubquery operator

Ran the full DynamicPartitionPruning test suite (73 tests); all passed.

Tested manually with local Spark build using various CommandResult scenarios

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Nov 29, 2025
@dwsmith1983 dwsmith1983 changed the title Spark 54554 dpp command result [SPARK-54554][SQL] Enable Dynamic Partition Pruning with CommandResult Nov 29, 2025
@dwsmith1983 dwsmith1983 force-pushed the SPARK-54554-dpp-command-result branch 3 times, most recently from e3d2c69 to f7ac51c on November 29, 2025 09:50
@dwsmith1983 dwsmith1983 marked this pull request as draft November 29, 2025 13:44
CommandResult (from commands like SHOW PARTITIONS) should qualify for
Dynamic Partition Pruning optimization in broadcast joins. Previously,
CommandResult was not recognized as a selective predicate, causing
full table scans even when the partition list was small.

This change:
- Treats CommandResult as selective in hasSelectivePredicate()
- Sets CommandResult overhead to 0.0 in calculatePlanOverhead()
  (data already materialized, no I/O cost)
- Adds test coverage in DynamicPartitionPruningSuite
- Removed an unnecessary catch-all case
- Updated test name to reflect Jira ticket
- Corrected wrong df values in DynamicPartitionPruningSuite
- Updated Jira ticket in test name
- Removed test script

Co-authored-by: Tri Tam Hoang <[email protected]>
@dwsmith1983 dwsmith1983 force-pushed the SPARK-54554-dpp-command-result branch from d8857de to f8bd1e7 on November 29, 2025 16:23
@dwsmith1983 dwsmith1983 marked this pull request as ready for review November 29, 2025 16:24