Contributor

@cxzl25 cxzl25 commented Jan 9, 2026

Which issue does this PR close?

Closes #1792

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

How was this patch tested?

Added unit tests (UT).

@cxzl25 cxzl25 added the correctness label ("a query provides different output from spark") and removed the spark native labels Jan 9, 2026
@cxzl25 cxzl25 requested a review from Copilot January 9, 2026 16:33
Contributor

Copilot AI left a comment

Pull request overview

This PR passes the isNullAwareAntiJoin flag from Spark through to the native execution engine so that NULL keys in anti-joins are handled correctly. In a standard LEFT ANTI JOIN, rows with NULL keys are included in the result, because NULL never matches anything. In a null-aware anti-join (e.g., a NOT IN query), rows with NULL keys are excluded.
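The difference between the two join flavours can be sketched in a few lines of Rust. This is an illustration of the general SQL semantics only, not the code in this PR: the function names `left_anti_join` and `null_aware_anti_join` are hypothetical, keys are modelled as `Option<i32>` with `None` standing in for NULL, and the null-aware version also covers the usual NOT IN edge cases on the subquery side (an empty subquery, or a subquery that contains NULL), which the overview does not spell out.

```rust
use std::collections::HashSet;

/// Standard LEFT ANTI JOIN on an equality key: keep a left row when no right
/// row has an equal key. A NULL key (None) never equals anything, so it is kept.
fn left_anti_join(left: &[Option<i32>], right: &[Option<i32>]) -> Vec<Option<i32>> {
    // NULL keys on the right side can never match, so only non-NULL keys are collected.
    let right_keys: HashSet<i32> = right.iter().flatten().copied().collect();
    left.iter()
        .copied()
        .filter(|key| match key {
            Some(k) => !right_keys.contains(k),
            None => true, // NULL never matches: the row survives
        })
        .collect()
}

/// Null-aware anti-join (`x NOT IN (subquery)`) under SQL three-valued logic.
fn null_aware_anti_join(left: &[Option<i32>], right: &[Option<i32>]) -> Vec<Option<i32>> {
    // NOT IN over an empty set is TRUE for every left row, NULL keys included.
    if right.is_empty() {
        return left.to_vec();
    }
    // A NULL on the right turns every non-matching comparison into UNKNOWN,
    // so no left row can qualify.
    if right.iter().any(|k| k.is_none()) {
        return Vec::new();
    }
    let right_keys: HashSet<i32> = right.iter().flatten().copied().collect();
    left.iter()
        .copied()
        .filter(|key| match key {
            Some(k) => !right_keys.contains(k),
            None => false, // NULL NOT IN (non-empty set) is UNKNOWN: the row is dropped
        })
        .collect()
}
```

With a left side of `[Some(1), Some(2), None]` and a right side of `[Some(2), Some(3)]`, the standard anti-join keeps `Some(1)` and the NULL row, while the null-aware version keeps only `Some(1)`.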

Key Changes:

  • Added isNullAwareAntiJoin boolean parameter throughout the broadcast join code path
  • Implemented version-specific handling for Spark 3.0 (which lacks this field) vs Spark 3.1+
  • Updated Rust join logic to conditionally filter NULL keys based on the flag value
  • Unified test expectations across all join types to ensure consistent NULL handling

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| native-engine/auron-serde/proto/auron.proto | Added is_null_aware_anti_join boolean field to BroadcastJoinExecNode message |
| native-engine/auron-serde/src/from_proto.rs | Updated protobuf deserialization to read and pass the new field |
| native-engine/datafusion-ext-plans/src/broadcast_join_exec.rs | Added field to struct, constructor, and cloning logic |
| native-engine/datafusion-ext-plans/src/joins/mod.rs | Added field to JoinParams struct to pass flag through join pipeline |
| native-engine/datafusion-ext-plans/src/joins/bhj/semi_join.rs | Implemented logic to exclude NULL keys for null-aware anti-joins |
| native-engine/datafusion-ext-plans/src/joins/test.rs | Updated test to verify consistent NULL handling across all join types |
| native-engine/datafusion-ext-plans/src/sort_merge_join_exec.rs | Initialized field to false for sort-merge joins |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala | Added version-specific methods to extract flag from BroadcastHashJoinExec |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/Shims.scala | Updated interface to include new parameter |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastJoinBase.scala | Added parameter to base class constructor |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/auron/ShimsImpl.scala | Updated implementation to pass through the flag |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/joins/auron/plan/NativeBroadcastJoinExec.scala | Added parameter to case class |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala | Added two tests verifying standard and null-aware anti-join behavior |
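The per-file summary describes the flag travelling from the deserialized plan node into the join parameters, with sort-merge joins fixing it to false. A rough sketch of that plumbing follows. Every type and method name here (`BroadcastJoinNodeSketch`, `JoinParamsSketch`, `for_broadcast_join`, `for_sort_merge_join`) is a hypothetical stand-in rather than the actual definition in the PR, and the use of `Default` assumes that an unset boolean reads as false.

```rust
/// Hypothetical stand-in for the deserialized broadcast join plan node.
/// Only the new flag is modelled; `Default` leaves it false when unset (an assumption).
#[derive(Debug, Default, Clone, PartialEq)]
struct BroadcastJoinNodeSketch {
    is_null_aware_anti_join: bool,
}

/// Hypothetical stand-in for the parameters handed to the join pipeline.
#[derive(Debug, Clone, PartialEq)]
struct JoinParamsSketch {
    is_null_aware_anti_join: bool,
}

impl JoinParamsSketch {
    /// Broadcast joins copy the flag from the plan node unchanged.
    fn for_broadcast_join(node: &BroadcastJoinNodeSketch) -> Self {
        Self {
            is_null_aware_anti_join: node.is_null_aware_anti_join,
        }
    }

    /// Sort-merge joins always initialize the flag to false.
    fn for_sort_merge_join() -> Self {
        Self {
            is_null_aware_anti_join: false,
        }
    }
}
```

Keeping the flag as a plain field with a false default means plans that never set it keep the standard anti-join behaviour.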

}
}

test("left join with NOT IN subquery should filter NULL values") {
Contributor Author

  == Results ==
  !== Correct Answer - 1 ==   == Spark Answer - 1 ==
   struct<cnt:bigint>         struct<cnt:bigint>
  ![9]                        [99999] (QueryTest.scala:243)

Labels

correctness: a query provides different output from spark

Development

Successfully merging this pull request may close these issues.

Left connection failure
