[AURON #1792][FOLLOWUP] Broadcast isNullAwareAntiJoin flag #1866
base: master
Pull request overview
This PR adds support for broadcasting the isNullAwareAntiJoin flag from Spark to the native execution engine to correctly handle NULL values in anti-joins. In standard LEFT ANTI JOINs, rows with NULL keys should be included in the result (since NULL never matches anything), while in null-aware anti-joins (e.g., NOT IN queries), rows with NULL keys should be excluded.
Key Changes:
- Added `isNullAwareAntiJoin` boolean parameter throughout the broadcast join code path
- Implemented version-specific handling for Spark 3.0 (which lacks this field) vs Spark 3.1+
- Updated Rust join logic to conditionally filter NULL keys based on the flag value
- Unified test expectations across all join types to ensure consistent NULL handling
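
To make the NULL handling described above concrete, here is a minimal, self-contained sketch of the two anti-join behaviors. It is not the code in semi_join.rs: the function name `left_anti_join` and the use of `Option<i64>` keys are hypothetical, and only the `is_null_aware_anti_join` flag name comes from this PR. The two early returns follow standard SQL `NOT IN` semantics (empty subquery, or a NULL on the subquery side) and are an assumption about the full behavior, since the overview only describes the NULL-key filtering.

```rust
use std::collections::HashSet;

/// Hypothetical model of a left anti join over nullable keys.
/// Returns the probe-side keys that survive the join, in probe order.
fn left_anti_join(
    probe: &[Option<i64>],
    build: &[Option<i64>],
    is_null_aware_anti_join: bool,
) -> Vec<Option<i64>> {
    // Non-NULL build keys, used for equality matching.
    let build_keys: HashSet<i64> = build.iter().flatten().copied().collect();
    let build_has_null = build.iter().any(|k| k.is_none());

    if is_null_aware_anti_join {
        // Assumed NOT IN semantics: an empty subquery keeps every probe row.
        if build.is_empty() {
            return probe.to_vec();
        }
        // Assumed NOT IN semantics: a NULL in the subquery makes every
        // comparison unknown, so no probe row qualifies.
        if build_has_null {
            return Vec::new();
        }
    }

    probe
        .iter()
        .copied()
        .filter(|key| match key {
            // A non-NULL key survives only if it has no match on the build side.
            Some(k) => !build_keys.contains(k),
            // A NULL key never matches: kept by a standard anti join,
            // dropped by a null-aware one.
            None => !is_null_aware_anti_join,
        })
        .collect()
}
```

With probe keys `[1, NULL, 3]` and build keys `[1, 2]`, the standard anti join keeps `NULL` and `3`, while the null-aware variant keeps only `3`.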
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| native-engine/auron-serde/proto/auron.proto | Added is_null_aware_anti_join boolean field to BroadcastJoinExecNode message |
| native-engine/auron-serde/src/from_proto.rs | Updated protobuf deserialization to read and pass the new field |
| native-engine/datafusion-ext-plans/src/broadcast_join_exec.rs | Added field to struct, constructor, and cloning logic |
| native-engine/datafusion-ext-plans/src/joins/mod.rs | Added field to JoinParams struct to pass flag through join pipeline |
| native-engine/datafusion-ext-plans/src/joins/bhj/semi_join.rs | Implemented logic to exclude NULL keys for null-aware anti-joins |
| native-engine/datafusion-ext-plans/src/joins/test.rs | Updated test to verify consistent NULL handling across all join types |
| native-engine/datafusion-ext-plans/src/sort_merge_join_exec.rs | Initialized field to false for sort-merge joins |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala | Added version-specific methods to extract flag from BroadcastHashJoinExec |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/Shims.scala | Updated interface to include new parameter |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastJoinBase.scala | Added parameter to base class constructor |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/auron/ShimsImpl.scala | Updated implementation to pass through the flag |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/joins/auron/plan/NativeBroadcastJoinExec.scala | Added parameter to case class |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala | Added two tests verifying standard and null-aware anti-join behavior |
test("left join with NOT IN subquery should filter NULL values") {
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
struct<cnt:bigint> struct<cnt:bigint>
![9] [99999] (QueryTest.scala:243)
Which issue does this PR close?
Closes #1792
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?
How was this patch tested?
Added unit tests.