Skip to content

Conversation

@jonathanc-n
Copy link
Contributor

…nExecfromCoalesceBatches` Optimizer Rule

Which issue does this PR close?

What changes are included in this PR?

  • Uses LimitedBatchCoalescer in HashJoinExec
  • Removes Hash join exec from Coalesce optimizer rule.

Are these changes tested?

Existing tests

…nExec` from `CoalesceBatches` Optimizer Rule
@github-actions github-actions bot added optimizer Optimizer rules physical-plan Changes to the physical-plan crate labels Nov 21, 2025

/// Batch coalescer for coalescing batches from the probe side
batch_coalescer: LimitedBatchCoalescer,
/// Flag to track if we've handled empty output to avoid returning empty schema
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #18859

Copy link
Contributor Author

@jonathanc-n jonathanc-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#18295 should be a nice follow up to this.

@jonathanc-n
Copy link
Contributor Author

Anybody know how to quickly remove all the CoalesceBatches line from hash join explain plans? 😆

@martin-g
Copy link
Member

Anybody know how to quickly remove all the CoalesceBatches line from hash join explain plans? 😆

See https://docs.rs/insta/latest/insta/#updating-snapshots

Self {
partition,
schema,
schema: Arc::clone(&schema),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to clone here ? Usually this happens at the call site

)?
};

self.batch_coalescer.push_batch(result)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment that ignoring PushBatchStatus::LimitReached result is intentional. Same at line 762 below.


self.state = HashJoinStreamState::FetchProbeBatch;

return Ok(StatefulStreamResult::Ready(Some(result)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this also need to be improved ?
Currently it returns the result without pushing it to the coalescer.
I think it should call self.batch_coalescer.push_batch(result)?; and then check for a completed batch:

if let Some(batch) = self.batch_coalescer.next_completed_batch() {
  return Ok(StatefulStreamResult::Ready(Some(batch)));
}

return Ok(StatefulStreamResult::Continue);

@Dandandan
Copy link
Contributor

Anybody know how to quickly remove all the CoalesceBatches line from hash join explain plans? 😆

See https://docs.rs/insta/latest/insta/#updating-snapshots

Also for sqllogictests:

INCLUDE_TPCH=true  cargo test --test sqllogictests -- --complete

@Dandandan Dandandan closed this Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Integrate BatchCoalescer into HashJoinExec and remove from CoalesceBatches optimization rule

3 participants