Skip to content

Add "emitEmptyBuckets" parameter to the "Bucket" function. #131609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

przemekwitek
Copy link
Contributor

@przemekwitek przemekwitek commented Jul 21, 2025

-- DRAFT --

This PR extends the BUCKET function by adding the ability to emit empty buckets (i.e. buckets with no data).

Currently, the BUCKET function can work in two modes and can have 2 or 4 arguments (see docs).
With this PR, the 5th parameter (emitEmptyBuckets, name to be finalized) is introduced. The emitEmptyBuckets parameter is of boolean type and has the following semantics:

false: empty buckets will not be emitted (this is the current behavior)
true: empty buckets will be emitted. In this case, from and to parameters are required, even when the desired bucket size is provided explicitly.
The default value for emitEmptyBuckets is false for BWC.

Relates #111483

@przemekwitek przemekwitek force-pushed the emit_empty_buckets branch 5 times, most recently from c150e9c to 38b19d3 Compare July 21, 2025 14:58
Copy link

cla-checker-service bot commented Jul 21, 2025

💚 CLA has been signed

Copy link
Contributor

@ivancea ivancea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, it looks quite good!
Mixed some high level comments with minor things. But I would forget later otherwise, so feel free to ignore if this will evolve!

I would like somebody else from the team check it, as they may have ideas on the critical topics here:

  • Doing this in the coord node
  • Finding the Bucket
  • Maybe other validations and things to do

assert toArg.optional();
var emitEmptyBucketsArg = args.get(++i);
assert emitEmptyBucketsArg.optional();
// TODO: Should it be possible to be able to specify "emitEmptyBuckets" but not "from" and "to"?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe as a continuation? After the coordinator reduced all the datanode aggs, we have the min and max in theory.
Yet, I'm not sure if that would be clear to the users, or if there's any use case. Maybe worth a discussion

@@ -212,13 +221,20 @@ public Bucket(
type = { "integer", "long", "double", "date", "keyword", "text" },
optional = true,
description = "End of the range. Can be a number, a date or a date expressed as a string."
) Expression to
) Expression to,
@Param(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be a MapParam instead. Basically, a named parameter

@@ -410,9 +475,8 @@ protected TypeResolution resolveType() {

private TypeResolution checkArgsCount(int expectedCount) {
String expected = null;
if (expectedCount == 2 && (from != null || to != null)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this case removed?

@Override
public void addInput(Page page) {
if (isInitialPage.compareAndSet(true, false)
&& (aggregators.size() == 0 || AggregatorMode.INITIAL.equals(aggregators.get(0).getMode()))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For later: Would be nice if we could do this on output() on the coordinator, on FINAL aggs only.
I remember the reason to not do this was that Bucket wouldn't be on the coord physical plan, as it's moved to an early eval

Comment on lines +333 to +334
// TODO: DocBlock doesn't allow appending nulls
((DocBlock.Builder) blockBuilders[channel]).appendShard(0).appendSegment(0).appendDoc(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels dangerous. Adding a comment to not forget it

@@ -289,6 +310,42 @@ protected Page wrapPage(Page page) {
return page;
}

private Page createInitialPage(Page page) {
// If no groups are generating bucket keys, move on
if (groups.stream().allMatch(g -> g.emptyBucketGenerator() == null)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add an assert on the constructor (probably?) that either all have a generator, or none have. If there's a mix, I guess we'll have a problem

@Override
public void generate(Block.Builder blockBuilder) {
for (double bucket = round(Math.floor(from / roundTo) * roundTo, 2); bucket < to; bucket = round(bucket + roundTo, 2)) {
((DoubleBlock.Builder) blockBuilder).appendDouble(bucket);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this cast outside the for. Maybe an:

if (blockBuilder instanceof DoubleBlock.Builder doubleBlockBuilder) {
    // Code...
} else {
    throw ...;
}

Same for the other generator.

if (ne.id().equals(bucketId)) {
if (ne.children().size() > 0 && ne.children().get(0) instanceof Bucket bucket) {
foundBucket.set(bucket);
// TODO: Hack used when BUCKET is wrapped with ROUND. How to generalize it?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bucket technically can be used within any expression I think? Is this only for Round?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants