-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Add "emitEmptyBuckets" parameter to the "Bucket" function. #131609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c150e9c
to
38b19d3
Compare
💚 CLA has been signed |
e6e6729
to
db7accd
Compare
6ef0feb
to
3cae3af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, it looks quite good!
Mixed some high level comments with minor things. But I would forget later otherwise, so feel free to ignore if this will evolve!
I would like somebody else from the team check it, as they may have ideas on the critical topics here:
- Doing this in the coord node
- Finding the Bucket
- Maybe other validations and things to do
assert toArg.optional(); | ||
var emitEmptyBucketsArg = args.get(++i); | ||
assert emitEmptyBucketsArg.optional(); | ||
// TODO: Should it be possible to be able to specify "emitEmptyBuckets" but not "from" and "to"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe as a continuation? After the coordinator reduced all the datanode aggs, we have the min and max in theory.
Yet, I'm not sure if that would be clear to the users, or if there's any use case. Maybe worth a discussion
@@ -212,13 +221,20 @@ public Bucket( | |||
type = { "integer", "long", "double", "date", "keyword", "text" }, | |||
optional = true, | |||
description = "End of the range. Can be a number, a date or a date expressed as a string." | |||
) Expression to | |||
) Expression to, | |||
@Param( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be a MapParam
instead. Basically, a named parameter
@@ -410,9 +475,8 @@ protected TypeResolution resolveType() { | |||
|
|||
private TypeResolution checkArgsCount(int expectedCount) { | |||
String expected = null; | |||
if (expectedCount == 2 && (from != null || to != null)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this case removed?
@Override | ||
public void addInput(Page page) { | ||
if (isInitialPage.compareAndSet(true, false) | ||
&& (aggregators.size() == 0 || AggregatorMode.INITIAL.equals(aggregators.get(0).getMode()))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For later: Would be nice if we could do this on output() on the coordinator, on FINAL aggs only.
I remember the reason to not do this was that Bucket wouldn't be on the coord physical plan, as it's moved to an early eval
// TODO: DocBlock doesn't allow appending nulls | ||
((DocBlock.Builder) blockBuilders[channel]).appendShard(0).appendSegment(0).appendDoc(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels dangerous. Adding a comment to not forget it
@@ -289,6 +310,42 @@ protected Page wrapPage(Page page) { | |||
return page; | |||
} | |||
|
|||
private Page createInitialPage(Page page) { | |||
// If no groups are generating bucket keys, move on | |||
if (groups.stream().allMatch(g -> g.emptyBucketGenerator() == null)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add an assert on the constructor (probably?) that either all have a generator, or none have. If there's a mix, I guess we'll have a problem
@Override | ||
public void generate(Block.Builder blockBuilder) { | ||
for (double bucket = round(Math.floor(from / roundTo) * roundTo, 2); bucket < to; bucket = round(bucket + roundTo, 2)) { | ||
((DoubleBlock.Builder) blockBuilder).appendDouble(bucket); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this cast outside the for. Maybe an:
if (blockBuilder instanceof DoubleBlock.Builder doubleBlockBuilder) {
// Code...
} else {
throw ...;
}
Same for the other generator.
if (ne.id().equals(bucketId)) { | ||
if (ne.children().size() > 0 && ne.children().get(0) instanceof Bucket bucket) { | ||
foundBucket.set(bucket); | ||
// TODO: Hack used when BUCKET is wrapped with ROUND. How to generalize it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bucket technically can be used within any expression I think? Is this only for Round?
-- DRAFT --
This PR extends the BUCKET function by adding the ability to emit empty buckets (i.e. buckets with no data).
Currently, the BUCKET function can work in two modes and can have 2 or 4 arguments (see docs).
With this PR, the 5th parameter (emitEmptyBuckets, name to be finalized) is introduced. The emitEmptyBuckets parameter is of boolean type and has the following semantics:
false: empty buckets will not be emitted (this is the current behavior)
true: empty buckets will be emitted. In this case, from and to parameters are required, even when the desired bucket size is provided explicitly.
The default value for emitEmptyBuckets is false for BWC.
Relates #111483