Add EmitTo::FirstBlock for grouped aggregate emission #23274
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported that the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
I think this design should be implemented as the first PR for #7065 🤔. It does not seem possible to introduce an intermediate step like this PR that only splits out the […]. I agree that the initial blocked-state PR may become large, so we will need to figure out how to arrange and split the work better. For coordination, the initial blocked-state PR would depend on #22710 being finished first.
Thanks for the guidance, that makes sense to me. I was trying to split the work into the smallest possible first step, but I agree that if […]. I will pause this PR for now and follow the blocked-state / #7065 direction instead. I can keep the current API experiment and benchmark results as reference material, and revisit the API shape once the initial blocked-state implementation is ready after the #22710 stream-splitting work. Thanks again for helping clarify the right sequencing.
@2010YOUY01, do you think it's stable enough to push #15591 forward again?
@hhhizzz I reviewed the sorted path, and I think […]
Thanks for checking the sorted path. I'll keep this PR as an experiment/reference and focus the follow-up investigation on the blocked-state direction for hash aggregation, while treating sorted aggregation as a possibly separate design with sorted-specific […]
For merging, I don't think it's ready yet. We should wait until the refactoring EPIC is closed. The goal is to ship the whole refactor soon to reduce disruption; otherwise, we may run into even more development conflicts. That said, we can still review and discuss the plan now, since that part shouldn't depend on the refactor. If we agree on the blocked-state implementation design and plan early, then once the refactor is complete the remaining work would just be resolving small conflicts and merging soon after.
An alternative is to also remove […]. Conceptually, […]. If we set it to 100 and at some point 280 groups have accumulated, the internal layout is: […] And […]
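A minimal sketch of the blocked layout described above, assuming "it" refers to a per-block group capacity: group states are appended into fixed-size blocks, so 280 groups with a block size of 100 occupy three blocks of 100, 100 and 80, and emitting the first block hands back one whole block without copying the groups that remain. BlockedState, push and emit_first_block are hypothetical names, not DataFusion APIs, and the sketch deliberately ignores re-indexing the remaining groups after an emission.

```rust
use std::collections::VecDeque;

/// Hypothetical blocked storage for per-group state (not a DataFusion type).
/// Before any block is emitted, group `i` lives in block `i / block_size`
/// at offset `i % block_size`.
struct BlockedState<T> {
    block_size: usize,
    blocks: VecDeque<Vec<T>>,
}

impl<T> BlockedState<T> {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        Self { block_size, blocks: VecDeque::new() }
    }

    /// Append the state of a newly seen group, opening a new block once the last one is full.
    fn push(&mut self, value: T) {
        let last_is_full = self
            .blocks
            .back()
            .map_or(true, |b| b.len() == self.block_size);
        if last_is_full {
            self.blocks.push_back(Vec::with_capacity(self.block_size));
        }
        self.blocks
            .back_mut()
            .expect("a non-full block was just ensured")
            .push(value);
    }

    /// Number of groups held in each block, oldest first.
    fn block_lens(&self) -> Vec<usize> {
        self.blocks.iter().map(Vec::len).collect()
    }

    /// Emit the oldest block as a whole; the other blocks are not touched.
    fn emit_first_block(&mut self) -> Option<Vec<T>> {
        self.blocks.pop_front()
    }
}

fn main() {
    let mut state = BlockedState::new(100);
    for group in 0..280u64 {
        state.push(group);
    }
    println!("{:?}", state.block_lens()); // [100, 100, 80]

    let first = state.emit_first_block().unwrap();
    println!("{} {:?}", first.len(), state.block_lens()); // 100 [100, 80]
}
```

The design point is that emission granularity matches storage granularity: with a VecDeque of blocks, emitting the first block is a pop_front rather than a split and shift of one large buffer.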
Which issue does this PR close?
I agree the full implementation path can be quite large. I narrowed this PR down to the simplest first step: adding the API variant and making existing implementations handle it safely with the current behavior. The block-aware implementation and performance changes can follow in separate PRs after the API direction is agreed on.
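To make "handle it safely with the current behavior" concrete, here is a minimal sketch using a simplified stand-in enum (not the real DataFusion EmitTo), loosely modeled on the shape of the existing take_needed helper. An implementation that is not block-aware yet can share the First(n) arm, so FirstBlock(n) emits exactly what First(n) would.

```rust
use std::cmp::min;

/// Simplified stand-in for DataFusion's `EmitTo`; not the real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EmitTo {
    /// Emit all groups.
    All,
    /// Emit the first `n` groups.
    First(usize),
    /// Proposed: emit the first block of groups, bounded by `n`.
    FirstBlock(usize),
}

impl EmitTo {
    /// Remove and return the entries to emit from `v`, leaving the rest in `v`.
    /// Implementations that are not block-aware yet treat `FirstBlock(n)`
    /// exactly like `First(n)`.
    fn take_needed<T>(&self, v: &mut Vec<T>) -> Vec<T> {
        match self {
            EmitTo::All => std::mem::take(v),
            EmitTo::First(n) | EmitTo::FirstBlock(n) => {
                let split_at = min(v.len(), *n);
                // `split_off` leaves the prefix in `v` and returns the tail;
                // swapping makes `v` hold the tail and returns the prefix.
                let mut out = v.split_off(split_at);
                std::mem::swap(v, &mut out);
                out
            }
        }
    }
}

fn main() {
    let mut a = vec![1, 2, 3, 4, 5];
    let mut b = a.clone();
    let emitted_first = EmitTo::First(2).take_needed(&mut a);
    let emitted_block = EmitTo::FirstBlock(2).take_needed(&mut b);
    assert_eq!(emitted_first, emitted_block);
    println!("{:?} {:?}", emitted_block, b); // [1, 2] [3, 4, 5]
}
```

Sharing the match arm keeps the new variant behavior-neutral until a block-aware implementation overrides it.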
Rationale for this change
This PR narrows the scope to the smallest first step for the longer-term aggregate output work: adding an explicit API variant for block-bounded grouped aggregate emission.
The larger implementation and performance work for #23249 can be discussed and reviewed separately. This PR only introduces the API shape so reviewers can first agree on the direction.
What changes are included in this PR?
- Add EmitTo::FirstBlock(usize).
- […] EmitTo::First(usize).
- Update EmitTo matches so the new variant is handled safely.
- […] First(n) semantics.
- Map FirstBlock(n) to FFI_EmitTo::First(n).

This PR does not yet make hash aggregate output use FirstBlock. That will be follow-up work.

Are these changes tested?
I ran the targeted checks for the affected crates:
I also fixed one nondeterministic sqllogictest expectation found by CI.
Are there any user-facing changes?
Yes. This intentionally adds a new public EmitTo::FirstBlock enum variant, so this PR is an API change.
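One way to read the FFI item in the change list, sketched with simplified stand-in types (the real EmitTo and FFI_EmitTo definitions may differ): assuming FFI_EmitTo keeps only its All and First(usize) variants, FirstBlock(n) is lowered to First(n) when crossing the boundary, so the FFI layout is unchanged and a foreign implementation simply sees First(n) semantics. The exhaustive matches in the conversions also show why the new variant is an API change: assuming EmitTo is not marked #[non_exhaustive], any downstream match without a wildcard arm stops compiling until it handles FirstBlock.

```rust
/// Simplified stand-in for the Rust-side enum (not the real DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EmitTo {
    All,
    First(usize),
    FirstBlock(usize),
}

/// Simplified stand-in for the FFI-side enum, assumed to keep only the two
/// original variants so its layout does not change.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FFI_EmitTo {
    All,
    First(usize),
}

impl From<EmitTo> for FFI_EmitTo {
    fn from(value: EmitTo) -> Self {
        match value {
            EmitTo::All => FFI_EmitTo::All,
            EmitTo::First(n) => FFI_EmitTo::First(n),
            // Lossy by design: the block hint is dropped and the callee sees First(n).
            EmitTo::FirstBlock(n) => FFI_EmitTo::First(n),
        }
    }
}

impl From<FFI_EmitTo> for EmitTo {
    fn from(value: FFI_EmitTo) -> Self {
        match value {
            FFI_EmitTo::All => EmitTo::All,
            FFI_EmitTo::First(n) => EmitTo::First(n),
        }
    }
}

fn main() {
    let round_trip: EmitTo = FFI_EmitTo::from(EmitTo::FirstBlock(100)).into();
    println!("{:?}", round_trip); // First(100)
}
```

Under these assumptions the round trip is intentionally not lossless: FirstBlock(100) comes back as First(100), which matches the "current behavior" fallback rather than block-aware emission.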