Fuse multiple scalar-aggregate subqueries over the same source into a single scan #23214

Draft
nathanb9 wants to merge 1 commit into apache:main from nathanb9:feat/fuse-scalar-aggregate-subqueries

nathanb9 commented Jun 26, 2026

Contributor

Which issue does this PR close?

Rationale for this change

When a query computes several uncorrelated scalar-aggregate subqueries over the same source, DataFusion scans that source once per subquery. Because the source is identical, the aggregates can instead be computed in a single scan by moving each subquery's predicate into a FILTER (WHERE ...) clause on its aggregate.

What changes are included in this PR?

A new logical optimizer rule, FuseScalarSubqueries, gated by datafusion.optimizer.enable_fuse_scalar_subqueries (default off). When a projection contains 2 or more uncorrelated scalar-aggregate subqueries over a structurally identical source, the rule fuses them into a single aggregate:

-- Before: two scans of t
SELECT (SELECT count(*) FROM t WHERE a < 10),
       (SELECT avg(x)   FROM t WHERE a >= 10);

-- After: one scan of t
SELECT count(*) FILTER (WHERE a < 10),
       avg(x)   FILTER (WHERE a >= 10)
FROM t;

The source filter becomes the OR of the branch predicates, and each scalar subquery is replaced by a reference to the merged aggregate column. The rule runs before subquery decorrelation and is conservative: it skips correlated, DISTINCT, ordered, or volatile aggregates, and predicates containing subqueries.
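The "After" query above leaves out the merged source filter that this paragraph describes. As a rough sanity check of the intended equivalence, the sketch below runs the original query and a hand-written fused form (with the OR of the two branch predicates as the source filter) against the same data. It uses Python's built-in sqlite3 module purely as a stand-in engine, not DataFusion, and assumes a bundled SQLite of 3.30 or newer for aggregate FILTER support. The table contents are made up for illustration, including a NULL in `a` to show that the OR-ed filter drops only rows that neither branch would have seen.

```python
import sqlite3

# In-memory stand-in for the source table `t`; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, x REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 10.0), (5, 20.0), (10, 30.0), (20, 50.0), (None, 99.0)],
)

# Original shape: two uncorrelated scalar-aggregate subqueries, two scans of t.
BEFORE = """
SELECT (SELECT count(*) FROM t WHERE a < 10),
       (SELECT avg(x)   FROM t WHERE a >= 10)
"""

# Hand-fused shape: one scan of t, each predicate moved into FILTER,
# and the source filter set to the OR of the branch predicates.
AFTER = """
SELECT count(*) FILTER (WHERE a < 10),
       avg(x)   FILTER (WHERE a >= 10)
FROM t
WHERE a < 10 OR a >= 10
"""

before = conn.execute(BEFORE).fetchone()
after = conn.execute(AFTER).fetchone()
print(before, after)
```

Both forms should return the same single row here. An aggregate without GROUP BY yields one row even when the filter matches nothing, which is what lets the fused form stand in for scalar subqueries that see no rows.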

This is an initial, opt-in version mirroring the existing enable_unions_to_filter rule. Follow-ups could add a size or selectivity guard.

@github-actions (bot) added the following labels on Jun 26, 2026: documentation (Improvements or additions to documentation), optimizer (Optimizer rules), sqllogictest (SQL Logic Tests (.slt)), common (Related to common crate)
@github-actions

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-common v54.0.0 (current)
       Built [  35.344s] (current)
     Parsing datafusion-common v54.0.0 (current)
      Parsed [   0.061s] (current)
    Building datafusion-common v54.0.0 (baseline)
       Built [  33.330s] (baseline)
     Parsing datafusion-common v54.0.0 (baseline)
      Parsed [   0.060s] (baseline)
    Checking datafusion-common v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.659s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field OptimizerOptions.enable_fuse_scalar_subqueries in /home/runner/work/datafusion/datafusion/datafusion/common/src/config.rs:1347

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  70.552s] datafusion-common
    Building datafusion-optimizer v54.0.0 (current)
       Built [  27.413s] (current)
     Parsing datafusion-optimizer v54.0.0 (current)
      Parsed [   0.032s] (current)
    Building datafusion-optimizer v54.0.0 (baseline)
       Built [  26.890s] (baseline)
     Parsing datafusion-optimizer v54.0.0 (baseline)
      Parsed [   0.030s] (baseline)
    Checking datafusion-optimizer v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.161s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [  55.274s] datafusion-optimizer
    Building datafusion-sqllogictest v54.0.0 (current)
       Built [ 177.812s] (current)
     Parsing datafusion-sqllogictest v54.0.0 (current)
      Parsed [   0.022s] (current)
    Building datafusion-sqllogictest v54.0.0 (baseline)
       Built [ 180.902s] (baseline)
     Parsing datafusion-sqllogictest v54.0.0 (baseline)
      Parsed [   0.022s] (baseline)
    Checking datafusion-sqllogictest v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.091s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 361.368s] datafusion-sqllogictest

@github-actions (bot) added the "auto detected api change" (Auto detected API change) label on Jun 26, 2026