
Avoid repeated EmitTo::First in partial hash aggregate output #23250

Merged
Rachelint merged 6 commits into apache:main from hhhizzz:codex/emit-first-partial-investigation on Jun 30, 2026

@hhhizzz hhhizzz commented Jun 30, 2026

Contributor

Which issue does this PR close?

Rationale for this change

The migrated partial hash aggregate output path still used EmitTo::First(batch_size) when draining grouped aggregate state in batches.

For terminal output this is unnecessary and can be expensive: EmitTo::First is not just slicing the first N rows, it also shifts remaining group indexes and maintains GroupValues lookup state. For high-cardinality partial aggregate output, this can cause repeated work during output draining.

The final hash aggregate path already avoids this by materializing output once with EmitTo::All and then slicing the resulting RecordBatch. This PR applies the same approach to partial hash aggregate output.
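To make the cost argument concrete, here is a toy model of the two draining strategies. It is only a sketch: `ToyGroups`, `emit_first`, and `emit_all` are hypothetical names, and a `Vec` plus `HashMap` stands in for DataFusion's real `GroupValues` state. The model assumes that emitting the first `n` groups forces every remaining group index to be rewritten, which is the behavior described above.

```rust
use std::collections::HashMap;

/// Toy stand-in for grouped state: keys in group-index order plus a lookup map.
/// Hypothetical type for illustration; not DataFusion's `GroupValues`.
struct ToyGroups {
    keys: Vec<u64>,
    index_of: HashMap<u64, usize>,
    /// Counts how many lookup entries were rewritten while emitting.
    reindexed: usize,
}

impl ToyGroups {
    fn new(n: usize) -> Self {
        let keys: Vec<u64> = (0..n as u64).collect();
        let index_of = keys.iter().enumerate().map(|(i, k)| (*k, i)).collect();
        Self { keys, index_of, reindexed: 0 }
    }

    /// Models `EmitTo::First(n)`: remove the first `n` groups and shift the rest.
    fn emit_first(&mut self, n: usize) -> Vec<u64> {
        let n = n.min(self.keys.len());
        let emitted: Vec<u64> = self.keys.drain(..n).collect();
        for k in &emitted {
            self.index_of.remove(k);
        }
        // Every remaining group index moves down by `n`, so its lookup entry is rewritten.
        for (i, k) in self.keys.iter().enumerate() {
            self.index_of.insert(*k, i);
            self.reindexed += 1;
        }
        emitted
    }

    /// Models `EmitTo::All`: hand over everything, leaving nothing to re-index.
    fn emit_all(&mut self) -> Vec<u64> {
        self.index_of.clear();
        std::mem::take(&mut self.keys)
    }
}

/// Drain with repeated `emit_first`; returns (batches, re-index count).
fn drain_with_first(groups: usize, batch_size: usize) -> (Vec<Vec<u64>>, usize) {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut state = ToyGroups::new(groups);
    let mut out = Vec::new();
    while !state.keys.is_empty() {
        out.push(state.emit_first(batch_size));
    }
    (out, state.reindexed)
}

/// Drain with one `emit_all`, then slice; returns (batches, re-index count).
fn drain_with_all(groups: usize, batch_size: usize) -> (Vec<Vec<u64>>, usize) {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut state = ToyGroups::new(groups);
    let all = state.emit_all();
    let out = all.chunks(batch_size).map(|c| c.to_vec()).collect();
    (out, state.reindexed)
}

fn main() {
    let (a, first_cost) = drain_with_first(10_000, 100);
    let (b, all_cost) = drain_with_all(10_000, 100);
    assert_eq!(a, b); // both strategies produce the same batches
    println!("lookup rewrites with First: {first_cost}, with All: {all_cost}");
}
```

In this model, 10,000 groups drained 100 at a time cost 495,000 lookup rewrites with the `First`-style drain and none with the `All`-style drain. The real cost profile depends on the `GroupValues` and accumulator implementations, so treat the counts as an illustration of the quadratic shape rather than a measurement.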

What changes are included in this PR?

  • Remove the helper that selected EmitTo::First(batch_size) for hash aggregate terminal output.
  • Change migrated partial hash aggregate output to:
    • materialize grouped keys and aggregate state once with EmitTo::All
    • slice the materialized RecordBatch into batch_size chunks across output polls
  • Rename the shared materialized-output state/type to mode-neutral names because it is now used by both final and partial output paths.
  • Add a regression test with a custom GroupsAccumulator that fails if partial terminal output calls EmitTo::First(_).
  • Strengthen the regression test to verify both batch slicing and emitted key/state values.
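
The changes listed above can be sketched in miniature as follows. This is an illustration under stated assumptions, not the code in this PR: `ToyEmitTo`, `GuardAccumulator`, and `MaterializedOutput` are hypothetical names, and plain `Vec`s stand in for `RecordBatch` and the `GroupsAccumulator` trait. The guard returns an error if it is ever asked for `First(_)`, and the output holder materializes once and then hands out `batch_size` slices on each poll.

```rust
/// Local stand-in for DataFusion's `EmitTo`; only the two variants discussed here.
#[derive(Debug, Clone, Copy)]
enum ToyEmitTo {
    All,
    First(usize),
}

/// Hypothetical accumulator that rejects `First(_)`, mirroring the regression-test idea.
struct GuardAccumulator {
    sums: Vec<i64>,
}

impl GuardAccumulator {
    fn state(&mut self, emit_to: ToyEmitTo) -> Result<Vec<i64>, String> {
        match emit_to {
            ToyEmitTo::All => Ok(std::mem::take(&mut self.sums)),
            ToyEmitTo::First(n) => {
                Err(format!("unexpected EmitTo::First({n}) in terminal output"))
            }
        }
    }
}

/// Rows materialized once, then handed out in `batch_size` slices across polls.
struct MaterializedOutput {
    rows: Vec<(u64, i64)>, // (group key, partial aggregate state)
    offset: usize,
    batch_size: usize,
}

impl MaterializedOutput {
    fn try_new(
        keys: Vec<u64>,
        acc: &mut GuardAccumulator,
        batch_size: usize,
    ) -> Result<Self, String> {
        if batch_size == 0 {
            return Err("batch_size must be non-zero".to_string());
        }
        // Single materialization step: the accumulator is asked for everything once.
        let states = acc.state(ToyEmitTo::All)?;
        if states.len() != keys.len() {
            return Err("key and state lengths differ".to_string());
        }
        let rows = keys.into_iter().zip(states).collect();
        Ok(Self { rows, offset: 0, batch_size })
    }

    /// One "poll": returns the next slice, or `None` once exhausted.
    fn next_batch(&mut self) -> Option<&[(u64, i64)]> {
        if self.offset >= self.rows.len() {
            return None;
        }
        let start = self.offset;
        let end = (start + self.batch_size).min(self.rows.len());
        self.offset = end;
        Some(&self.rows[start..end])
    }
}

fn main() -> Result<(), String> {
    let mut acc = GuardAccumulator { sums: vec![10, 20, 30, 40, 50] };
    let mut out = MaterializedOutput::try_new(vec![1, 2, 3, 4, 5], &mut acc, 2)?;
    while let Some(batch) = out.next_batch() {
        println!("{batch:?}");
    }
    // The guard rejects the incremental mode outright.
    assert!(acc.state(ToyEmitTo::First(2)).is_err());
    Ok(())
}
```

Slicing here only advances an offset into data that is already materialized, which is why the per-poll cost no longer depends on how many groups remain.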

Are these changes tested?

Yes.

Local targeted tests:

cargo test -p datafusion-physical-plan partial_grouped_aggregate_materializes_before_slicing -- --nocapture
cargo test -p datafusion-physical-plan materialized_aggregate_output_slices_batches_until_exhausted -- --nocapture
git diff --check

Additional local verification run during development:

cargo test -p datafusion-physical-plan materialized_final_output_slices_batches_until_exhausted -- --nocapture
cargo test -p datafusion-physical-plan partial_grouped_aggregate_uses_raw_partial_stream -- --nocapture

The new regression test was also applied to the pre-fix baseline and failed with the expected internal error when the partial output path used EmitTo::First.
Local benchmark evidence was collected against the implementation commit before the final test/naming polish commit.

ClickBench full 43-query run, 5 iterations, 24 cores, skip-partial-aggregation probe ratio 0.8:

| mode | total warm time | geomean warm time |
| --- | --- | --- |
| baseline migrated aggregate | 128509.47 ms | 352.79 ms |
| patched migrated aggregate | 19652.37 ms | 180.65 ms |
| baseline old aggregate path | 19774.70 ms | 181.25 ms |

Largest patched/current wins included:

  • q33: 32961.02ms -> 1642.08ms
  • q34: 32739.34ms -> 1635.07ms
  • q18: 25673.25ms -> 1767.25ms
  • q16: 5949.82ms -> 810.17ms
  • q17: 5906.51ms -> 807.10ms

TPC-DS SF10 full99, 10 rounds:

  • Failures: 0
  • Aggregate geomean current/main: 0.982817
  • Aggregate current speedup: 1.748%
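
For readers who want to check the figures above, the following sketch reproduces the arithmetic. It assumes the reported speedup is derived from the current/main geomean ratio as `(1 / ratio - 1) * 100`, which is consistent with the reported 0.982817 and 1.748% but is not confirmed by the benchmark tooling itself. The helper names are hypothetical.

```rust
/// Percent speedup implied by a current/main time ratio: (1 / ratio - 1) * 100.
/// Assumed formula; it matches the reported numbers but is not taken from the tooling.
fn speedup_percent(current_over_main: f64) -> f64 {
    (1.0 / current_over_main - 1.0) * 100.0
}

/// How many times faster `after_ms` is than `before_ms`.
fn times_faster(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}

fn main() {
    // TPC-DS SF10: geomean ratio current/main reported as 0.982817.
    println!("TPC-DS speedup: {:.3}%", speedup_percent(0.982817));

    // ClickBench: baseline migrated aggregate vs patched migrated aggregate.
    println!("ClickBench total: {:.2}x", times_faster(128509.47, 19652.37));
    println!("ClickBench geomean: {:.2}x", times_faster(352.79, 180.65));

    // Largest single-query win listed above.
    println!("q33: {:.1}x", times_faster(32961.02, 1642.08));
}
```

The total-time ratio is dominated by a handful of high-cardinality queries, which is why it is much larger than the geomean ratio.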

Are there any user-facing changes?

No. This is an internal physical execution change for hash aggregate output draining. There are no public API or documented behavior changes.

github-actions bot added the physical-plan label (Changes to the physical-plan crate) on Jun 30, 2026
@Dandandan

Contributor

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4840701536-749-5nw6v 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing codex/emit-first-partial-investigation (eedbae7) to 01bf68c (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4840701536-750-7vv4d 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing codex/emit-first-partial-investigation (eedbae7) to 01bf68c (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4840701536-751-wp2nd 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing codex/emit-first-partial-investigation (eedbae7) to 01bf68c (merge-base) diff using: tpch
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and codex_emit-first-partial-investigation
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ codex_emit-first-partial-investigation ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 38.22 / 39.52 ±1.33 / 41.28 ms │         38.11 / 39.54 ±1.31 / 41.32 ms │     no change │
│ QQuery 2  │ 19.34 / 19.63 ±0.20 / 19.88 ms │         19.20 / 19.94 ±0.77 / 21.34 ms │     no change │
│ QQuery 3  │ 31.18 / 33.86 ±1.64 / 36.34 ms │         31.38 / 33.08 ±1.09 / 34.33 ms │     no change │
│ QQuery 4  │ 17.58 / 17.72 ±0.12 / 17.88 ms │         17.72 / 18.04 ±0.50 / 19.02 ms │     no change │
│ QQuery 5  │ 38.11 / 41.19 ±2.59 / 45.99 ms │         38.93 / 41.92 ±2.07 / 44.71 ms │     no change │
│ QQuery 6  │ 16.18 / 16.34 ±0.15 / 16.55 ms │         16.27 / 16.97 ±0.82 / 18.42 ms │     no change │
│ QQuery 7  │ 44.46 / 46.00 ±1.31 / 48.26 ms │         44.14 / 46.30 ±2.37 / 50.67 ms │     no change │
│ QQuery 8  │ 43.38 / 43.66 ±0.15 / 43.80 ms │         43.35 / 43.60 ±0.29 / 44.13 ms │     no change │
│ QQuery 9  │ 49.98 / 51.06 ±0.91 / 52.60 ms │         49.59 / 50.69 ±0.68 / 51.62 ms │     no change │
│ QQuery 10 │ 42.39 / 42.73 ±0.30 / 43.20 ms │         42.30 / 42.66 ±0.35 / 43.26 ms │     no change │
│ QQuery 11 │ 13.62 / 13.79 ±0.11 / 13.92 ms │         13.52 / 13.61 ±0.08 / 13.73 ms │     no change │
│ QQuery 12 │ 23.72 / 23.92 ±0.22 / 24.33 ms │         23.88 / 24.38 ±0.43 / 25.04 ms │     no change │
│ QQuery 13 │ 31.94 / 35.01 ±2.67 / 39.08 ms │         32.70 / 34.44 ±1.00 / 35.77 ms │     no change │
│ QQuery 14 │ 23.90 / 24.75 ±1.24 / 27.16 ms │         23.88 / 24.18 ±0.25 / 24.54 ms │     no change │
│ QQuery 15 │ 31.55 / 32.06 ±0.49 / 32.79 ms │         31.52 / 31.86 ±0.34 / 32.37 ms │     no change │
│ QQuery 16 │ 14.47 / 14.67 ±0.17 / 14.96 ms │         13.93 / 14.15 ±0.13 / 14.33 ms │     no change │
│ QQuery 17 │ 87.62 / 88.79 ±1.41 / 91.37 ms │         73.78 / 74.81 ±0.56 / 75.27 ms │ +1.19x faster │
│ QQuery 18 │ 65.72 / 68.67 ±2.55 / 73.29 ms │         59.58 / 61.46 ±1.97 / 65.25 ms │ +1.12x faster │
│ QQuery 19 │ 33.00 / 33.28 ±0.41 / 34.10 ms │         33.27 / 34.01 ±1.07 / 36.12 ms │     no change │
│ QQuery 20 │ 34.63 / 34.98 ±0.28 / 35.33 ms │         32.18 / 32.65 ±0.36 / 33.01 ms │ +1.07x faster │
│ QQuery 21 │ 55.89 / 57.75 ±1.31 / 59.45 ms │         55.72 / 57.32 ±0.89 / 58.10 ms │     no change │
│ QQuery 22 │ 14.05 / 14.46 ±0.44 / 15.30 ms │         13.94 / 14.12 ±0.17 / 14.34 ms │     no change │
└───────────┴────────────────────────────────┴────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                     ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                     │ 793.84ms │
│ Total Time (codex_emit-first-partial-investigation)   │ 769.73ms │
│ Average Time (HEAD)                                   │  36.08ms │
│ Average Time (codex_emit-first-partial-investigation) │  34.99ms │
│ Queries Faster                                        │        3 │
│ Queries Slower                                        │        0 │
│ Queries with No Change                                │       19 │
│ Queries with Failure                                  │        0 │
└───────────────────────────────────────────────────────┴──────────┘
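
The Change column in the table above can be approximated from the mean times. The sketch below is an inference from the rows shown, not the benchmark runner's source: it assumes a 5% noise threshold on the ratio of base mean to branch mean, and the function name `classify` is hypothetical.

```rust
/// Classify a benchmark row from its mean times.
/// The 5% threshold is an assumption inferred from the rows above,
/// not taken from the benchmark runner's source.
fn classify(base_mean_ms: f64, branch_mean_ms: f64) -> String {
    const THRESHOLD: f64 = 0.05; // assumed noise band
    let speedup = base_mean_ms / branch_mean_ms;
    if speedup > 1.0 + THRESHOLD {
        format!("+{speedup:.2}x faster")
    } else if speedup < 1.0 - THRESHOLD {
        format!("{:.2}x slower", 1.0 / speedup)
    } else {
        "no change".to_string()
    }
}

fn main() {
    // Mean times copied from the TPC-H rows above (HEAD mean, branch mean).
    for (query, base, branch) in [
        ("Query 16", 14.67, 14.15),
        ("Query 17", 88.79, 74.81),
        ("Query 18", 68.67, 61.46),
        ("Query 20", 34.98, 32.65),
    ] {
        println!("{query}: {}", classify(base, branch));
    }
}
```

Under this assumption, rows whose means differ by less than about 5% are reported as "no change" even when the branch mean is slightly lower.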

Resource Usage

tpch — base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 5.0s |
| Peak memory | 1.2 GiB |
| Avg memory | 507.9 MiB |
| CPU user | 23.4s |
| CPU sys | 1.7s |
| Peak spill | 0 B |

tpch — branch

| Metric | Value |
| --- | --- |
| Wall time | 5.0s |
| Peak memory | 1.2 GiB |
| Avg memory | 520.7 MiB |
| CPU user | 22.4s |
| CPU sys | 1.8s |
| Peak spill | 0 B |

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and codex_emit-first-partial-investigation
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃ codex_emit-first-partial-investigation ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           6.32 / 6.79 ±0.78 / 8.33 ms │            5.59 / 6.12 ±0.84 / 7.79 ms │ +1.11x faster │
│ QQuery 2  │        83.83 / 84.38 ±0.35 / 84.79 ms │         80.78 / 81.31 ±0.55 / 82.34 ms │     no change │
│ QQuery 3  │        31.15 / 31.30 ±0.16 / 31.57 ms │         29.37 / 29.71 ±0.24 / 29.94 ms │ +1.05x faster │
│ QQuery 4  │    487.16 / 513.89 ±17.29 / 533.57 ms │      490.95 / 494.04 ±2.48 / 497.77 ms │     no change │
│ QQuery 5  │        52.69 / 52.94 ±0.20 / 53.13 ms │         51.73 / 52.16 ±0.30 / 52.45 ms │     no change │
│ QQuery 6  │        37.01 / 37.33 ±0.26 / 37.67 ms │         36.85 / 37.49 ±0.49 / 38.15 ms │     no change │
│ QQuery 7  │        93.85 / 94.86 ±0.70 / 95.60 ms │         95.32 / 97.30 ±1.62 / 99.08 ms │     no change │
│ QQuery 8  │        36.98 / 37.54 ±0.73 / 38.95 ms │         38.64 / 39.69 ±1.48 / 42.45 ms │  1.06x slower │
│ QQuery 9  │        53.13 / 55.87 ±2.04 / 58.57 ms │         54.30 / 57.54 ±2.48 / 60.60 ms │     no change │
│ QQuery 10 │        63.76 / 64.40 ±0.40 / 64.91 ms │         66.42 / 66.82 ±0.36 / 67.43 ms │     no change │
│ QQuery 11 │     292.20 / 297.71 ±3.97 / 303.81 ms │      348.79 / 366.06 ±9.00 / 373.35 ms │  1.23x slower │
│ QQuery 12 │        28.74 / 28.84 ±0.15 / 29.14 ms │         29.53 / 30.05 ±0.40 / 30.62 ms │     no change │
│ QQuery 13 │     118.55 / 118.88 ±0.28 / 119.37 ms │      121.05 / 122.32 ±1.23 / 124.65 ms │     no change │
│ QQuery 14 │     417.89 / 419.44 ±1.45 / 422.07 ms │      417.18 / 427.36 ±7.09 / 437.97 ms │     no change │
│ QQuery 15 │        57.41 / 58.02 ±0.36 / 58.53 ms │         58.09 / 59.21 ±0.81 / 60.51 ms │     no change │
│ QQuery 16 │           6.92 / 7.07 ±0.24 / 7.54 ms │            6.87 / 7.00 ±0.19 / 7.37 ms │     no change │
│ QQuery 17 │        80.43 / 80.80 ±0.25 / 81.15 ms │         80.51 / 82.67 ±2.16 / 86.01 ms │     no change │
│ QQuery 18 │     123.69 / 125.94 ±1.79 / 128.80 ms │      124.37 / 125.72 ±0.82 / 126.52 ms │     no change │
│ QQuery 19 │        41.56 / 42.07 ±0.33 / 42.59 ms │         41.75 / 42.84 ±1.17 / 44.98 ms │     no change │
│ QQuery 20 │        35.25 / 36.78 ±1.71 / 39.96 ms │         36.20 / 36.68 ±0.27 / 37.00 ms │     no change │
│ QQuery 21 │        17.62 / 18.00 ±0.30 / 18.31 ms │         17.92 / 18.06 ±0.13 / 18.27 ms │     no change │
│ QQuery 22 │        62.48 / 63.15 ±0.66 / 63.96 ms │         62.37 / 63.35 ±0.94 / 65.01 ms │     no change │
│ QQuery 23 │     358.62 / 361.43 ±2.10 / 364.43 ms │      346.92 / 350.12 ±2.60 / 353.44 ms │     no change │
│ QQuery 24 │     226.94 / 229.80 ±3.69 / 236.87 ms │      226.48 / 228.76 ±2.18 / 232.52 ms │     no change │
│ QQuery 25 │     110.63 / 111.27 ±0.38 / 111.74 ms │      110.54 / 112.69 ±2.29 / 116.64 ms │     no change │
│ QQuery 26 │        58.42 / 60.98 ±4.16 / 69.26 ms │         58.34 / 58.54 ±0.14 / 58.71 ms │     no change │
│ QQuery 27 │           6.40 / 6.56 ±0.16 / 6.86 ms │            6.48 / 6.58 ±0.12 / 6.82 ms │     no change │
│ QQuery 28 │        61.14 / 61.85 ±0.60 / 62.92 ms │         60.70 / 61.18 ±0.27 / 61.47 ms │     no change │
│ QQuery 29 │       97.95 / 99.57 ±1.68 / 102.69 ms │        97.24 / 98.78 ±1.63 / 101.81 ms │     no change │
│ QQuery 30 │        35.02 / 35.93 ±0.79 / 37.26 ms │         32.42 / 32.76 ±0.35 / 33.41 ms │ +1.10x faster │
│ QQuery 31 │     119.12 / 120.09 ±0.89 / 121.56 ms │      111.57 / 113.33 ±2.86 / 119.02 ms │ +1.06x faster │
│ QQuery 32 │        22.69 / 23.09 ±0.23 / 23.35 ms │         20.41 / 21.22 ±1.15 / 23.50 ms │ +1.09x faster │
│ QQuery 33 │        41.03 / 44.24 ±4.28 / 52.54 ms │         38.01 / 38.53 ±0.75 / 40.02 ms │ +1.15x faster │
│ QQuery 34 │        10.95 / 11.29 ±0.25 / 11.57 ms │         10.04 / 10.13 ±0.15 / 10.43 ms │ +1.11x faster │
│ QQuery 35 │        83.01 / 83.84 ±0.64 / 84.80 ms │         72.58 / 73.18 ±0.48 / 73.91 ms │ +1.15x faster │
│ QQuery 36 │           6.81 / 6.86 ±0.03 / 6.89 ms │            5.83 / 5.99 ±0.21 / 6.41 ms │ +1.15x faster │
│ QQuery 37 │           7.99 / 8.05 ±0.06 / 8.17 ms │            6.93 / 7.04 ±0.07 / 7.12 ms │ +1.14x faster │
│ QQuery 38 │        71.93 / 74.78 ±4.65 / 84.05 ms │         62.36 / 63.68 ±0.97 / 65.29 ms │ +1.17x faster │
│ QQuery 39 │     102.03 / 103.69 ±1.64 / 105.92 ms │         86.45 / 86.71 ±0.32 / 87.31 ms │ +1.20x faster │
│ QQuery 40 │        25.12 / 25.54 ±0.36 / 26.11 ms │         23.50 / 23.68 ±0.14 / 23.83 ms │ +1.08x faster │
│ QQuery 41 │        12.54 / 12.60 ±0.06 / 12.69 ms │         11.55 / 11.72 ±0.17 / 12.04 ms │ +1.07x faster │
│ QQuery 42 │        25.14 / 25.46 ±0.29 / 25.98 ms │         23.60 / 23.85 ±0.15 / 24.07 ms │ +1.07x faster │
│ QQuery 43 │           5.39 / 5.45 ±0.04 / 5.48 ms │            5.02 / 5.13 ±0.11 / 5.35 ms │ +1.06x faster │
│ QQuery 44 │        10.10 / 10.18 ±0.05 / 10.25 ms │            9.46 / 9.56 ±0.08 / 9.67 ms │ +1.06x faster │
│ QQuery 45 │        42.76 / 45.45 ±3.65 / 52.58 ms │         38.85 / 41.11 ±2.30 / 45.40 ms │ +1.11x faster │
│ QQuery 46 │        12.92 / 13.59 ±0.56 / 14.35 ms │         12.06 / 12.57 ±0.39 / 13.12 ms │ +1.08x faster │
│ QQuery 47 │    233.87 / 253.10 ±10.46 / 261.82 ms │      224.80 / 228.88 ±4.37 / 234.91 ms │ +1.11x faster │
│ QQuery 48 │        96.11 / 96.81 ±0.60 / 97.74 ms │         96.08 / 96.72 ±0.61 / 97.74 ms │     no change │
│ QQuery 49 │        76.90 / 79.17 ±3.32 / 85.77 ms │         78.63 / 80.31 ±0.93 / 81.44 ms │     no change │
│ QQuery 50 │        59.42 / 59.66 ±0.21 / 60.02 ms │         61.22 / 62.16 ±0.77 / 63.32 ms │     no change │
│ QQuery 51 │      97.48 / 102.57 ±5.14 / 112.47 ms │        95.33 / 97.87 ±3.37 / 104.41 ms │     no change │
│ QQuery 52 │        23.96 / 24.44 ±0.36 / 25.04 ms │         25.76 / 26.48 ±1.01 / 28.45 ms │  1.08x slower │
│ QQuery 53 │        29.24 / 29.84 ±0.41 / 30.49 ms │         31.30 / 31.46 ±0.15 / 31.74 ms │  1.05x slower │
│ QQuery 54 │        55.52 / 55.94 ±0.29 / 56.32 ms │         58.66 / 59.11 ±0.28 / 59.50 ms │  1.06x slower │
│ QQuery 55 │        23.23 / 23.49 ±0.21 / 23.86 ms │         24.76 / 25.08 ±0.27 / 25.40 ms │  1.07x slower │
│ QQuery 56 │        39.18 / 40.86 ±2.27 / 45.35 ms │         41.42 / 43.09 ±2.48 / 48.02 ms │  1.05x slower │
│ QQuery 57 │     178.27 / 180.06 ±1.00 / 181.25 ms │      176.05 / 186.90 ±9.24 / 197.91 ms │     no change │
│ QQuery 58 │     115.42 / 117.05 ±1.49 / 119.56 ms │      119.49 / 121.19 ±0.98 / 122.22 ms │     no change │
│ QQuery 59 │     118.14 / 120.25 ±2.68 / 125.43 ms │      118.95 / 120.20 ±1.04 / 121.47 ms │     no change │
│ QQuery 60 │        40.41 / 41.24 ±0.55 / 41.88 ms │         40.26 / 40.65 ±0.33 / 41.24 ms │     no change │
│ QQuery 61 │        12.90 / 13.09 ±0.14 / 13.29 ms │         12.96 / 13.07 ±0.08 / 13.19 ms │     no change │
│ QQuery 62 │        46.65 / 47.22 ±0.67 / 48.48 ms │         46.50 / 47.10 ±0.51 / 47.75 ms │     no change │
│ QQuery 63 │        29.85 / 30.06 ±0.12 / 30.18 ms │         29.61 / 29.88 ±0.22 / 30.26 ms │     no change │
│ QQuery 64 │     408.95 / 415.70 ±5.48 / 422.78 ms │      409.19 / 413.25 ±3.44 / 419.19 ms │     no change │
│ QQuery 65 │     152.15 / 157.93 ±5.27 / 167.63 ms │      146.21 / 149.48 ±2.66 / 154.28 ms │ +1.06x faster │
│ QQuery 66 │        79.38 / 81.96 ±3.80 / 89.53 ms │         79.32 / 79.90 ±0.38 / 80.36 ms │     no change │
│ QQuery 67 │     267.11 / 276.67 ±7.07 / 285.51 ms │      237.83 / 241.61 ±3.69 / 246.28 ms │ +1.15x faster │
│ QQuery 68 │        13.19 / 13.27 ±0.08 / 13.41 ms │         12.00 / 12.15 ±0.18 / 12.50 ms │ +1.09x faster │
│ QQuery 69 │        61.14 / 61.48 ±0.24 / 61.81 ms │         58.20 / 58.65 ±0.41 / 59.31 ms │     no change │
│ QQuery 70 │     112.26 / 118.14 ±5.57 / 128.07 ms │      104.25 / 107.12 ±3.64 / 113.68 ms │ +1.10x faster │
│ QQuery 71 │        38.53 / 38.73 ±0.15 / 38.93 ms │         35.76 / 36.73 ±1.30 / 39.22 ms │ +1.05x faster │
│ QQuery 72 │ 2146.94 / 2202.81 ±56.45 / 2308.48 ms │ 2045.98 / 2243.06 ±152.98 / 2505.97 ms │     no change │
│ QQuery 73 │          9.76 / 9.89 ±0.14 / 10.15 ms │          9.97 / 10.16 ±0.13 / 10.36 ms │     no change │
│ QQuery 74 │     172.89 / 178.81 ±8.89 / 196.35 ms │      174.14 / 179.21 ±4.70 / 188.12 ms │     no change │
│ QQuery 75 │     152.80 / 157.50 ±8.68 / 174.84 ms │      149.67 / 154.40 ±6.59 / 167.40 ms │     no change │
│ QQuery 76 │        36.25 / 36.80 ±0.37 / 37.39 ms │         35.41 / 35.96 ±0.38 / 36.47 ms │     no change │
│ QQuery 77 │        62.22 / 63.00 ±0.74 / 64.38 ms │         61.20 / 61.93 ±0.79 / 63.44 ms │     no change │
│ QQuery 78 │     187.21 / 191.75 ±5.37 / 202.30 ms │      184.98 / 188.92 ±2.91 / 192.83 ms │     no change │
│ QQuery 79 │        67.77 / 72.38 ±4.62 / 80.73 ms │         67.23 / 67.98 ±0.71 / 69.27 ms │ +1.06x faster │
│ QQuery 80 │     105.88 / 108.22 ±1.79 / 111.34 ms │      100.15 / 101.64 ±2.19 / 105.97 ms │ +1.06x faster │
│ QQuery 81 │        28.57 / 28.80 ±0.35 / 29.48 ms │         25.99 / 26.20 ±0.14 / 26.41 ms │ +1.10x faster │
│ QQuery 82 │        18.04 / 18.20 ±0.14 / 18.44 ms │         16.53 / 16.69 ±0.17 / 17.02 ms │ +1.09x faster │
│ QQuery 83 │        43.75 / 47.63 ±5.33 / 58.05 ms │         40.63 / 40.81 ±0.16 / 41.06 ms │ +1.17x faster │
│ QQuery 84 │        32.64 / 33.37 ±0.43 / 33.93 ms │         30.76 / 30.92 ±0.13 / 31.06 ms │ +1.08x faster │
│ QQuery 85 │     112.26 / 113.96 ±0.96 / 115.14 ms │      108.29 / 113.15 ±6.70 / 126.42 ms │     no change │
│ QQuery 86 │        28.10 / 28.90 ±1.07 / 30.93 ms │         25.34 / 25.91 ±0.34 / 26.31 ms │ +1.12x faster │
│ QQuery 87 │        72.93 / 76.66 ±4.02 / 84.18 ms │         62.89 / 63.20 ±0.22 / 63.49 ms │ +1.21x faster │
│ QQuery 88 │        64.74 / 66.15 ±1.39 / 68.44 ms │         64.05 / 65.75 ±2.47 / 70.65 ms │     no change │
│ QQuery 89 │        36.45 / 37.30 ±0.44 / 37.65 ms │         36.25 / 36.75 ±0.60 / 37.84 ms │     no change │
│ QQuery 90 │        17.71 / 18.35 ±0.51 / 19.07 ms │         17.55 / 17.77 ±0.16 / 18.00 ms │     no change │
│ QQuery 91 │        47.87 / 50.73 ±3.57 / 57.67 ms │         46.71 / 46.93 ±0.14 / 47.12 ms │ +1.08x faster │
│ QQuery 92 │        32.38 / 32.86 ±0.47 / 33.74 ms │         29.99 / 30.43 ±0.37 / 31.08 ms │ +1.08x faster │
│ QQuery 93 │        51.65 / 52.46 ±1.09 / 54.44 ms │         50.76 / 52.49 ±2.11 / 56.44 ms │     no change │
│ QQuery 94 │        39.86 / 40.72 ±0.54 / 41.34 ms │         39.11 / 39.46 ±0.32 / 39.97 ms │     no change │
│ QQuery 95 │        85.49 / 88.04 ±3.14 / 94.12 ms │         81.97 / 82.28 ±0.27 / 82.68 ms │ +1.07x faster │
│ QQuery 96 │        25.56 / 25.71 ±0.22 / 26.14 ms │         24.43 / 24.67 ±0.25 / 25.14 ms │     no change │
│ QQuery 97 │        56.53 / 57.28 ±0.74 / 58.42 ms │         47.20 / 48.31 ±1.42 / 50.77 ms │ +1.19x faster │
│ QQuery 98 │        43.05 / 43.39 ±0.33 / 43.98 ms │         41.81 / 42.51 ±0.41 / 42.98 ms │     no change │
│ QQuery 99 │        71.12 / 73.48 ±3.12 / 79.44 ms │         70.55 / 70.97 ±0.41 / 71.64 ms │     no change │
└───────────┴───────────────────────────────────────┴────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                     ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                     │ 10291.46ms │
│ Total Time (codex_emit-first-partial-investigation)   │ 10167.82ms │
│ Average Time (HEAD)                                   │   103.95ms │
│ Average Time (codex_emit-first-partial-investigation) │   102.71ms │
│ Queries Faster                                        │         37 │
│ Queries Slower                                        │          7 │
│ Queries with No Change                                │         55 │
│ Queries with Failure                                  │          0 │
└───────────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 55.0s |
| Peak memory | 2.0 GiB |
| Avg memory | 1.4 GiB |
| CPU user | 234.2s |
| CPU sys | 6.0s |
| Peak spill | 0 B |

tpcds — branch

| Metric | Value |
|---|---|
| Wall time | 55.0s |
| Peak memory | 2.1 GiB |
| Avg memory | 1.4 GiB |
| CPU user | 233.3s |
| CPU sys | 5.6s |
| Peak spill | 0 B |

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

Comparing HEAD and codex_emit-first-partial-investigation
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query     ┃                                       HEAD ┃ codex_emit-first-partial-investigation ┃         Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 0  │               1.23 / 4.04 ±5.34 / 14.71 ms │           1.23 / 3.91 ±5.28 / 14.47 ms │      no change │
│ QQuery 1  │             13.06 / 13.26 ±0.16 / 13.43 ms │         12.85 / 13.03 ±0.24 / 13.49 ms │      no change │
│ QQuery 2  │             36.51 / 36.92 ±0.45 / 37.78 ms │         35.57 / 35.80 ±0.18 / 36.00 ms │      no change │
│ QQuery 3  │             30.99 / 31.78 ±0.73 / 32.78 ms │         30.56 / 31.02 ±0.45 / 31.74 ms │      no change │
│ QQuery 4  │      1617.99 / 1674.96 ±34.31 / 1714.82 ms │     222.02 / 236.62 ±19.23 / 274.38 ms │  +7.08x faster │
│ QQuery 5  │     1634.10 / 1737.49 ±107.32 / 1939.01 ms │     272.50 / 294.18 ±35.58 / 365.15 ms │  +5.91x faster │
│ QQuery 6  │                1.27 / 1.43 ±0.22 / 1.87 ms │            1.29 / 1.44 ±0.22 / 1.87 ms │      no change │
│ QQuery 7  │             13.87 / 17.07 ±6.24 / 29.55 ms │         13.54 / 15.91 ±4.35 / 24.60 ms │  +1.07x faster │
│ QQuery 8  │      2020.77 / 2046.29 ±26.84 / 2093.80 ms │     319.57 / 347.75 ±37.59 / 418.91 ms │  +5.88x faster │
│ QQuery 9  │         469.74 / 493.17 ±20.84 / 524.68 ms │      453.80 / 462.19 ±8.46 / 476.91 ms │  +1.07x faster │
│ QQuery 10 │             75.28 / 76.87 ±0.89 / 77.92 ms │         72.40 / 73.16 ±0.50 / 73.72 ms │      no change │
│ QQuery 11 │            89.35 / 96.09 ±7.80 / 106.25 ms │         83.28 / 84.35 ±0.94 / 85.79 ms │  +1.14x faster │
│ QQuery 12 │     1711.13 / 1785.55 ±106.54 / 1990.49 ms │      264.34 / 273.71 ±7.75 / 287.76 ms │  +6.52x faster │
│ QQuery 13 │        483.02 / 695.74 ±133.58 / 865.56 ms │     362.44 / 385.33 ±14.75 / 404.93 ms │  +1.81x faster │
│ QQuery 14 │          539.86 / 546.14 ±4.02 / 549.96 ms │      280.75 / 288.47 ±7.05 / 300.98 ms │  +1.89x faster │
│ QQuery 15 │      1814.85 / 1910.38 ±49.65 / 1952.23 ms │      271.35 / 280.11 ±7.86 / 293.49 ms │  +6.82x faster │
│ QQuery 16 │      4161.18 / 4296.38 ±86.99 / 4433.70 ms │     616.69 / 628.60 ±13.23 / 653.82 ms │  +6.83x faster │
│ QQuery 17 │     4205.66 / 4300.25 ±103.77 / 4429.35 ms │      616.28 / 626.15 ±8.30 / 638.41 ms │  +6.87x faster │
│ QQuery 18 │  17598.87 / 17983.64 ±361.69 / 18658.51 ms │  1271.65 / 1295.06 ±20.79 / 1333.19 ms │ +13.89x faster │
│ QQuery 19 │             28.07 / 30.21 ±2.94 / 35.85 ms │         27.57 / 27.92 ±0.24 / 28.32 ms │  +1.08x faster │
│ QQuery 20 │         517.51 / 527.27 ±12.15 / 550.66 ms │     521.60 / 536.56 ±28.27 / 593.07 ms │      no change │
│ QQuery 21 │          513.92 / 523.51 ±9.03 / 539.93 ms │      524.45 / 527.53 ±2.08 / 530.24 ms │      no change │
│ QQuery 22 │        982.63 / 996.00 ±11.22 / 1010.97 ms │   988.66 / 1009.12 ±14.06 / 1027.76 ms │      no change │
│ QQuery 23 │      3027.78 / 3064.58 ±37.62 / 3131.74 ms │  3099.11 / 3137.51 ±21.18 / 3157.66 ms │      no change │
│ QQuery 24 │             41.38 / 45.36 ±5.97 / 57.23 ms │         41.44 / 44.55 ±5.09 / 54.65 ms │      no change │
│ QQuery 25 │          112.49 / 116.42 ±6.97 / 130.35 ms │      112.07 / 115.81 ±5.11 / 125.87 ms │      no change │
│ QQuery 26 │             41.86 / 42.71 ±0.68 / 43.56 ms │         42.55 / 44.23 ±2.95 / 50.13 ms │      no change │
│ QQuery 27 │          673.99 / 684.46 ±8.81 / 696.52 ms │      675.80 / 683.70 ±5.86 / 690.63 ms │      no change │
│ QQuery 28 │      3580.28 / 3695.85 ±90.67 / 3804.45 ms │  3052.60 / 3077.43 ±20.15 / 3113.32 ms │  +1.20x faster │
│ QQuery 29 │             40.62 / 40.97 ±0.23 / 41.35 ms │        41.15 / 55.73 ±22.36 / 99.46 ms │   1.36x slower │
│ QQuery 30 │         561.53 / 575.65 ±14.93 / 594.24 ms │     300.97 / 316.85 ±12.94 / 334.78 ms │  +1.82x faster │
│ QQuery 31 │         281.20 / 305.53 ±20.20 / 339.62 ms │      291.50 / 296.68 ±4.23 / 302.67 ms │      no change │
│ QQuery 32 │       990.89 / 1024.00 ±32.50 / 1085.20 ms │   966.72 / 1006.52 ±34.21 / 1068.13 ms │      no change │
│ QQuery 33 │  25834.26 / 27777.82 ±995.02 / 28630.63 ms │  1486.22 / 1507.46 ±18.26 / 1538.45 ms │ +18.43x faster │
│ QQuery 34 │ 28069.49 / 29467.92 ±1131.78 / 31001.96 ms │  1497.86 / 1540.50 ±34.09 / 1598.49 ms │ +19.13x faster │
│ QQuery 35 │        966.62 / 991.83 ±21.34 / 1023.39 ms │     284.36 / 306.65 ±24.75 / 348.11 ms │  +3.23x faster │
│ QQuery 36 │         162.34 / 182.87 ±13.55 / 200.72 ms │         65.22 / 73.73 ±7.64 / 84.31 ms │  +2.48x faster │
│ QQuery 37 │             37.32 / 40.89 ±3.07 / 46.42 ms │         37.79 / 43.13 ±3.28 / 47.67 ms │   1.05x slower │
│ QQuery 38 │             42.46 / 49.99 ±8.18 / 63.24 ms │         44.90 / 46.16 ±1.93 / 49.99 ms │  +1.08x faster │
│ QQuery 39 │          185.21 / 192.95 ±7.33 / 204.98 ms │     139.01 / 162.07 ±12.18 / 173.15 ms │  +1.19x faster │
│ QQuery 40 │             14.31 / 14.86 ±0.35 / 15.24 ms │         14.50 / 14.66 ±0.12 / 14.83 ms │      no change │
│ QQuery 41 │             13.69 / 15.69 ±3.46 / 22.59 ms │         14.09 / 14.40 ±0.18 / 14.64 ms │  +1.09x faster │
│ QQuery 42 │             13.84 / 15.87 ±3.31 / 22.47 ms │         13.64 / 13.71 ±0.10 / 13.90 ms │  +1.16x faster │
└───────────┴────────────────────────────────────────────┴────────────────────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                                     ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                                     │ 108170.66ms │
│ Total Time (codex_emit-first-partial-investigation)   │  19979.40ms │
│ Average Time (HEAD)                                   │   2515.60ms │
│ Average Time (codex_emit-first-partial-investigation) │    464.64ms │
│ Queries Faster                                        │          24 │
│ Queries Slower                                        │           2 │
│ Queries with No Change                                │          17 │
│ Queries with Failure                                  │           0 │
└───────────────────────────────────────────────────────┴─────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 545.1s |
| Peak memory | 11.9 GiB |
| Avg memory | 6.5 GiB |
| CPU user | 4799.1s |
| CPU sys | 319.6s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|---|---|
| Wall time | 105.0s |
| Peak memory | 12.7 GiB |
| Avg memory | 4.4 GiB |
| CPU user | 1022.2s |
| CPU sys | 73.7s |
| Peak spill | 0 B |

@hhhizzz

hhhizzz commented Jun 30, 2026

Contributor Author

@2010YOUY01 @Rachelint The results look good. Could you take a look when you have time?

@Rachelint Rachelint left a comment

Contributor

LGTM, thanks @hhhizzz

@Rachelint Rachelint added this pull request to the merge queue Jun 30, 2026
Merged via the queue into apache:main with commit 7d9f6ea Jun 30, 2026
1 check passed

Labels

physical-plan Changes to the physical-plan crate

Development

Successfully merging this pull request may close these issues.

Investigate EmitTo::First usage in partial hash aggregate output

4 participants