
chore: use Vec instead of OffsetBuilder #23195

Merged
comphead merged 2 commits into apache:main from comphead:arrays on Jun 26, 2026

Conversation

@comphead

Contributor

Which issue does this PR close?

Rationale for this change

arrow::buffer::OffsetBufferBuilder is a thin wrapper around Vec<O> plus a last_offset: usize running counter; every push_length(n) does a checked_add on usize and a usize_as(O) conversion. When the row count is known up front, a direct Vec<O> that computes each new offset as offsets[row] + O::usize_as(len) can save measurable work in tight per-row loops, provided the offset push is a meaningful fraction of the per-row cost.
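To make the two patterns concrete, here is a minimal, dependency-free sketch. `ModelOffsetBuilder` is a hypothetical stand-in that only mirrors the behaviour described above (a `Vec` plus a `last_offset: usize` counter); it is not the arrow type, and the offset type is fixed to `i32` for brevity. The function names are illustrative as well.

```rust
// Hypothetical stand-in mirroring the description above; NOT the arrow type.
struct ModelOffsetBuilder {
    offsets: Vec<i32>,
    last_offset: usize,
}

impl ModelOffsetBuilder {
    fn new(capacity: usize) -> Self {
        let mut offsets = Vec::with_capacity(capacity + 1);
        offsets.push(0);
        Self { offsets, last_offset: 0 }
    }

    // Checked add on the usize counter, then a cast that stands in for the
    // usize -> offset conversion.
    fn push_length(&mut self, len: usize) {
        self.last_offset = self
            .last_offset
            .checked_add(len)
            .expect("offset overflow");
        self.offsets.push(self.last_offset as i32);
    }

    fn finish(self) -> Vec<i32> {
        self.offsets
    }
}

// Builder-style: one push_length call per row.
fn offsets_via_builder(lens: &[usize]) -> Vec<i32> {
    let mut builder = ModelOffsetBuilder::new(lens.len());
    for &len in lens {
        builder.push_length(len);
    }
    builder.finish()
}

// Direct Vec: read the previous offset and add the converted length.
// Assumes every running offset fits in i32 (no overflow check here).
fn offsets_via_vec(lens: &[usize]) -> Vec<i32> {
    let mut offsets: Vec<i32> = Vec::with_capacity(lens.len() + 1);
    offsets.push(0);
    for (row, &len) in lens.iter().enumerate() {
        let next = offsets[row] + len as i32;
        offsets.push(next);
    }
    offsets
}
```

Both functions produce the same offsets for the same row lengths; the direct version simply skips the separate `usize` counter and its checked add, which is the work this PR tries to avoid.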

I swapped the pattern in all eight OffsetBufferBuilder call sites in the repo (array_normalize, array_filter, remove, replace, array_add, utils::general_array_zip_with, array_scale,
encoding::delegated_decode), benchmarked the three sites that have criterion benches, and found the win is not uniform.

What changes are included in this PR?

Replace OffsetBufferBuilder<O> with Vec<O> (preinitialized with O::zero() and finalized with OffsetBuffer::new(v.into())) only in datafusion/functions-nested/src/remove.rs, where benches show
clean wins with no regressions.

The other seven sites are left on OffsetBufferBuilder — benches showed flat-to-regressing results, see below.
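As an illustration of the replacement pattern in a remove-style per-row loop, here is a hedged sketch on plain slices. It is not the code in remove.rs: `remove_all`, its parameters, and the fixed `i64`/`i32` types are assumptions made for the example, and it returns a plain `Vec<i32>` where the PR finalizes with OffsetBuffer::new(v.into()).

```rust
// Hypothetical sketch: drop every occurrence of `target` from each row of a
// list column stored as flat `values` plus `offsets`, where row i spans
// values[offsets[i]..offsets[i + 1]]. Not the DataFusion implementation.
fn remove_all(values: &[i64], offsets: &[i32], target: i64) -> (Vec<i64>, Vec<i32>) {
    let num_rows = offsets.len().saturating_sub(1);
    let mut out_values: Vec<i64> = Vec::with_capacity(values.len());

    // Preinitialize with the leading zero; one offset is pushed per row.
    let mut out_offsets: Vec<i32> = Vec::with_capacity(num_rows + 1);
    out_offsets.push(0);

    for row in 0..num_rows {
        let start = offsets[row] as usize;
        let end = offsets[row + 1] as usize;

        // Copy the elements of this row that survive the removal.
        let before = out_values.len();
        out_values.extend(values[start..end].iter().copied().filter(|v| *v != target));
        let kept = out_values.len() - before;

        // Running offset: previous offset plus this row's kept length.
        let next = out_offsets[row] + kept as i32;
        out_offsets.push(next);
    }

    (out_values, out_offsets)
}
```

For example, with rows `[1, 2]`, `[]`, and `[1, 3, 1]` and a target of `1`, the sketch keeps `[2]`, `[]`, and `[3]`, so the output offsets are `[0, 1, 1, 2]`.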

Are these changes tested?

Existing unit tests, doctests, and sqllogictests (array_remove*) pass unchanged. No new tests — refactor is functionally equivalent.

Are there any user-facing changes?

No.

Benchmark results

The biggest win is in array_remove.

array_remove

Bench               size 10   size 100   size 500
-----               -------   --------   --------
int64               −0.2%     −1.0%      −50.0%
n_int64             −0.7%     +0.05%     −23.1%
all_int64           +0.2%     −1.8%      −15.1%
strings             +3.8%     +0.6%      −4.6%
boolean             −0.01%    +1.1%      +0.3%
fixed_size_binary   −0.06%    −20.0%     −2.6%
int64_nested        flat      flat       flat

For the others, the differences look more like noise.

github-actions bot added the "functions" (Changes to functions implementation) label on Jun 25, 2026
@comphead

Contributor Author

@alamb FYI: for array_remove the win is obvious; for the others, performance is the same.

@alamb

alamb commented Jun 25, 2026

Contributor

run benchmark array_remove

@adriangbot


🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4804368805-682-wj72w 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing arrays (4303be4) to a0e9887 (merge-base) diff using: array_remove
Results will be posted here when complete



@adriangbot


🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                    HEAD                                   arrays
-----                                                                    ----                                   ------
array_remove_all_int64/remove/list size: 10, num_rows: 4000              1.00    243.0±1.36µs        ? ?/sec    1.01    245.3±2.26µs        ? ?/sec
array_remove_all_int64/remove/list size: 100, num_rows: 10000            1.00      2.6±0.03ms        ? ?/sec    1.02      2.7±0.03ms        ? ?/sec
array_remove_all_int64/remove/list size: 500, num_rows: 10000            1.00     12.4±0.19ms        ? ?/sec    1.24     15.4±1.06ms        ? ?/sec
array_remove_all_int64_nested/remove/list size: 10, num_rows: 4000       1.00     12.2±0.10ms        ? ?/sec    1.00     12.3±0.10ms        ? ?/sec
array_remove_all_int64_nested/remove/list size: 100, num_rows: 3000      1.00     68.5±0.87ms        ? ?/sec    1.02     69.9±1.16ms        ? ?/sec
array_remove_all_int64_nested/remove/list size: 300, num_rows: 1500      1.02    100.7±1.11ms        ? ?/sec    1.00     98.5±1.59ms        ? ?/sec
array_remove_boolean/remove/list size: 10, num_rows: 4000                1.00    313.5±1.31µs        ? ?/sec    1.01    316.3±0.71µs        ? ?/sec
array_remove_boolean/remove/list size: 100, num_rows: 10000              1.00   1668.1±3.93µs        ? ?/sec    1.01   1683.3±9.46µs        ? ?/sec
array_remove_boolean/remove/list size: 500, num_rows: 10000              1.00      5.4±0.01ms        ? ?/sec    1.00      5.4±0.01ms        ? ?/sec
array_remove_fixed_size_binary/remove/list size: 10, num_rows: 4000      1.00    306.9±1.50µs        ? ?/sec    1.00    307.5±1.84µs        ? ?/sec
array_remove_fixed_size_binary/remove/list size: 100, num_rows: 10000    1.00      3.5±0.08ms        ? ?/sec    1.00      3.5±0.08ms        ? ?/sec
array_remove_fixed_size_binary/remove/list size: 500, num_rows: 10000    1.00     32.4±0.15ms        ? ?/sec    1.02     33.1±0.15ms        ? ?/sec
array_remove_int64/remove/list size: 10, num_rows: 4000                  1.00    232.5±1.43µs        ? ?/sec    1.00    233.1±1.66µs        ? ?/sec
array_remove_int64/remove/list size: 100, num_rows: 10000                1.00  1209.3±12.20µs        ? ?/sec    1.00  1205.7±22.04µs        ? ?/sec
array_remove_int64/remove/list size: 500, num_rows: 10000                1.00     12.5±0.19ms        ? ?/sec    1.02     12.7±0.22ms        ? ?/sec
array_remove_int64_nested/remove/list size: 10, num_rows: 4000           1.00     12.2±0.10ms        ? ?/sec    1.00     12.2±0.09ms        ? ?/sec
array_remove_int64_nested/remove/list size: 100, num_rows: 3000          1.00     67.9±0.72ms        ? ?/sec    1.00     67.9±0.72ms        ? ?/sec
array_remove_int64_nested/remove/list size: 300, num_rows: 1500          1.00     97.6±1.68ms        ? ?/sec    1.00     97.4±1.55ms        ? ?/sec
array_remove_n_int64/remove/list size: 10, num_rows: 4000                1.00    241.2±1.27µs        ? ?/sec    1.01    244.4±2.29µs        ? ?/sec
array_remove_n_int64/remove/list size: 100, num_rows: 10000              1.00  1830.5±15.44µs        ? ?/sec    1.00  1838.8±18.10µs        ? ?/sec
array_remove_n_int64/remove/list size: 500, num_rows: 10000              1.00      9.3±0.24ms        ? ?/sec    1.83     17.0±0.26ms        ? ?/sec
array_remove_n_int64_nested/remove/list size: 10, num_rows: 4000         1.00     12.3±0.10ms        ? ?/sec    1.00     12.4±0.12ms        ? ?/sec
array_remove_n_int64_nested/remove/list size: 100, num_rows: 3000        1.01     70.9±2.12ms        ? ?/sec    1.00     70.4±2.23ms        ? ?/sec
array_remove_n_int64_nested/remove/list size: 300, num_rows: 1500        1.00     98.3±2.01ms        ? ?/sec    1.03    101.2±2.05ms        ? ?/sec
array_remove_strings/remove/list size: 10, num_rows: 4000                1.01    416.3±1.20µs        ? ?/sec    1.00    412.5±1.39µs        ? ?/sec
array_remove_strings/remove/list size: 100, num_rows: 10000              1.00      6.5±0.03ms        ? ?/sec    1.00      6.5±0.03ms        ? ?/sec
array_remove_strings/remove/list size: 500, num_rows: 10000              1.10     46.1±0.21ms        ? ?/sec    1.00     41.8±0.90ms        ? ?/sec

Resource Usage

array_remove — base (merge-base)

Metric Value
Wall time 525.1s
Peak memory 222.9 MiB
Avg memory 37.2 MiB
CPU user 341.4s
CPU sys 17.8s
Peak spill 0 B

array_remove — branch

Metric Value
Wall time 515.1s
Peak memory 188.4 MiB
Avg memory 37.7 MiB
CPU user 333.6s
CPU sys 22.6s
Peak spill 0 B


@comphead

Contributor Author

run benchmark array_remove

@adriangbot


🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4804721008-683-b54h9 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux


Comparing arrays (4303be4) to a0e9887 (merge-base) diff using: array_remove
Results will be posted here when complete



@adriangbot


🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                    HEAD                                   arrays
-----                                                                    ----                                   ------
array_remove_all_int64/remove/list size: 10, num_rows: 4000              1.00    243.1±1.49µs        ? ?/sec    1.01    245.3±2.06µs        ? ?/sec
array_remove_all_int64/remove/list size: 100, num_rows: 10000            1.00      2.6±0.03ms        ? ?/sec    1.03      2.6±0.03ms        ? ?/sec
array_remove_all_int64/remove/list size: 500, num_rows: 10000            1.00     12.2±0.16ms        ? ?/sec    1.01     12.3±0.16ms        ? ?/sec
array_remove_all_int64_nested/remove/list size: 10, num_rows: 4000       1.00     12.3±0.10ms        ? ?/sec    1.00     12.2±0.11ms        ? ?/sec
array_remove_all_int64_nested/remove/list size: 100, num_rows: 3000      1.01     68.7±1.02ms        ? ?/sec    1.00     68.3±0.71ms        ? ?/sec
array_remove_all_int64_nested/remove/list size: 300, num_rows: 1500      1.00     98.6±1.35ms        ? ?/sec    1.00     98.1±1.03ms        ? ?/sec
array_remove_boolean/remove/list size: 10, num_rows: 4000                1.00    313.7±1.30µs        ? ?/sec    1.01    316.3±0.78µs        ? ?/sec
array_remove_boolean/remove/list size: 100, num_rows: 10000              1.00   1667.6±3.21µs        ? ?/sec    1.01   1683.6±9.54µs        ? ?/sec
array_remove_boolean/remove/list size: 500, num_rows: 10000              1.00      5.4±0.01ms        ? ?/sec    1.00      5.4±0.01ms        ? ?/sec
array_remove_fixed_size_binary/remove/list size: 10, num_rows: 4000      1.04    319.5±2.31µs        ? ?/sec    1.00    306.5±1.85µs        ? ?/sec
array_remove_fixed_size_binary/remove/list size: 100, num_rows: 10000    1.02      3.4±0.05ms        ? ?/sec    1.00      3.4±0.04ms        ? ?/sec
array_remove_fixed_size_binary/remove/list size: 500, num_rows: 10000    1.00     33.1±0.09ms        ? ?/sec    1.03     34.0±0.10ms        ? ?/sec
array_remove_int64/remove/list size: 10, num_rows: 4000                  1.00    232.7±1.52µs        ? ?/sec    1.01    234.8±1.71µs        ? ?/sec
array_remove_int64/remove/list size: 100, num_rows: 10000                1.00   1184.6±9.16µs        ? ?/sec    1.04  1234.0±23.88µs        ? ?/sec
array_remove_int64/remove/list size: 500, num_rows: 10000                1.00     12.3±0.29ms        ? ?/sec    1.08     13.3±0.10ms        ? ?/sec
array_remove_int64_nested/remove/list size: 10, num_rows: 4000           1.00     12.2±0.11ms        ? ?/sec    1.00     12.2±0.10ms        ? ?/sec
array_remove_int64_nested/remove/list size: 100, num_rows: 3000          1.00     68.0±0.99ms        ? ?/sec    1.00     67.7±0.88ms        ? ?/sec
array_remove_int64_nested/remove/list size: 300, num_rows: 1500          1.00     97.5±1.52ms        ? ?/sec    1.02     99.6±1.25ms        ? ?/sec
array_remove_n_int64/remove/list size: 10, num_rows: 4000                1.00    241.3±1.34µs        ? ?/sec    1.02    244.9±3.31µs        ? ?/sec
array_remove_n_int64/remove/list size: 100, num_rows: 10000              1.00  1808.5±13.70µs        ? ?/sec    1.01  1828.9±15.19µs        ? ?/sec
array_remove_n_int64/remove/list size: 500, num_rows: 10000              1.85     16.6±0.15ms        ? ?/sec    1.00      9.0±0.13ms        ? ?/sec
array_remove_n_int64_nested/remove/list size: 10, num_rows: 4000         1.00     12.3±0.09ms        ? ?/sec    1.00     12.3±0.10ms        ? ?/sec
array_remove_n_int64_nested/remove/list size: 100, num_rows: 3000        1.01     68.3±1.06ms        ? ?/sec    1.00     68.0±0.79ms        ? ?/sec
array_remove_n_int64_nested/remove/list size: 300, num_rows: 1500        1.00     97.8±1.29ms        ? ?/sec    1.00     97.7±1.07ms        ? ?/sec
array_remove_strings/remove/list size: 10, num_rows: 4000                1.01    414.3±1.18µs        ? ?/sec    1.00    412.2±1.45µs        ? ?/sec
array_remove_strings/remove/list size: 100, num_rows: 10000              1.00      6.5±0.02ms        ? ?/sec    1.00      6.5±0.02ms        ? ?/sec
array_remove_strings/remove/list size: 500, num_rows: 10000              1.00     36.7±0.08ms        ? ?/sec    1.28     46.9±0.10ms        ? ?/sec

Resource Usage

array_remove — base (merge-base)

Metric Value
Wall time 375.1s
Peak memory 206.1 MiB
Avg memory 53.9 MiB
CPU user 336.7s
CPU sys 20.7s
Peak spill 0 B

array_remove — branch

Metric Value
Wall time 370.1s
Peak memory 225.0 MiB
Avg memory 51.2 MiB
CPU user 338.6s
CPU sys 18.8s
Peak spill 0 B


@comphead

Contributor Author

The benchmark is confusing: the same metric, from two different runs, reports

array_remove_n_int64/remove/list size: 500, num_rows: 10000              1.00      9.3±0.24ms        ? ?/sec    1.83     17.0±0.26ms        ? ?/sec

and then

array_remove_n_int64/remove/list size: 500, num_rows: 10000              1.85     16.6±0.15ms        ? ?/sec    1.00      9.0±0.13ms        ? ?/sec

I'll check my local benchmarks again; if they still report the change as faster, I'll go with the local results. Even as it stands, the PR serves some purpose in terms of unification.
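One way to make the contradiction explicit is to check whether repeated runs agree on which side is faster before trusting a ratio. The sketch below is illustrative only: the helper names and the tie threshold are assumptions, and the ratio rule (each mean divided by the faster mean) is inferred from the numbers in the tables above rather than taken from the benchmark tool.

```rust
// Which column of a two-column comparison was faster in one run.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Faster {
    Head,
    Arrays,
    Tie,
}

// Ratio of each mean to the faster of the two means (assumed rule).
fn ratios(head_ms: f64, arrays_ms: f64) -> (f64, f64) {
    let fastest = head_ms.min(arrays_ms);
    (head_ms / fastest, arrays_ms / fastest)
}

// Differences below `threshold` (e.g. 0.05 for 5%) count as a tie.
fn faster(head_ms: f64, arrays_ms: f64, threshold: f64) -> Faster {
    let (head_ratio, arrays_ratio) = ratios(head_ms, arrays_ms);
    if (head_ratio - arrays_ratio).abs() < threshold {
        Faster::Tie
    } else if head_ratio < arrays_ratio {
        Faster::Head
    } else {
        Faster::Arrays
    }
}

// True only when every run reaches the same verdict.
fn consistent(runs: &[(f64, f64)], threshold: f64) -> bool {
    let mut verdicts = runs.iter().map(|&(h, a)| faster(h, a, threshold));
    match verdicts.next() {
        None => true,
        Some(first) => verdicts.all(|v| v == first),
    }
}
```

Fed the two array_remove_n_int64 size-500 rows quoted above, `(9.3, 17.0)` and `(16.6, 9.0)`, the verdict flips from one column to the other, so `consistent` returns `false` and the result should be treated as noise until it is reproduced.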

@comphead comphead added this pull request to the merge queue Jun 25, 2026
Merged via the queue into apache:main with commit 32e27ac Jun 26, 2026
35 checks passed
@comphead comphead deleted the arrays branch June 26, 2026 00:08

Labels

functions Changes to functions implementation


3 participants