Contributor

@tobixdev tobixdev commented Oct 1, 2025

Which issue does this PR close?

I want to be able to detect performance regressions in BooleanArray::from_iter.

Rationale for this change

Add microbenchmarks for observing the performance of XYZArray::from_iter.

On my machine, running the benchmarks back to back gives deviations of about 1% or less.

Int64Array::from_iter   time:   [14.292 µs 14.297 µs 14.303 µs]
                        change: [-0.0049% +0.1290% +0.2631%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 26 outliers among 100 measurements (26.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  9 (9.00%) high mild
  13 (13.00%) high severe

Int64Array::from_trusted_len_iter
                        time:   [6.7355 µs 6.7472 µs 6.7628 µs]
                        change: [+0.0215% +0.1868% +0.3739%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

BooleanArray::from_iter time:   [7.3389 µs 7.3596 µs 7.3861 µs]
                        change: [-1.3820% -0.8065% -0.2803%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  9 (9.00%) high mild
  7 (7.00%) high severe
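
The three verdict lines above can be read as a two-step decision: first compare the p-value against the significance level, then check whether the whole confidence interval of the change lies outside a noise band. The sketch below reproduces that reading for the numbers in this output. It is a simplified illustration, not Criterion's actual code: the `classify` function and the `Verdict` enum are hypothetical names, the 0.05 significance level is taken from the output above, and the 1% noise threshold is an assumption.

```rust
/// Possible outcomes when comparing a benchmark run against a baseline.
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,
    WithinNoise,
    Improved,
    Regressed,
}

/// Hypothetical helper: classify a relative change from the bounds of its
/// confidence interval (as fractions, so 1% is 0.01) and its p-value.
fn classify(lower: f64, upper: f64, p_value: f64, significance: f64, noise: f64) -> Verdict {
    // Step 1: a p-value at or above the significance level means the
    // difference is not statistically distinguishable from zero.
    if p_value >= significance {
        return Verdict::NoChange;
    }
    // Step 2: a significant difference only counts if the whole
    // confidence interval lies outside the noise band.
    if lower < -noise && upper < -noise {
        Verdict::Improved
    } else if lower > noise && upper > noise {
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}

fn main() {
    const SIGNIFICANCE: f64 = 0.05; // taken from "p = ... 0.05" in the output
    const NOISE: f64 = 0.01; // assumed noise threshold of 1%

    // (name, lower bound, upper bound, p-value) copied from the output above.
    let runs = [
        ("Int64Array::from_iter", -0.000049, 0.002631, 0.06),
        ("Int64Array::from_trusted_len_iter", 0.000215, 0.003739, 0.03),
        ("BooleanArray::from_iter", -0.013820, -0.002803, 0.00),
    ];
    for &(name, lower, upper, p) in runs.iter() {
        println!("{}: {:?}", name, classify(lower, upper, p, SIGNIFICANCE, NOISE));
    }
}
```

Under these assumptions, the BooleanArray::from_iter run is statistically significant but still lands inside the noise band, because the upper bound of its interval (-0.2803%) is closer to zero than the assumed 1% threshold.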

What changes are included in this PR?

Only benchmarks
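
For readers unfamiliar with the measurement pattern, here is a dependency-free sketch of what such a microbenchmark does: build the input once, then repeatedly time the collection step while passing the result through `std::hint::black_box` so the compiler cannot discard it. This is not the benchmark code from this PR. It uses `Vec` as a stand-in for the Arrow array types, and `LEN`, `make_input`, and `time_it` are hypothetical names chosen for illustration. A real benchmark harness adds warm-up, sampling, and statistics on top of this.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical input size; the value used by the PR is not shown here.
const LEN: usize = 10_000;

/// Hypothetical input generator: every tenth element is a null (`None`).
fn make_input(len: usize) -> Vec<Option<i64>> {
    (0..len as i64)
        .map(|i| if i % 10 == 0 { None } else { Some(i) })
        .collect()
}

/// Call `f` `iters` times and return the mean wall-clock time per call.
/// `iters` must be non-zero.
fn time_it<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from removing the unused result.
        black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    // The input is built once, outside the timed closure, so only the
    // collection step is measured.
    let values = make_input(LEN);

    // `collect` goes through `FromIterator::from_iter` for the target type.
    let per_call = time_it(1_000, || {
        values.iter().copied().collect::<Vec<Option<i64>>>()
    });
    println!("Vec<Option<i64>> from iterator: {:?} per call", per_call);
}
```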

Are these changes tested?

Functionality is tested in the implementation file.

Are there any user-facing changes?

None

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 1, 2025
Contributor

@alamb alamb left a comment

Thanks for this @tobixdev -- looks great to me

// All ArrowPrimitiveType use the same implementation
c.bench_function("Int64Array::from_iter", |b| {
let values = gen_vector(1, ITER_LEN);
b.iter(|| hint::black_box(Int64Array::from_iter(values.iter())));
Contributor

this looks similar to the existing benchmark here: https://github.com/apache/arrow-rs/blob/31ea84453b2f7ed7aa4e85825bd6cbf7ecd45f3a/arrow/benches/buffer_create.rs#L179-L178

Perhaps you could extend that benchmark rather than adding a new one?

Contributor Author

It certainly makes sense to add them to another benchmark. I wasn't aware of the suite at the top-level crate.

What do you think about renaming array_from_vec to array_from and adding them there? I think this would also be related code, but we wouldn't mix Array::from_xyz with Buffer::from_xyz. But I am also fine with adding it to buffer_create.

https://github.com/apache/arrow-rs/blob/31ea84453b2f7ed7aa4e85825bd6cbf7ecd45f3a/arrow/benches/array_from_vec.rs

Contributor Author

I've updated it now to use array_from. I can change it if you prefer buffer_create.

Contributor

@alamb alamb left a comment

Thank you @tobixdev


[[bench]]
-name = "array_from_vec"
+name = "array_from"
Contributor

👍

@alamb alamb merged commit f88921c into apache:main Oct 2, 2025
24 checks passed