Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Jan 7, 2026

Which issue does this PR close?

Rationale for this change

The current implementation of make_array for StructArray and GenericByteViewArray clones ArrayData which allocates a new Vec. This is unnecessary given that make_array is passed an owned ArrayData

What changes are included in this PR?

  1. Add a new API to ArrayData to break it down into parts (into_parts)
  2. Use that API to avoid cloning while constructing StructArray and GenericByteViewArray

Are these changes tested?

Yes by CI

Are there any user-facing changes?

A few fewer allocations when creating arrays

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 7, 2026
fn from(value: ArrayData) -> Self {
let views = value.buffers()[0].clone();
let views = ScalarBuffer::new(views, value.offset(), value.len());
let buffers = value.buffers()[1..].to_vec();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this call to_vec() allocates a new Vec which is unecessary


impl<T: ByteViewType + ?Sized> From<ArrayData> for GenericByteViewArray<T> {
fn from(value: ArrayData) -> Self {
let views = value.buffers()[0].clone();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloneing the buffers is relatively cheap (they are Arcd internally) so avoiding this just makes the code easier to follow, I don't think it will be any significant savings

make_array(cd.slice(parent_offset, parent_len))
} else {
make_array(cd.clone())
make_array(cd)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cd.clone() clones the ArrayData (and allocates a new Vec) for each child, recursively, which is unecessary

.into_iter()
.map(|cd| {
if parent_offset != 0 || parent_len != cd.len() {
make_array(cd.slice(parent_offset, parent_len))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably avoid an additional allocation for sliced arrays by making a version of slice() that consumes self -- like sliced() perhaps 🤔

@alamb
Copy link
Contributor Author

alamb commented Jan 9, 2026

Thanks @mhilton

fn from(data: ArrayData) -> Self {
let (_data_type, len, nulls, offset, mut buffers, _child_data) = data.into_parts();
let views = buffers.remove(0); // need to maintain order of remaining buffers
let buffers = buffers.into(); // convert to Arc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let buffers = buffers.into(); // convert to Arc
let buffers = Arc::from(buffers);

Is this equivalent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask the same thing... except for Arc::new. Longer than .into() but shorter than having to comment it.

Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one nit

fn from(data: ArrayData) -> Self {
let (_data_type, len, nulls, offset, mut buffers, _child_data) = data.into_parts();
let views = buffers.remove(0); // need to maintain order of remaining buffers
let buffers = buffers.into(); // convert to Arc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask the same thing... except for Arc::new. Longer than .into() but shorter than having to comment it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants