-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[Parquet] perf: Create PrimitiveArrays directly rather than via ArrayData
#9122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| .consume_bitmap_buffer() | ||
| .map(|b| NullBuffer::new(BooleanBuffer::new(b, 0, len))); | ||
|
|
||
| let array_data = unsafe { array_data.build_unchecked() }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the point is to avoid this ArrayData (and its Vecs) as we know here what types of arrays will be built
|
run benchmark arrow_reader arrow_reader_clickbench |
|
🤖 |
9091d97 to
4d06746
Compare
ArrayDataPrimitiveArrays directly rather than via ArrayData
|
🤖: Benchmark completed Details
|
|
run benchmark arrow_reader |
|
🤖 |
|
It seems like this may be worth a few percent in performance. I have restarted the benchmarks to see if I can reproduce the numbers |
|
🤖: Benchmark completed Details
|
|
There is a lot of noise in the benchmark, but I think this PR shows a few percent improvement for primitive arrays |
scovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Thanks for the review @scovich -- I will plan to merge this PR on Monday unless there are any other comments |
Which issue does this PR close?
make_array) #9061Rationale for this change
ArrayData(1% improvement) #9120Creating Arrays via ArrayData /
make_arrayhas overhead (at least 2 Vec allocations) compared to simply creating the arrays directlyWhat changes are included in this PR?
Update the parquet reader to create
PrimitiveArrays directlyAre these changes tested?
By CI
Are there any user-facing changes?