Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I am profiling clickbench query 10 with predicate pushdown enabled as part of
samply record -- /Users/andrewlamb/Software/datafusion2/target/profiling/datafusion-cli -f q.sql > /dev/null 2>&1
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
While looking at the profile, I noticed that 7% of the time is spent allocating / regrowing vectors (i.e. reallocating and copying).
Describe the solution you'd like
Avoid the time spent regrowing these vectors
It appears that the vectors in question are part of the ViewBuffer struct:
arrow-rs/parquet/src/arrow/buffer/view_buffer.rs
Lines 30 to 33 in 02fa779
pub struct ViewBuffer {
    pub views: Vec<u128>,
    pub buffers: Vec<Buffer>,
}
Describe alternatives you've considered
Since we know how many views will be in each output buffer, we could create each ViewBuffer with the correct capacity up front.
Something like
ViewBuffer::with_capacity
Additional context
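To make the proposed alternative concrete, here is a minimal sketch of what a `with_capacity` constructor might look like. It is not the arrow-rs implementation: the struct is a stand-in for the one quoted above (`Vec<u8>` replaces arrow's `Buffer` so the example compiles on its own), and both `with_capacity` and the helper `fill_without_regrow` are hypothetical names used only for illustration.

```rust
/// Stand-in for the `ViewBuffer` struct quoted above. The real `buffers`
/// field holds arrow `Buffer`s; `Vec<u8>` is an assumption made here only
/// to keep the sketch self-contained.
#[derive(Debug, Default)]
pub struct ViewBuffer {
    pub views: Vec<u128>,
    pub buffers: Vec<Vec<u8>>,
}

impl ViewBuffer {
    /// Hypothetical constructor: reserve room for `num_views` views up
    /// front so that pushing that many views never regrows the vector.
    pub fn with_capacity(num_views: usize) -> Self {
        Self {
            views: Vec::with_capacity(num_views),
            buffers: Vec::new(),
        }
    }
}

/// Hypothetical helper: push `n` dummy views and report whether the
/// capacity of `views` stayed the same (i.e. no regrow happened).
pub fn fill_without_regrow(n: usize) -> bool {
    let mut out = ViewBuffer::with_capacity(n);
    let cap_before = out.views.capacity();
    for i in 0..n {
        out.views.push(i as u128);
    }
    out.views.capacity() == cap_before
}
```

The sketch only pre-sizes `views`, since that is the count the reader knows ahead of time; whether `buffers` could be pre-sized as well would depend on details not covered in this issue.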