-
Notifications
You must be signed in to change notification settings - Fork 11
Port pull request #117 from prometheus-community/parquet-common #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
GiedriusS
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cosmetic suggestions, didn't have enough time right now to look through other changes
110be77 to
547fda0
Compare
88ed15a to
073110b
Compare
Signed-off-by: Daniel Dai <[email protected]>
Signed-off-by: Daniel Dai <[email protected]>
Signed-off-by: Daniel Dai <[email protected]>
Signed-off-by: Daniel Dai <[email protected]>
Signed-off-by: Daniel Dai <[email protected]>
Signed-off-by: Daniel Dai <[email protected]>
073110b to
c431436
Compare
|
Closes #23 ? |
yes! |
Signed-off-by: Daniel Dai <[email protected]>
|
One last comment, otherwise lgtm! |
Signed-off-by: Daniel Dai <[email protected]>
| rr.rowBuilder.Add(lc.ColumnIndex, parquet.ValueOf(l.Value)) | ||
| // we need to address for projecting chunk columns away later so we need to correct for the offset here | ||
| colIdxSlice = append(colIdxSlice, lc.ColumnIndex-schema.ChunkColumnsPerDay) | ||
| colIdxSlice = append(colIdxSlice, lc.ColumnIndex-schema.ChunkColumnsPerDay-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did the index change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in the previous comment, this is a bug originally but corrected in optimizeShard. Now we removed optimizeShard and the bug manifests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you are right! I tested it on main and it works fine!
Port prometheus-community/parquet-common#117: Parallelize shard conversion to parquet-gateway. Added command line option convert.write.concurrency (default=4) to specify number of parallel parquet writers.
Test result:
I have converted one day of our production data, parquet write time went down from 8 hours 2 min to 1 hour 38 min. Total conversion time (including download/index reading/sorting/chunk reading/parquet write) went down from 10 hours 20 min to 4 hours 1 min. Memory usage is similar.