

daijy commented Nov 15, 2025

Ports prometheus-community/parquet-common#117 (parallelize shard conversion) to parquet-gateway. Adds a command-line option, convert.write.concurrency (default 4), that sets the number of parallel parquet writers.

Test result:
I converted one day of our production data: parquet write time dropped from 8 hours 2 min to 1 hour 38 min, and total conversion time (download, index reading, sorting, chunk reading and parquet write) dropped from 10 hours 20 min to 4 hours 1 min. Memory usage was similar.

GiedriusS (Member) left a comment


Cosmetic suggestions, didn't have enough time right now to look through other changes

GiedriusS (Member)

daijy force-pushed the parallelconversion branch from 110be77 to 547fda0 (November 19, 2025 23:47)
daijy force-pushed the parallelconversion branch from 88ed15a to 073110b (November 23, 2025 18:06)
daijy force-pushed the parallelconversion branch from 073110b to c431436 (November 23, 2025 18:07)
wiardvanrij (Member)

Closes #23?

MichaHoffmann (Collaborator)

> Closes #23?

Yes!

Signed-off-by: Daniel Dai <[email protected]>
MichaHoffmann (Collaborator)

One last comment, otherwise lgtm!

Signed-off-by: Daniel Dai <[email protected]>
wiardvanrij merged commit caaa1ca into thanos-io:main on Nov 27, 2025
5 checks passed
```diff
 rr.rowBuilder.Add(lc.ColumnIndex, parquet.ValueOf(l.Value))
 // we need to address for projecting chunk columns away later so we need to correct for the offset here
-colIdxSlice = append(colIdxSlice, lc.ColumnIndex-schema.ChunkColumnsPerDay)
+colIdxSlice = append(colIdxSlice, lc.ColumnIndex-schema.ChunkColumnsPerDay-1)
```
Collaborator

Why did the index change?

Author

As I mentioned in my previous comment, this was originally a bug, but optimizeShard corrected for it. Now that optimizeShard has been removed, the bug surfaces.

Collaborator

Yeah, you're right! I tested it on main and it works fine!


5 participants