feat: Chunkify single parts to generate them in parallel #155
Looks good to me
I ran with
cargo run --release -- --tables lineitem --scale-factor 1000 --part 1 --parts 100 --format=parquet
And it kept all my cores busy as promised.
Thank you @clflushopt
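For readers following along, the idea in the PR title (splitting one part into chunks and generating them in parallel) can be sketched roughly as below. This is a minimal illustration using only the standard library, not the code from this PR: `chunk_ranges`, `generate_rows`, and `generate_parallel` are hypothetical names, and the row generator is a stand-in. Results are joined in chunk order, so the output is the same as a sequential run.

```rust
use std::ops::Range;
use std::thread;

/// Split `total_rows` into at most `chunks` contiguous, near-equal ranges.
/// (Hypothetical helper, not part of the project.)
fn chunk_ranges(total_rows: u64, chunks: u64) -> Vec<Range<u64>> {
    let chunks = chunks.max(1);
    let base = total_rows / chunks;
    let extra = total_rows % chunks;
    let mut ranges = Vec::new();
    let mut start = 0;
    for i in 0..chunks {
        // The first `extra` chunks take one additional row each.
        let len = base + if i < extra { 1 } else { 0 };
        if len == 0 {
            continue;
        }
        ranges.push(start..start + len);
        start += len;
    }
    ranges
}

/// Stand-in for a real row generator: deterministic for a given range.
fn generate_rows(range: Range<u64>) -> Vec<String> {
    range.map(|i| format!("row-{i}")).collect()
}

/// Generate every chunk on its own scoped thread, then concatenate the
/// results in chunk order so the output matches a sequential run.
fn generate_parallel(total_rows: u64, chunks: u64) -> Vec<String> {
    let ranges = chunk_ranges(total_rows, chunks);
    thread::scope(|s| {
        let handles: Vec<_> = ranges
            .into_iter()
            .map(|r| s.spawn(move || generate_rows(r)))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Joining the handles in the order they were spawned is what keeps the output deterministic here; writing chunks out as they finish would be faster to first byte but would not preserve order.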
I made a PR to add some integration tests here. I think #156 covers the correctness aspect of this PR. I am not sure it covers the "make it parallel" functional aspect, but then again I don't really have a great story about that.
I wrote up some high level thoughts here
Closing this PR since the approach used doesn't guarantee consistent output.
I merged up from main -- there are a few test items I need to fix
@clflushopt -- if you have a moment perhaps you could look at this PR
I found:
- The "split the job into multiple parts" works well (we just have to avoid Region and Nation tables) -- it passes tests well
- The interplay of parts / part and multiple files is a little confusing
- There is no good way to return errors
My suggested path forward is:
- Go with the approach in this PR
- Make a follow-on refactoring PR that changes `parts` and `part` to `Option<i32>` and defaults them to `None` to make the logic clear
What are your thoughts?
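As a rough illustration of the follow-on refactoring suggested above, here is one way optional `part` / `parts` values could be resolved in a single place. This is only a sketch: the `Partitioning` enum, the `resolve` function, and the policy of rejecting one flag without the other are assumptions for the example, not the project's actual behavior.

```rust
/// Hypothetical resolved plan; not the project's actual type.
#[derive(Debug, PartialEq)]
enum Partitioning {
    /// Neither value given: generate the whole table.
    Whole,
    /// Both values given: generate `part` (1-based) out of `parts`.
    Single { part: i32, parts: i32 },
}

/// Turn the two optional values into an explicit plan, returning an error
/// message for combinations this sketch assumes are invalid.
fn resolve(part: Option<i32>, parts: Option<i32>) -> Result<Partitioning, String> {
    match (part, parts) {
        (None, None) => Ok(Partitioning::Whole),
        (Some(part), Some(parts)) => {
            if parts < 1 {
                return Err(format!("parts must be at least 1, got {parts}"));
            }
            if part < 1 || part > parts {
                return Err(format!("part must be in 1..={parts}, got {part}"));
            }
            Ok(Partitioning::Single { part, parts })
        }
        // Assumed policy for this sketch: the values only make sense together.
        (Some(_), None) => Err("part was given without parts".to_string()),
        (None, Some(_)) => Err("parts was given without part".to_string()),
    }
}
```

With `None` as the default, "the user did not ask for partitioning" is a distinct case from "the user asked for part 1 of 1", and invalid combinations surface as an error value in one place.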
Sounds good @alamb, I did a sanity check at scale factor 10 with master and it all looks good! About your comment: yes, I do agree that
Can be merged once you approve, @alamb. Thanks for picking this one up!
Thanks @clflushopt
`--part` and `--parts` are specified) #80