Add a single GPU python implementation of TPCH query 3 #629
customer_x_orders,  # columns 0, 1 from customer, columns 2, 3, 4, 5 from orders
filtered_lineitem,
customer_x_orders_x_lineitem,
Probably should flip these? But was hitting OOM issues.
    keep_keys: bool,
) -> None:
    left_tables: list[TableChunk] = []
    chunk_streams = []
Blindly switched from set to list as noted in the q09 PR. Have not reasoned through if this matters.
This PR implements TPC-H query 3 in rapidsmpf Python (perhaps sub-optimally).
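For reference, the logic of TPC-H query 3 can be sketched in plain Python. This is only an illustration of the query semantics on a few made-up rows, not the rapidsmpf implementation in this PR; the function name `tpch_q3`, the dict-per-row layout, and the toy data are assumptions for the example. The join order (customer with orders first, then the filtered lineitem) mirrors the names in the diff.

```python
from collections import defaultdict
from datetime import date

CUTOFF = date(1995, 3, 15)  # date parameter from the standard q3 text


def tpch_q3(customer, orders, lineitem, segment="BUILDING", cutoff=CUTOFF, limit=10):
    """Hypothetical pure-Python reference for TPC-H q3 over lists of dicts."""
    # Filter customer by market segment; only the join key is needed afterwards.
    custkeys = {c["c_custkey"] for c in customer if c["c_mktsegment"] == segment}

    # customer x orders: orders placed before the cutoff by matching customers.
    # o_orderkey is the primary key of orders, so a dict keyed on it is safe.
    customer_x_orders = {
        o["o_orderkey"]: (o["o_orderdate"], o["o_shippriority"])
        for o in orders
        if o["o_custkey"] in custkeys and o["o_orderdate"] < cutoff
    }

    # Join with lineitem shipped after the cutoff and aggregate revenue per group.
    revenue = defaultdict(float)
    for item in lineitem:
        if item["l_shipdate"] > cutoff and item["l_orderkey"] in customer_x_orders:
            orderdate, shippriority = customer_x_orders[item["l_orderkey"]]
            key = (item["l_orderkey"], orderdate, shippriority)
            revenue[key] += item["l_extendedprice"] * (1 - item["l_discount"])

    # ORDER BY revenue DESC, o_orderdate ASC, then LIMIT.
    rows = [(k[0], rev, k[1], k[2]) for k, rev in revenue.items()]
    rows.sort(key=lambda r: (-r[1], r[2]))
    return rows[:limit]


# Made-up toy rows, only to exercise the function.
customer = [
    {"c_custkey": 1, "c_mktsegment": "BUILDING"},
    {"c_custkey": 2, "c_mktsegment": "MACHINERY"},
]
orders = [
    {"o_orderkey": 10, "o_custkey": 1, "o_orderdate": date(1995, 3, 1), "o_shippriority": 0},
    {"o_orderkey": 11, "o_custkey": 1, "o_orderdate": date(1995, 3, 20), "o_shippriority": 0},
    {"o_orderkey": 12, "o_custkey": 2, "o_orderdate": date(1995, 1, 1), "o_shippriority": 0},
]
lineitem = [
    {"l_orderkey": 10, "l_extendedprice": 100.0, "l_discount": 0.5, "l_shipdate": date(1995, 3, 20)},
    {"l_orderkey": 10, "l_extendedprice": 200.0, "l_discount": 0.25, "l_shipdate": date(1995, 4, 1)},
    {"l_orderkey": 10, "l_extendedprice": 50.0, "l_discount": 0.0, "l_shipdate": date(1995, 3, 10)},
    {"l_orderkey": 11, "l_extendedprice": 70.0, "l_discount": 0.0, "l_shipdate": date(1995, 4, 1)},
    {"l_orderkey": 12, "l_extendedprice": 80.0, "l_discount": 0.0, "l_shipdate": date(1995, 4, 1)},
]

result = tpch_q3(customer, orders, lineitem)
```

Only order 10 survives both filters here, and only its two line items shipped after the cutoff contribute to the revenue sum.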
When I run this q3 implementation at SF1K (parquet, floats not decimals, `part0.parquet` to `partN.parquet` partitioned tables) on 1x H100 of an internal DGX H100 system, I get the following performance:
This is 2x faster than my SF1K q3 run from yesterday with cuDF Polars + rapidsmpf machinery, but 4-5x slower than the most recent run by @TomAugspurger.
DuckDB on the same machine and dataset has the following performance:
The output matches DuckDB.
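Because the dataset stores floats rather than decimals, an exact equality check on the revenue column can fail even when both engines are correct. Below is a minimal sketch of a tolerant row-by-row comparison; the helper name `rows_match`, the default tolerance, and the sample rows are all assumptions for illustration, not the check that was actually used for this PR.

```python
import math


def rows_match(expected, actual, rel_tol=1e-9):
    """Compare two ordered result sets, using a relative tolerance for floats."""
    if len(expected) != len(actual):
        return False
    for exp_row, act_row in zip(expected, actual):
        if len(exp_row) != len(act_row):
            return False
        for e, a in zip(exp_row, act_row):
            if isinstance(e, float) or isinstance(a, float):
                # Float columns (e.g. revenue) may differ in the last few bits.
                if not math.isclose(e, a, rel_tol=rel_tol):
                    return False
            elif e != a:
                # Keys, dates, and priorities must match exactly.
                return False
    return True


# Made-up rows in (l_orderkey, revenue, o_orderdate, o_shippriority) order.
duckdb_rows = [(10, 200.0, "1995-03-01", 0)]
rapidsmpf_rows = [(10, 200.00000000001, "1995-03-01", 0)]

same = rows_match(duckdb_rows, rapidsmpf_rows)
```

The comparison is positional, so it assumes both outputs are already in the same order. Rows whose revenues are nearly tied could legitimately swap places between engines and would need separate handling.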
DuckDB SF1K q3 output:
rapidsmpf Python SF1K q3 output: