@beckernick beckernick commented Nov 5, 2025

This PR implements TPC-H query 3 in rapidsmpf Python (perhaps sub-optimally).
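
For reference, the logic of TPC-H query 3 (the "shipping priority" query) can be sketched in plain Python as below. This is not the code in this PR and does not use rapidsmpf. The function name `tpch_q3`, the list-of-dicts table layout, and the toy rows are assumptions made for illustration. Only the column names, the `BUILDING` segment, and the 1995-03-15 cutoff come from the standard query definition.

```python
from collections import defaultdict
from datetime import date


def tpch_q3(customer, orders, lineitem, segment="BUILDING",
            cutoff=date(1995, 3, 15), limit=10):
    """Reference logic for TPC-H Q3 over lists of dicts (illustrative only)."""
    # Filter customer by market segment.
    custkeys = {c["c_custkey"] for c in customer if c["c_mktsegment"] == segment}
    # Join customer x orders, keeping orders placed before the cutoff.
    open_orders = {
        o["o_orderkey"]: (o["o_orderdate"], o["o_shippriority"])
        for o in orders
        if o["o_custkey"] in custkeys and o["o_orderdate"] < cutoff
    }
    # Join with lineitem shipped after the cutoff and aggregate revenue.
    # o_orderkey is unique in orders, so grouping by l_orderkey alone is
    # equivalent to grouping by (l_orderkey, o_orderdate, o_shippriority).
    revenue = defaultdict(float)
    for li in lineitem:
        if li["l_shipdate"] > cutoff and li["l_orderkey"] in open_orders:
            revenue[li["l_orderkey"]] += li["l_extendedprice"] * (1 - li["l_discount"])
    rows = [(key, rev, *open_orders[key]) for key, rev in revenue.items()]
    # ORDER BY revenue DESC, o_orderdate ASC, then LIMIT.
    rows.sort(key=lambda r: (-r[1], r[2]))
    return rows[:limit]


# Toy data (made up) to exercise each filter.
customer = [
    {"c_custkey": 1, "c_mktsegment": "BUILDING"},
    {"c_custkey": 2, "c_mktsegment": "AUTOMOBILE"},
]
orders = [
    {"o_orderkey": 10, "o_custkey": 1, "o_orderdate": date(1995, 3, 1), "o_shippriority": 0},
    {"o_orderkey": 11, "o_custkey": 1, "o_orderdate": date(1995, 3, 20), "o_shippriority": 0},
    {"o_orderkey": 12, "o_custkey": 2, "o_orderdate": date(1995, 3, 1), "o_shippriority": 0},
]
lineitem = [
    {"l_orderkey": 10, "l_extendedprice": 100.0, "l_discount": 0.5, "l_shipdate": date(1995, 3, 20)},
    {"l_orderkey": 10, "l_extendedprice": 50.0, "l_discount": 0.0, "l_shipdate": date(1995, 3, 16)},
    {"l_orderkey": 10, "l_extendedprice": 999.0, "l_discount": 0.0, "l_shipdate": date(1995, 3, 10)},
    {"l_orderkey": 11, "l_extendedprice": 10.0, "l_discount": 0.0, "l_shipdate": date(1995, 4, 1)},
    {"l_orderkey": 12, "l_extendedprice": 10.0, "l_discount": 0.0, "l_shipdate": date(1995, 4, 1)},
]

result = tpch_q3(customer, orders, lineitem)
```

On the toy data only order 10 survives all three filters, which makes it easy to check each predicate in isolation.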

When I run this q3 implementation at SF1K (parquet, floats not decimals, tables partitioned into part0.parquet through partN.parquet) on a single H100 GPU of an internal DGX H100 system, I get the following performance:

CUDA_VISIBLE_DEVICES=4 python q03.py
Iteration 0: Pipeline construction 0.01088s
Iteration 0: Pipeline execution 33.1s
Iteration 1: Pipeline construction 0.007478s
Iteration 1: Pipeline execution 30.65s
Iteration 2: Pipeline construction 0.006944s
Iteration 2: Pipeline execution 32.34s
Iteration 3: Pipeline construction 0.007217s
Iteration 3: Pipeline execution 27.17s

This is 2x faster than my SF1K q3 run from yesterday with cuDF Polars + rapidsmpf machinery, but 4-5x slower than the most recent run by @TomAugspurger.

DuckDB on the same machine and dataset has the following performance:

python -m queries.duckdb.q3
Code block 'Run duckdb query 3' took: 10.93807 s
Code block 'Run duckdb query 3' took: 4.14054 s
Code block 'Run duckdb query 3' took: 4.08890 s
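
To put the two sets of timings side by side, the numbers reported above can be summarized with a few lines of Python. The choice to drop DuckDB's first run as a cold-start outlier is an assumption of this sketch, not something stated in the PR, and the variable names are illustrative.

```python
from statistics import mean

# Pipeline execution times reported above, in seconds.
rapidsmpf_exec_s = [33.1, 30.65, 32.34, 27.17]
# DuckDB timings reported above, in seconds.
duckdb_s = [10.93807, 4.14054, 4.08890]

rapidsmpf_mean = mean(rapidsmpf_exec_s)
# Assumption: treat the first DuckDB run as a cold run and exclude it.
duckdb_warm_mean = mean(duckdb_s[1:])
slowdown = rapidsmpf_mean / duckdb_warm_mean

print(f"rapidsmpf mean execution: {rapidsmpf_mean:.1f} s")
print(f"DuckDB warm mean:         {duckdb_warm_mean:.1f} s")
print(f"rapidsmpf / DuckDB:       {slowdown:.1f}x")
```

Under that assumption the rapidsmpf pipeline averages about 30.8 s against roughly 4.1 s for warm DuckDB runs, a gap of about 7.5x.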

The output matches DuckDB's, up to display rounding of revenue and the UTC offset shown on o_orderdate.

DuckDB SF1K q3 output:

┌────────────┬────────────────────┬─────────────────────┬────────────────┐
│ l_orderkey │      revenue       │     o_orderdate     │ o_shippriority │
│   int64    │       double       │      timestamp      │     int32      │
├────────────┼────────────────────┼─────────────────────┼────────────────┤
│   18869634 │        512508.6578 │ 1995-01-10 00:00:00 │              0 │
│ 3947421511 │ 507889.04639999993 │ 1995-03-14 00:00:00 │              0 │
│ 1319897249 │        503401.9508 │ 1995-03-05 00:00:00 │              0 │
│ 2036965252 │  495852.8691999999 │ 1995-03-03 00:00:00 │              0 │
│ 1980912577 │ 493605.46589999995 │ 1995-02-14 00:00:00 │              0 │
│ 4803840546 │ 492521.94299999997 │ 1995-02-18 00:00:00 │              0 │
│ 3407391428 │ 491379.51860000007 │ 1995-03-09 00:00:00 │              0 │
│ 5289035781 │        488004.5812 │ 1995-03-11 00:00:00 │              0 │
│ 5530172133 │        487671.6623 │ 1995-02-07 00:00:00 │              0 │
│ 3885365216 │  487236.4125999999 │ 1995-03-04 00:00:00 │              0 │
├────────────┴────────────────────┴─────────────────────┴────────────────┤
│ 10 rows                                                      4 columns │
└────────────────────────────────────────────────────────────────────────┘

rapidsmpf Python SF1K q3 output:

   l_orderkey      revenue               o_orderdate  o_shippriority
0    18869634  512508.6578 1995-01-10 00:00:00+00:00               0
1  3947421511  507889.0464 1995-03-14 00:00:00+00:00               0
2  1319897249  503401.9508 1995-03-05 00:00:00+00:00               0
3  2036965252  495852.8692 1995-03-03 00:00:00+00:00               0
4  1980912577  493605.4659 1995-02-14 00:00:00+00:00               0
5  4803840546  492521.9430 1995-02-18 00:00:00+00:00               0
6  3407391428  491379.5186 1995-03-09 00:00:00+00:00               0
7  5289035781  488004.5812 1995-03-11 00:00:00+00:00               0
8  5530172133  487671.6623 1995-02-07 00:00:00+00:00               0
9  3885365216  487236.4126 1995-03-04 00:00:00+00:00               0
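
The two printed results differ only in how many digits of revenue are shown and in the timezone suffix on the date. A small check like the one below makes the comparison explicit. The rows are copied from the two outputs above. The helper name `rows_match`, the 1e-4 absolute tolerance (chosen because the rapidsmpf output is printed to four decimals), and the decision to compare only the date part are assumptions of this sketch.

```python
import math

# (l_orderkey, revenue, o_orderdate) copied from the DuckDB output.
duckdb_rows = [
    (18869634, 512508.6578, "1995-01-10"),
    (3947421511, 507889.04639999993, "1995-03-14"),
    (1319897249, 503401.9508, "1995-03-05"),
    (2036965252, 495852.8691999999, "1995-03-03"),
    (1980912577, 493605.46589999995, "1995-02-14"),
    (4803840546, 492521.94299999997, "1995-02-18"),
    (3407391428, 491379.51860000007, "1995-03-09"),
    (5289035781, 488004.5812, "1995-03-11"),
    (5530172133, 487671.6623, "1995-02-07"),
    (3885365216, 487236.4125999999, "1995-03-04"),
]

# Same columns copied from the rapidsmpf output (timezone suffix dropped).
rapidsmpf_rows = [
    (18869634, 512508.6578, "1995-01-10"),
    (3947421511, 507889.0464, "1995-03-14"),
    (1319897249, 503401.9508, "1995-03-05"),
    (2036965252, 495852.8692, "1995-03-03"),
    (1980912577, 493605.4659, "1995-02-14"),
    (4803840546, 492521.9430, "1995-02-18"),
    (3407391428, 491379.5186, "1995-03-09"),
    (5289035781, 488004.5812, "1995-03-11"),
    (5530172133, 487671.6623, "1995-02-07"),
    (3885365216, 487236.4126, "1995-03-04"),
]


def rows_match(expected, actual, abs_tol=1e-4):
    """Compare row by row: exact keys and dates, revenue within abs_tol."""
    if len(expected) != len(actual):
        return False
    return all(
        e_key == a_key
        and e_date == a_date
        and math.isclose(e_rev, a_rev, rel_tol=0.0, abs_tol=abs_tol)
        for (e_key, e_rev, e_date), (a_key, a_rev, a_date) in zip(expected, actual)
    )


outputs_agree = rows_match(duckdb_rows, rapidsmpf_rows)
```

Because the comparison is positional, it also confirms that both engines return the rows in the same order.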

copy-pr-bot bot commented Nov 5, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@beckernick beckernick changed the title from "Add a python implementation of TPCH query 3" to "Add a single GPU python implementation of TPCH query 3" Nov 5, 2025
Comment on lines +626 to +628
customer_x_orders, # columns 0, 1 from customer, columns 2, 3, 4, 5 from orders
filtered_lineitem,
customer_x_orders_x_lineitem,

@beckernick beckernick Nov 5, 2025

Probably should flip these? But was hitting OOM issues.
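
One plausible reason the input order interacts with memory: in a join where one side is fully materialized and the other is streamed chunk by chunk, peak memory is roughly the whole materialized side plus one streamed chunk, so swapping the inputs changes what has to fit on the device at once. The toy sketch below illustrates that shape in plain Python. It is an assumption about the general technique, not rapidsmpf code, and `streaming_hash_join` and its parameters are hypothetical names.

```python
from collections import defaultdict


def streaming_hash_join(build_rows, probe_chunks, build_key, probe_key):
    """Inner join: materialize the build side, stream the probe side.

    build_rows:   iterable of tuples, held in memory in full
    probe_chunks: iterable of lists of tuples, processed one chunk at a time
    build_key / probe_key: index of the join column in each side's tuples
    """
    # The whole build side lives in this hash table for the join's lifetime.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Only one probe chunk (plus its output) is resident at a time.
    for chunk in probe_chunks:
        yield [
            build_row + probe_row
            for probe_row in chunk
            for build_row in table.get(probe_row[probe_key], [])
        ]


# Toy inputs (made up): a small build side and a probe side in two chunks.
build = [(1, "a"), (2, "b")]
probe = [[(1, "x"), (3, "y")], [(2, "z"), (1, "w")]]
joined_chunks = list(streaming_hash_join(build, probe, build_key=0, probe_key=0))
```

In this shape the smaller input is usually the better choice for the materialized side, provided the larger one can be consumed as a stream.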

@beckernick beckernick marked this pull request as ready for review November 5, 2025 15:28
@beckernick beckernick requested a review from a team as a code owner November 5, 2025 15:28
@madsbk madsbk added the labels "improvement" (Improves an existing functionality) and "non-breaking" (Introduces a non-breaking change) Nov 5, 2025
keep_keys: bool,
) -> None:
left_tables: list[TableChunk] = []
chunk_streams = []

Blindly switched from set to list as noted in the q09 PR. Have not reasoned through if this matters.

#627 (comment)
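
As a generic reminder of what changes when a `set` becomes a `list` in Python (independent of the reasoning in the linked comment, which is not restated here): a list keeps insertion order and duplicates and accepts unhashable items, while a set deduplicates, gives no ordering guarantee, and requires hashable items. The chunk labels below are made-up placeholders.

```python
# Hypothetical arrival order of items, with one repeated label.
arrivals = ["chunk-2", "chunk-0", "chunk-1", "chunk-0"]

as_list = list(arrivals)  # keeps order and the duplicate
as_set = set(arrivals)    # drops the duplicate, iteration order not guaranteed

# Unhashable items (for example, lists) can go in a list but not in a set.
unhashable_items = [[1, 2], [3, 4]]
try:
    set(unhashable_items)
    set_accepts_unhashable = True
except TypeError:
    set_accepts_unhashable = False
```

Whether any of these differences matters for `chunk_streams` depends on how the collected items are consumed afterwards.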
