Add a single GPU python implementation of TPCH query 3 #629

beckernick · 2025-11-05T03:08:19Z

This PR implements TPC-H query 3 in rapidsmpf Python (perhaps sub-optimally).
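For readers who don't have the query in their head, here is a minimal pure-Python sketch of what TPC-H Q3 computes. It is not the rapidsmpf implementation in this PR; the function name `tpch_q3`, the dict-based rows, and the toy data are all made up for illustration. The segment (`BUILDING`) and cutoff date (1995-03-15) are the standard validation parameters for Q3.

```python
from collections import defaultdict
from datetime import date

CUTOFF = date(1995, 3, 15)  # standard Q3 validation date


def tpch_q3(customer, orders, lineitem, segment="BUILDING", cutoff=CUTOFF, limit=10):
    """Toy reference for TPC-H Q3 over lists of dicts (illustrative only)."""
    # Customers in the requested market segment.
    custkeys = {c["c_custkey"] for c in customer if c["c_mktsegment"] == segment}

    # Orders placed before the cutoff by those customers.
    # o_orderkey is unique per order, so grouping by it alone is equivalent
    # to grouping by (l_orderkey, o_orderdate, o_shippriority).
    kept_orders = {
        o["o_orderkey"]: (o["o_orderdate"], o["o_shippriority"])
        for o in orders
        if o["o_custkey"] in custkeys and o["o_orderdate"] < cutoff
    }

    # Sum discounted revenue over line items shipped after the cutoff.
    revenue = defaultdict(float)
    for item in lineitem:
        if item["l_shipdate"] > cutoff and item["l_orderkey"] in kept_orders:
            revenue[item["l_orderkey"]] += item["l_extendedprice"] * (1 - item["l_discount"])

    rows = [(key, rev, *kept_orders[key]) for key, rev in revenue.items()]
    # ORDER BY revenue DESC, o_orderdate ASC, then LIMIT.
    rows.sort(key=lambda r: (-r[1], r[2]))
    return rows[:limit]


# Made-up toy rows, not TPC-H data.
customer = [
    {"c_custkey": 1, "c_mktsegment": "BUILDING"},
    {"c_custkey": 2, "c_mktsegment": "AUTOMOBILE"},
]
orders = [
    {"o_orderkey": 10, "o_custkey": 1, "o_orderdate": date(1995, 3, 1), "o_shippriority": 0},
    {"o_orderkey": 11, "o_custkey": 1, "o_orderdate": date(1995, 3, 20), "o_shippriority": 0},
    {"o_orderkey": 12, "o_custkey": 2, "o_orderdate": date(1995, 3, 1), "o_shippriority": 0},
]
lineitem = [
    {"l_orderkey": 10, "l_shipdate": date(1995, 3, 20), "l_extendedprice": 100.0, "l_discount": 0.5},
    {"l_orderkey": 10, "l_shipdate": date(1995, 3, 25), "l_extendedprice": 200.0, "l_discount": 0.0},
    {"l_orderkey": 10, "l_shipdate": date(1995, 3, 10), "l_extendedprice": 50.0, "l_discount": 0.0},
    {"l_orderkey": 11, "l_shipdate": date(1995, 3, 25), "l_extendedprice": 10.0, "l_discount": 0.0},
    {"l_orderkey": 12, "l_shipdate": date(1995, 3, 25), "l_extendedprice": 10.0, "l_discount": 0.0},
]

result = tpch_q3(customer, orders, lineitem)
print(result)
```

Only order 10 survives all three filters in the toy data, and only two of its three line items ship after the cutoff.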

When I run this q3 implementation at SF1K (Parquet, floats rather than decimals, tables partitioned as part0.parquet to partN.parquet) on one H100 of an internal DGX H100 system, I get the following performance:

CUDA_VISIBLE_DEVICES=4 python q03.py
Iteration 0: Pipeline construction 0.01088s
Iteration 0: Pipeline execution 33.1s
Iteration 1: Pipeline construction 0.007478s
Iteration 1: Pipeline execution 30.65s
Iteration 2: Pipeline construction 0.006944s
Iteration 2: Pipeline execution 32.34s
Iteration 3: Pipeline construction 0.007217s
Iteration 3: Pipeline execution 27.17s

This is 2x faster than my SF1K q3 run from yesterday with cuDF Polars + rapidsmpf machinery, but 4-5x slower than the most recent run by @TomAugspurger.

DuckDB on the same machine and dataset has the following performance:

python -m queries.duckdb.q3
Code block 'Run duckdb query 3' took: 10.93807 s
Code block 'Run duckdb query 3' took: 4.14054 s
Code block 'Run duckdb query 3' took: 4.08890 s
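To put the two sets of timings side by side, here is a small sketch that summarizes the numbers reported above. Treating the first DuckDB run as a warm-up and averaging only the later two is an assumption made here, not something stated in the logs.

```python
from statistics import mean

# Pipeline execution times (seconds) from the four rapidsmpf iterations above.
rapidsmpf_exec_s = [33.1, 30.65, 32.34, 27.17]

# DuckDB times (seconds) from the three runs above.
duckdb_s = [10.93807, 4.14054, 4.08890]

rapidsmpf_mean = mean(rapidsmpf_exec_s)

# Assumption: the first DuckDB run includes cold-start costs, so skip it.
duckdb_warm_mean = mean(duckdb_s[1:])

slowdown = rapidsmpf_mean / duckdb_warm_mean

print(f"rapidsmpf mean execution: {rapidsmpf_mean:.2f} s")
print(f"DuckDB warm mean:         {duckdb_warm_mean:.2f} s")
print(f"rapidsmpf / DuckDB:       {slowdown:.1f}x")
```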

The output matches DuckDB.

DuckDB SF1K q3 output:

┌────────────┬────────────────────┬─────────────────────┬────────────────┐
│ l_orderkey │      revenue       │     o_orderdate     │ o_shippriority │
│   int64    │       double       │      timestamp      │     int32      │
├────────────┼────────────────────┼─────────────────────┼────────────────┤
│   18869634 │        512508.6578 │ 1995-01-10 00:00:00 │              0 │
│ 3947421511 │ 507889.04639999993 │ 1995-03-14 00:00:00 │              0 │
│ 1319897249 │        503401.9508 │ 1995-03-05 00:00:00 │              0 │
│ 2036965252 │  495852.8691999999 │ 1995-03-03 00:00:00 │              0 │
│ 1980912577 │ 493605.46589999995 │ 1995-02-14 00:00:00 │              0 │
│ 4803840546 │ 492521.94299999997 │ 1995-02-18 00:00:00 │              0 │
│ 3407391428 │ 491379.51860000007 │ 1995-03-09 00:00:00 │              0 │
│ 5289035781 │        488004.5812 │ 1995-03-11 00:00:00 │              0 │
│ 5530172133 │        487671.6623 │ 1995-02-07 00:00:00 │              0 │
│ 3885365216 │  487236.4125999999 │ 1995-03-04 00:00:00 │              0 │
├────────────┴────────────────────┴─────────────────────┴────────────────┤
│ 10 rows                                                      4 columns │
└────────────────────────────────────────────────────────────────────────┘

rapidsmpf Python SF1K q3 output:

   l_orderkey      revenue               o_orderdate  o_shippriority
0    18869634  512508.6578 1995-01-10 00:00:00+00:00               0
1  3947421511  507889.0464 1995-03-14 00:00:00+00:00               0
2  1319897249  503401.9508 1995-03-05 00:00:00+00:00               0
3  2036965252  495852.8692 1995-03-03 00:00:00+00:00               0
4  1980912577  493605.4659 1995-02-14 00:00:00+00:00               0
5  4803840546  492521.9430 1995-02-18 00:00:00+00:00               0
6  3407391428  491379.5186 1995-03-09 00:00:00+00:00               0
7  5289035781  488004.5812 1995-03-11 00:00:00+00:00               0
8  5530172133  487671.6623 1995-02-07 00:00:00+00:00               0
9  3885365216  487236.4126 1995-03-04 00:00:00+00:00               0
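The "output matches" claim can be checked mechanically rather than by eye. The sketch below copies the ten rows from both outputs above and compares them, allowing for two display differences: the rapidsmpf revenue is printed to four decimal places, and its timestamps carry a UTC offset. Interpreting DuckDB's naive timestamps as UTC is an assumption, and `rows_match` is a hypothetical helper, not part of either library.

```python
import math
from datetime import datetime

# (l_orderkey, revenue, o_orderdate, o_shippriority), copied from the outputs above.
duckdb_rows = [
    (18869634, 512508.6578, "1995-01-10 00:00:00", 0),
    (3947421511, 507889.04639999993, "1995-03-14 00:00:00", 0),
    (1319897249, 503401.9508, "1995-03-05 00:00:00", 0),
    (2036965252, 495852.8691999999, "1995-03-03 00:00:00", 0),
    (1980912577, 493605.46589999995, "1995-02-14 00:00:00", 0),
    (4803840546, 492521.94299999997, "1995-02-18 00:00:00", 0),
    (3407391428, 491379.51860000007, "1995-03-09 00:00:00", 0),
    (5289035781, 488004.5812, "1995-03-11 00:00:00", 0),
    (5530172133, 487671.6623, "1995-02-07 00:00:00", 0),
    (3885365216, 487236.4125999999, "1995-03-04 00:00:00", 0),
]
rapidsmpf_rows = [
    (18869634, 512508.6578, "1995-01-10 00:00:00+00:00", 0),
    (3947421511, 507889.0464, "1995-03-14 00:00:00+00:00", 0),
    (1319897249, 503401.9508, "1995-03-05 00:00:00+00:00", 0),
    (2036965252, 495852.8692, "1995-03-03 00:00:00+00:00", 0),
    (1980912577, 493605.4659, "1995-02-14 00:00:00+00:00", 0),
    (4803840546, 492521.9430, "1995-02-18 00:00:00+00:00", 0),
    (3407391428, 491379.5186, "1995-03-09 00:00:00+00:00", 0),
    (5289035781, 488004.5812, "1995-03-11 00:00:00+00:00", 0),
    (5530172133, 487671.6623, "1995-02-07 00:00:00+00:00", 0),
    (3885365216, 487236.4126, "1995-03-04 00:00:00+00:00", 0),
]


def rows_match(expected, actual, abs_tol=5e-5):
    """Compare row lists in order; revenue within half of the last printed digit."""
    if len(expected) != len(actual):
        return False
    for (e_key, e_rev, e_date, e_prio), (a_key, a_rev, a_date, a_prio) in zip(expected, actual):
        # Assumption: naive timestamps are UTC, so dropping tzinfo is safe here.
        e_ts = datetime.fromisoformat(e_date).replace(tzinfo=None)
        a_ts = datetime.fromisoformat(a_date).replace(tzinfo=None)
        if (e_key, e_ts, e_prio) != (a_key, a_ts, a_prio):
            return False
        if not math.isclose(e_rev, a_rev, rel_tol=0.0, abs_tol=abs_tol):
            return False
    return True


outputs_match = rows_match(duckdb_rows, rapidsmpf_rows)
print(outputs_match)
```

The comparison is order-sensitive on purpose, since Q3 specifies the result ordering.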

copy-pr-bot · 2025-11-05T03:08:22Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

beckernick · 2025-11-05T03:30:39Z

python/rapidsmpf/rapidsmpf/examples/streaming/ndsh/q03.py

+            customer_x_orders, # columns 0, 1 from customer, columns 2, 3, 4, 5 from orders
+            filtered_lineitem,
+            customer_x_orders_x_lineitem,


We should probably flip these, but I was hitting OOM issues.

beckernick · 2025-11-05T16:53:22Z

python/rapidsmpf/rapidsmpf/examples/streaming/ndsh/q03.py

+    keep_keys: bool,
+) -> None:
+    left_tables: list[TableChunk] = []
+    chunk_streams = []


Blindly switched from a set to a list, as noted in the q09 PR. I have not reasoned through whether this matters.
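As a starting point for that reasoning, here is a sketch of the general Python differences between the two containers. It says nothing about the actual rationale in the q09 PR, and `FakeChunk` is a hypothetical stand-in, not the rapidsmpf `TableChunk`, whose hashing and equality behavior may differ.

```python
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a table chunk; not the rapidsmpf TableChunk."""

    sequence_number: int


arrivals = [FakeChunk(2), FakeChunk(0), FakeChunk(2)]

# A list keeps every element in arrival order, duplicates included.
as_list = list(arrivals)

# A set needs hashable elements. A plain dataclass defines __eq__ but sets
# __hash__ to None, so building a set of these raises TypeError.
try:
    set(arrivals)
    set_error = None
except TypeError as exc:
    set_error = exc

# Even with hashable values, a set collapses equal elements and makes no
# promise about iterating in insertion order.
distinct = {chunk.sequence_number for chunk in arrivals}

print([chunk.sequence_number for chunk in as_list])
print(type(set_error).__name__)
print(len(distinct))
```

If any downstream step relies on chunk order or on seeing every chunk exactly as it arrived, a list is the safer default.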

#627 (comment)

q03 correct output with core dump

aff9473

beckernick changed the title ~~Add a python implementation of TPCH query 3~~ Add a single GPU python implementation of TPCH query 3 Nov 5, 2025

beckernick added 3 commits November 5, 2025 06:49

switch from set to list

81ded18

testing

db1fc7d

remove unneeded columns

1caaa8b

beckernick marked this pull request as ready for review November 5, 2025 15:28

beckernick requested a review from a team as a code owner November 5, 2025 15:28

beckernick added 2 commits November 5, 2025 07:28

remove unnecessary columns midway through

679408f

Merge branch 'main' into feature/tpch-q03-python

2082a36

madsbk added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Nov 5, 2025

madsbk assigned beckernick Nov 5, 2025

beckernick added 2 commits November 5, 2025 09:25

newline EOF

c2e1afa

update channel creation for new interface

8e0a9d1

beckernick mentioned this pull request Nov 11, 2025

[WIP] C++ implementation of TPCH query 3 #650

Draft
