
Remove unconditional large left table skip in mark join benchmarks #21909

Merged

rapids-bot[bot] merged 2 commits into rapidsai:main from PointKernel:fix-join-bench-skip on Apr 1, 2026

@PointKernel (Member) commented Mar 24, 2026

Description

This PR removes the hard-coded, unconditional skip of large left tables in the mark join benchmarks, giving users full control over which table sizes are benchmarked.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@PointKernel requested a review from a team as a code owner March 24, 2026 20:56
@PointKernel requested a review from devavret March 24, 2026 20:56
@PointKernel added the "3 - Ready for Review" (Ready for review by team) label Mar 24, 2026
@PointKernel requested a review from lamarrr March 24, 2026 20:56
@PointKernel added the "libcudf" (Affects libcudf (C++/CUDA) code), "improvement" (Improvement / enhancement to an existing function), and "non-breaking" (Non-breaking change) labels Mar 24, 2026
@PointKernel (Member, Author)

Updated benchmark results on my RTX PRO 6000:

# Benchmark Results

## left_anti_join

### [0] NVIDIA RTX PRO 6000 Blackwell Workstation Edition

| Nullable | NullEquality  | DataType | left_size | right_size | num_probes | selectivity | join_type | join_input_size | Samples |  CPU Time  | Noise |  GPU Time  | Noise |  Elem/s  | GlobalMem BW | BWUtil |
|----------|---------------|----------|-----------|------------|------------|-------------|-----------|-----------------|---------|------------|-------|------------|-------|----------|--------------|--------|
|        0 | NULLS_UNEQUAL |    INT32 |      1000 |       1000 |          4 |         0.3 | mark_join |           24250 |   2160x | 241.296 us | 5.98% | 237.215 us | 6.10% | 102.228M | 102.228 MB/s |  0.01% |
|        0 | NULLS_UNEQUAL |    INT32 |      1000 |     100000 |          4 |         0.3 | mark_join |         1224625 |   2512x | 203.634 us | 5.65% | 199.561 us | 5.76% |   6.137G |   6.137 GB/s |  0.34% |
|        0 | NULLS_UNEQUAL |    INT32 |    100000 |     100000 |          4 |         0.3 | mark_join |         2425000 |   2352x | 217.477 us | 7.97% | 213.561 us | 9.93% |  11.355G |  11.355 GB/s |  0.63% |
|        0 | NULLS_UNEQUAL |    INT32 |      1000 |   10000000 |          4 |         0.3 | mark_join |       121262125 |   1392x | 740.109 us | 2.00% | 736.030 us | 2.01% | 164.752G | 164.752 GB/s |  9.19% |
|        0 | NULLS_UNEQUAL |    INT32 |    100000 |   10000000 |          4 |         0.3 | mark_join |       122462500 |    896x |   1.029 ms | 2.60% |   1.025 ms | 2.62% | 119.500G | 119.500 GB/s |  6.67% |
|        0 | NULLS_UNEQUAL |    INT32 |  10000000 |   10000000 |          4 |         0.3 | mark_join |       242500000 |     49x |  10.381 ms | 0.28% |  10.377 ms | 0.28% |  23.369G |  23.369 GB/s |  1.30% |
|        1 | NULLS_UNEQUAL |    INT32 |      1000 |       1000 |          4 |         0.3 | mark_join |           24250 |   1568x | 325.285 us | 3.02% | 321.188 us | 3.06% |  75.501M |  75.501 MB/s |  0.00% |
|        1 | NULLS_UNEQUAL |    INT32 |      1000 |     100000 |          4 |         0.3 | mark_join |         1224625 |   1680x | 303.817 us | 3.49% | 299.728 us | 3.54% |   4.086G |   4.086 GB/s |  0.23% |
|        1 | NULLS_UNEQUAL |    INT32 |    100000 |     100000 |          4 |         0.3 | mark_join |         2425000 |   1536x | 331.583 us | 6.07% | 327.496 us | 6.14% |   7.405G |   7.405 GB/s |  0.41% |
|        1 | NULLS_UNEQUAL |    INT32 |      1000 |   10000000 |          4 |         0.3 | mark_join |       121262125 |    928x | 800.135 us | 1.45% | 796.040 us | 1.46% | 152.332G | 152.332 GB/s |  8.50% |
|        1 | NULLS_UNEQUAL |    INT32 |    100000 |   10000000 |          4 |         0.3 | mark_join |       122462500 |   2016x | 917.912 us | 1.74% | 913.813 us | 1.75% | 134.013G | 134.013 GB/s |  7.48% |
|        1 | NULLS_UNEQUAL |    INT32 |  10000000 |   10000000 |          4 |         0.3 | mark_join |       242500000 |     88x |   7.442 ms | 0.50% |   7.437 ms | 0.50% |  32.605G |  32.605 GB/s |  1.82% |

## left_semi_join

### [0] NVIDIA RTX PRO 6000 Blackwell Workstation Edition

| Nullable | NullEquality  | DataType | left_size | right_size | num_probes | selectivity | join_type | join_input_size | Samples |  CPU Time  | Noise |  GPU Time  | Noise |  Elem/s  | GlobalMem BW | BWUtil |
|----------|---------------|----------|-----------|------------|------------|-------------|-----------|-----------------|---------|------------|-------|------------|-------|----------|--------------|--------|
|        0 | NULLS_UNEQUAL |    INT32 |      1000 |       1000 |          4 |         0.3 | mark_join |           24250 |   2800x | 242.397 us | 4.55% | 238.314 us | 4.63% | 101.757M | 101.757 MB/s |  0.01% |
|        0 | NULLS_UNEQUAL |    INT32 |      1000 |     100000 |          4 |         0.3 | mark_join |         1224625 |   2496x | 204.967 us | 3.92% | 200.856 us | 4.00% |   6.097G |   6.097 GB/s |  0.34% |
|        0 | NULLS_UNEQUAL |    INT32 |    100000 |     100000 |          4 |         0.3 | mark_join |         2425000 |   2400x | 213.039 us | 2.27% | 208.938 us | 2.31% |  11.606G |  11.606 GB/s |  0.65% |
|        0 | NULLS_UNEQUAL |    INT32 |      1000 |   10000000 |          4 |         0.3 | mark_join |       121262125 |    688x | 733.877 us | 1.69% | 729.858 us | 1.70% | 166.145G | 166.145 GB/s |  9.27% |
|        0 | NULLS_UNEQUAL |    INT32 |    100000 |   10000000 |          4 |         0.3 | mark_join |       122462500 |    560x |   1.018 ms | 1.12% |   1.013 ms | 1.12% | 120.833G | 120.833 GB/s |  6.74% |
|        0 | NULLS_UNEQUAL |    INT32 |  10000000 |   10000000 |          4 |         0.3 | mark_join |       242500000 |     49x |  10.318 ms | 0.18% |  10.314 ms | 0.18% |  23.511G |  23.511 GB/s |  1.31% |
|        1 | NULLS_UNEQUAL |    INT32 |      1000 |       1000 |          4 |         0.3 | mark_join |           24250 |   2080x | 253.602 us | 6.07% | 249.541 us | 6.18% |  97.178M |  97.178 MB/s |  0.01% |
|        1 | NULLS_UNEQUAL |    INT32 |      1000 |     100000 |          4 |         0.3 | mark_join |         1224625 |   2336x | 232.993 us | 7.15% | 228.885 us | 7.30% |   5.350G |   5.350 GB/s |  0.30% |
|        1 | NULLS_UNEQUAL |    INT32 |    100000 |     100000 |          4 |         0.3 | mark_join |         2425000 |   2288x | 253.658 us | 3.60% | 249.549 us | 3.65% |   9.718G |   9.718 GB/s |  0.54% |
|        1 | NULLS_UNEQUAL |    INT32 |      1000 |   10000000 |          4 |         0.3 | mark_join |       121262125 |    704x | 718.743 us | 1.37% | 714.639 us | 1.38% | 169.683G | 169.683 GB/s |  9.47% |
|        1 | NULLS_UNEQUAL |    INT32 |    100000 |   10000000 |          4 |         0.3 | mark_join |       122462500 |    608x | 827.506 us | 1.37% | 823.399 us | 1.37% | 148.728G | 148.728 GB/s |  8.30% |
|        1 | NULLS_UNEQUAL |    INT32 |  10000000 |   10000000 |          4 |         0.3 | mark_join |       242500000 |     69x |   7.257 ms | 0.21% |   7.253 ms | 0.21% |  33.437G |  33.437 GB/s |  1.87% |

@PointKernel (Member, Author)

/merge

@rapids-bot (bot) merged commit 19b403f into rapidsai:main Apr 1, 2026
110 of 112 checks passed
@PointKernel deleted the fix-join-bench-skip branch April 1, 2026 17:18

3 participants