@kingcrimsontianyu kingcrimsontianyu commented Mar 12, 2025

Partially addresses #606

This PR kick-starts the process of adding benchmarks to KvikIO. The following tasks are done:

  • Add CMake scripts for the benchmark.
  • Add a simple benchmark for the threadpool.

To build and run the benchmark programs in the dev container:

# Build the benchmarks
build-kvikio-cpp -DBUILD_TESTS=ON -DBUILD_BENCHMARKS=ON -j 16

# Run specific benchmark
~/kvikio/cpp/build/latest/benchmarks/<benchmark-name>

Sample output:

2025-03-14T04:38:46+00:00
Running ./THREADPOOL_BENCHMARK
Run on (16 X 5050 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 1024 KiB (x8)
  L3 Unified 98304 KiB (x1)
Load Average: 0.34, 1.07, 1.22
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------
BM_threadpool_compute:strong_scaling/1/min_time:2.000/real_time         162 ms        0.728 ms           17 threads=1
BM_threadpool_compute:strong_scaling/2/min_time:2.000/real_time        81.6 ms        0.885 ms           34 threads=2
BM_threadpool_compute:strong_scaling/4/min_time:2.000/real_time        41.1 ms        0.972 ms           68 threads=4
BM_threadpool_compute:strong_scaling/8/min_time:2.000/real_time        21.3 ms         1.40 ms          127 threads=8
BM_threadpool_compute:strong_scaling/16/min_time:2.000/real_time       18.7 ms         1.84 ms          150 threads=16
BM_threadpool_compute:strong_scaling/32/min_time:2.000/real_time       19.2 ms         2.76 ms          145 threads=32
BM_threadpool_compute:strong_scaling/64/min_time:2.000/real_time       20.8 ms         5.46 ms          139 threads=64
BM_threadpool_compute:weak_scaling/1/min_time:2.000/real_time          16.2 ms        0.135 ms          172 threads=1
BM_threadpool_compute:weak_scaling/2/min_time:2.000/real_time          16.3 ms        0.260 ms          171 threads=2
BM_threadpool_compute:weak_scaling/4/min_time:2.000/real_time          16.6 ms        0.527 ms          168 threads=4
BM_threadpool_compute:weak_scaling/8/min_time:2.000/real_time          17.2 ms         1.05 ms          164 threads=8
BM_threadpool_compute:weak_scaling/16/min_time:2.000/real_time         29.7 ms         2.66 ms           94 threads=16
BM_threadpool_compute:weak_scaling/32/min_time:2.000/real_time         60.9 ms         8.04 ms           46 threads=32
BM_threadpool_compute:weak_scaling/64/min_time:2.000/real_time          133 ms         36.0 ms           21 threads=64
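One way to read the Time column is to convert it into parallel efficiency. The sketch below is not part of the PR. It assumes the usual definitions: strong scaling keeps the total work fixed, so the ideal time is T(1)/n, and weak scaling keeps the work per thread fixed, so the ideal time stays at T(1). The `Sample` struct and both function names are made up for illustration.

```cpp
#include <cassert>

// One row of the sample output: thread count and wall-clock time in ms.
// (Hypothetical helper type, not from the KvikIO code base.)
struct Sample {
  int threads;
  double time_ms;
};

// Strong scaling (assumed: fixed total work).
// Ideal time with n threads is T(1)/n, so efficiency = T(1) / (n * T(n)).
double strong_efficiency(const Sample& base, const Sample& s) {
  assert(s.threads > 0 && s.time_ms > 0.0);
  return base.time_ms / (static_cast<double>(s.threads) * s.time_ms);
}

// Weak scaling (assumed: fixed work per thread).
// Ideal time stays at T(1), so efficiency = T(1) / T(n).
double weak_efficiency(const Sample& base, const Sample& s) {
  assert(s.time_ms > 0.0);
  return base.time_ms / s.time_ms;
}
```

Plugging in the rows above, strong-scaling efficiency is about 0.95 at 8 threads (162 / (8 × 21.3)) and about 0.54 at 16 threads, while weak-scaling efficiency is about 0.94 at 8 threads (16.2 / 17.2) and about 0.55 at 16 threads. Both drop off past 8 threads, which is consistent with the 8 sets of L1/L2 caches in the header, possibly indicating 8 physical cores.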

copy-pr-bot bot commented Mar 12, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@kingcrimsontianyu kingcrimsontianyu added the labels feature request (New feature or request), non-breaking (Introduces a non-breaking change), and c++ (Affects the C++ API of KvikIO) Mar 12, 2025
@kingcrimsontianyu kingcrimsontianyu changed the title Add C++ benchmarks Add C++ benchmarks (part 1/n) Mar 13, 2025
@kingcrimsontianyu kingcrimsontianyu marked this pull request as ready for review March 13, 2025 21:26
@kingcrimsontianyu kingcrimsontianyu requested review from a team as code owners March 13, 2025 21:26
Member

@KyleFromNVIDIA KyleFromNVIDIA left a comment

Just a few small CMake issues that should be fixed.

@GregoryKimball

Thanks @kingcrimsontianyu, this is a great start!

The "Time" column makes sense to me: it shows an optimum of 8 threads in both the weak and strong scaling cases. The "CPU" column doesn't make much sense to me, though.

Member

@mhaseeb123 mhaseeb123 left a comment

I would appreciate some comments in threadpool_benchmark.cpp.

@kingcrimsontianyu
Contributor Author

@GregoryKimball Time is the wall-clock time, and CPU is the CPU time of the main thread (measured with clock_gettime using CLOCK_THREAD_CPUTIME_ID). For this benchmark, CPU measures the overhead of the thread pool management that takes place on the main thread. I think this will help us find out how well the BS thread pool implementation scales on a many-core system (Grace), where the use of a single task queue with mutexes and condition variables might turn out to be suboptimal.
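To make the distinction concrete, here is a minimal sketch (not the code in this PR) that measures both quantities around a batch of worker threads. It uses plain std::thread instead of the actual thread pool, and the names `Timing`, `thread_cpu_ms`, `spin_for_ms`, and `measure` are made up for illustration. It assumes a POSIX system where clock_gettime supports CLOCK_THREAD_CPUTIME_ID.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <time.h>
#include <vector>

// Wall-clock time of the whole run and CPU time of the calling thread, in ms.
struct Timing {
  double wall_ms;
  double main_cpu_ms;
};

// CPU time consumed so far by the calling thread only.
double thread_cpu_ms() {
  timespec ts{};
  int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
  assert(rc == 0);
  (void)rc;  // Silence the unused-variable warning when NDEBUG is defined.
  return static_cast<double>(ts.tv_sec) * 1e3 + static_cast<double>(ts.tv_nsec) / 1e6;
}

// Stand-in for a compute task: busy-spin for roughly `ms` of wall-clock time.
void spin_for_ms(int ms) {
  auto const end = std::chrono::steady_clock::now() + std::chrono::milliseconds(ms);
  while (std::chrono::steady_clock::now() < end) {}
}

// Launch `num_workers` threads from the calling thread, wait for them, and
// report wall-clock time and the calling thread's own CPU time.
Timing measure(int num_workers, int work_ms) {
  assert(num_workers > 0);
  double const cpu_start = thread_cpu_ms();
  auto const wall_start = std::chrono::steady_clock::now();

  std::vector<std::thread> workers;
  for (int i = 0; i < num_workers; ++i) { workers.emplace_back(spin_for_ms, work_ms); }
  for (auto& w : workers) { w.join(); }

  auto const wall_end = std::chrono::steady_clock::now();
  double const cpu_end = thread_cpu_ms();
  std::chrono::duration<double, std::milli> const wall = wall_end - wall_start;
  return Timing{wall.count(), cpu_end - cpu_start};
}
```

Calling `measure(2, 50)` should give a wall-clock time of at least 50 ms, while the main-thread CPU time stays small, because the calling thread only creates the workers and then blocks in `join()`. The CPU column in the benchmark output plays the same role: it grows with the thread count because more management work lands on the main thread.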

Member

@madsbk madsbk left a comment

Looks great

@kingcrimsontianyu
Contributor Author

To avoid confusion, an explanation of the metrics is being added to the benchmark output in another ongoing PR (#664).

Screenshot from 2025-03-14 11-01-10

Member

@mhaseeb123 mhaseeb123 left a comment

Approving C++ changes

@kingcrimsontianyu
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit f5268f9 into rapidsai:branch-25.04 Mar 14, 2025
61 checks passed