Open
Labels
type: improvement (Improvement / enhancement to an existing function)
Description
Is your feature request related to a problem? Please describe.
Currently, cuco relies on #pragma unroll to ask the compiler to unroll fixed-size for loops and generate more efficient code, for example:
cuCollections/include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh
Lines 980 to 981 in 3b9873a
    #pragma unroll bucket_size
    for (cuda::std::int32_t i = 0; i < bucket_size; ++i) {
cuCollections/include/cuco/detail/roaring_bitmap/roaring_bitmap_impl.cuh
Lines 119 to 120 in 3b9873a
    #pragma unroll
    for (cuda::std::int32_t i = 0; i < storage_ref_.metadata().num_containers; i++) {
Describe the solution you'd like
cuco should use the new static_for for loops that currently rely on #pragma unroll. CCCL introduced static_for to support compile-time static loops: link. Unlike a pragma, which is only a hint that the compiler may ignore, static_for guarantees that the loop is fully unrolled. We should apply static_for wherever the loop size is known at compile time.
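To illustrate the idea, here is a minimal host-side C++17 sketch of how a compile-time loop can replace a pragma-hinted one. Everything in it is an assumption made for illustration: `sketch::static_for`, `find_slot`, and `example_usage` are hypothetical names, and the helper is a hand-written stand-in rather than CCCL's implementation or its exact signature. Device code would additionally need the usual `__host__ __device__` annotations.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>

namespace sketch {
// Hypothetical helper (not the CCCL implementation): calls f once for each
// index in [0, N), passing the index as a std::integral_constant.
template <typename F, std::size_t... Is>
constexpr void static_for_impl(F&& f, std::index_sequence<Is...>)
{
  // Fold over the comma operator: the calls are expanded at compile time,
  // so no loop remains for the compiler to decide whether to unroll.
  (f(std::integral_constant<std::size_t, Is>{}), ...);
}

template <std::size_t N, typename F>
constexpr void static_for(F&& f)
{
  static_for_impl(std::forward<F>(f), std::make_index_sequence<N>{});
}
}  // namespace sketch

// Hypothetical stand-in for a bucket probe: returns the first slot holding
// `key`, or -1 if the key is absent. BucketSize is a compile-time constant.
template <std::size_t BucketSize>
constexpr int find_slot(std::array<std::int32_t, BucketSize> const& bucket, std::int32_t key)
{
  int found = -1;
  sketch::static_for<BucketSize>([&](auto i) {
    // `i` converts implicitly to its compile-time index value.
    if (found == -1 && bucket[i] == key) { found = static_cast<int>(i); }
  });
  return found;
}

// Small usage example for the sketch above.
inline void example_usage()
{
  std::array<std::int32_t, 4> bucket{7, 3, 9, 3};
  assert(find_slot(bucket, 9) == 2);   // key in slot 2
  assert(find_slot(bucket, 3) == 1);   // first of two matches
  assert(find_slot(bucket, 5) == -1);  // key absent
}
```

Because the index arrives as a compile-time constant, the body can also use it in template arguments or `if constexpr`, which a runtime loop variable under #pragma unroll cannot do. This only works when the trip count is itself a compile-time constant, as with `bucket_size` above.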