
[ENHANCEMENT]: Use static_for to guarantee compile-time loop unrolling #770

@PointKernel

Description

Is your feature request related to a problem? Please describe.

Currently, cuco relies on #pragma unroll to hint to the compiler that fixed-size for loops should be unrolled for more efficient code generation, for example:

#pragma unroll bucket_size
for (cuda::std::int32_t i = 0; i < bucket_size; ++i) {
  // ...
}

#pragma unroll
for (cuda::std::int32_t i = 0; i < storage_ref_.metadata().num_containers; i++) {
  // ...
}

Describe the solution you'd like

We should use CCCL's new static_for in cuco to replace loops that currently rely on #pragma unroll. CCCL introduced static_for to support compile-time loops: link. Unlike #pragma unroll, which is only a hint the compiler is free to ignore, static_for guarantees that the loop is fully unrolled. We should apply static_for wherever the loop trip count is known at compile time.
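To illustrate why a static_for can guarantee unrolling where a pragma cannot, here is a minimal sketch assuming C++17 fold expressions. The helpers static_for and static_any and the bucket type are hypothetical stand-ins written for illustration; they are not CCCL's or cuco's actual API, and CCCL's static_for may differ in name, header, and signature. Expanding a std::integer_sequence with a fold expression produces one call per index at template instantiation time, so the iteration count is fixed in the instantiated code rather than left to the optimizer (whether each call is then inlined is still up to the compiler). Passing the index as a std::integral_constant also keeps it usable as a compile-time constant inside the loop body.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical helper (not CCCL's API): invokes f(std::integral_constant<int, I>{})
// for I = 0, 1, ..., N-1. The comma fold expands into N separate calls at
// instantiation time, so unrolling does not depend on a compiler hint.
// In device code these functions would also need __host__ __device__.
template <typename F, int... Is>
void static_for_impl(std::integer_sequence<int, Is...>, F& f)
{
  (f(std::integral_constant<int, Is>{}), ...);
}

template <int N, typename F>
void static_for(F f)
{
  static_for_impl(std::make_integer_sequence<int, N>{}, f);
}

// Hypothetical early-exit variant: the || fold stops at the first index for
// which f returns true and reports whether any call returned true.
template <typename F, int... Is>
bool static_any_impl(std::integer_sequence<int, Is...>, F& f)
{
  return (f(std::integral_constant<int, Is>{}) || ...);
}

template <int N, typename F>
bool static_any(F f)
{
  return static_any_impl(std::make_integer_sequence<int, N>{}, f);
}

// Hypothetical stand-in for a fixed-size cuco bucket.
template <int bucket_size>
struct bucket {
  std::int32_t slots[bucket_size];
};

// Visits every slot; the index i.value is a compile-time constant.
template <int bucket_size>
std::int32_t sum_slots(bucket<bucket_size> const& b)
{
  std::int32_t total = 0;
  static_for<bucket_size>([&](auto i) { total += b.slots[i.value]; });
  return total;
}

// Returns the index of the first slot equal to key, or -1 if none matches.
template <int bucket_size>
int find_slot(bucket<bucket_size> const& b, std::int32_t key)
{
  int found = -1;
  static_any<bucket_size>([&](auto i) {
    if (b.slots[i.value] == key) {
      found = i.value;
      return true;  // stops the fold: later slots are not compared
    }
    return false;
  });
  return found;
}
```

For example, with bucket<4> b{{7, 3, 9, 3}}, sum_slots(b) returns 22 and find_slot(b, 3) returns 1. The || fold in static_any stands in for the break a pragma-unrolled loop would use for early exit, which a plain comma fold cannot express.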

Metadata

Labels

type: improvement (Improvement / enhancement to an existing function)
