@fbusato fbusato commented Nov 22, 2025

Motivations

Modern GPU architectures increasingly expose fine-grained, single-thread SIMD capabilities to maximize throughput within individual CUDA threads. While the GPU programming model focuses strongly on SIMT, newer hardware relies on specialized SIMD operations to saturate execution units. Some examples include:

C++26 std::simd provides a standardized abstraction for writing vectorized code. This is a great opportunity to replace the custom code that handles each variant with a single interface and to reduce CUDA software fragmentation. By adopting a std::simd-like API, developers can write a single vectorized kernel that compiles to the best available instructions for each GPU architecture.
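To illustrate what "a single vectorized kernel" could look like, here is a minimal host-only sketch in plain C++. It assumes a hypothetical `toy_simd<T, N>` type invented for this example. It is not std::simd and not the type added by this PR; the names `toy_simd`, `load`, `store`, `reduce_add`, and `axpy_chunk` are placeholders. The point is only that the kernel body is written once against element-wise operators, so a library is free to map those operators to packed instructions where they exist.

```cpp
#include <array>
#include <cstddef>

// Hypothetical toy type for illustration only.
// NOT std::simd and NOT the implementation in this PR.
template <typename T, std::size_t N>
struct toy_simd {
    std::array<T, N> lanes{};

    toy_simd() = default;

    // Broadcast one scalar to every lane.
    explicit toy_simd(T value) { lanes.fill(value); }

    // Load N contiguous elements starting at ptr.
    static toy_simd load(const T* ptr) {
        toy_simd v;
        for (std::size_t i = 0; i < N; ++i) v.lanes[i] = ptr[i];
        return v;
    }

    // Store N contiguous elements starting at ptr.
    void store(T* ptr) const {
        for (std::size_t i = 0; i < N; ++i) ptr[i] = lanes[i];
    }

    // Element-wise addition.
    friend toy_simd operator+(const toy_simd& a, const toy_simd& b) {
        toy_simd r;
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
        return r;
    }

    // Element-wise multiplication.
    friend toy_simd operator*(const toy_simd& a, const toy_simd& b) {
        toy_simd r;
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] * b.lanes[i];
        return r;
    }
};

// Horizontal sum of all lanes.
template <typename T, std::size_t N>
T reduce_add(const toy_simd<T, N>& v) {
    T sum{};
    for (std::size_t i = 0; i < N; ++i) sum += v.lanes[i];
    return sum;
}

// y = a * x + y on one chunk of N elements, written once for any T and N.
template <typename T, std::size_t N>
void axpy_chunk(T a, const T* x, T* y) {
    const auto vx = toy_simd<T, N>::load(x);
    const auto vy = toy_simd<T, N>::load(y);
    (toy_simd<T, N>(a) * vx + vy).store(y);
}
```

A caller would write, for example, `axpy_chunk<float, 4>(2.0f, x, y)`. The scalar loops inside the toy operators stand in for whatever packed instruction a real backend would select.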

PR Goals and Non-Goals

The PR aims to provide a basic implementation of std::simd that lays the foundation for future optimizations and extensions.

Advanced math and bit operations (e.g. std::abs, std::pow, std::popcount), as well as std::complex bindings, are outside the scope of this first PR.

Non-Goals:

  • Fully implement std::simd.
  • Implement custom ABIs to target host vector instructions.

Implementation Notes

The implementation is based on LLVM's experimental/__simd code and is extended to support the related C++ proposals:

Some optimizations are already used in the CCCL code, for example in thread_simd.h and thread_reduce.h. They will be gradually added to the implementation.

Partially address #30

@fbusato fbusato self-assigned this Nov 22, 2025
@fbusato fbusato requested a review from a team as a code owner November 22, 2025 01:47
@fbusato fbusato added the 3.2.0 Targeted for 3.2.0 release label Nov 22, 2025
@fbusato fbusato added this to CCCL Nov 22, 2025
@fbusato fbusato requested a review from ericniebler November 22, 2025 01:47
@github-project-automation github-project-automation bot moved this to Todo in CCCL Nov 22, 2025
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Nov 22, 2025

@miscco miscco left a comment

This is not using SIMD on the host, is there any reason for that?

fbusato commented Nov 24, 2025

First, because this is the first PR. Second, because we care more about GPU than CPU. Third, because the feature is still experimental in other standard library implementations as well.

@github-actions

🥳 CI Workflow Results

🟩 Finished in 15m 27s: Pass: 100%/42 | Total: 2h 47m | Max: 14m 49s | Hits: 99%/20431

See results here.
