Description
MVP
- Implement `cuda::std::for_each()` with the following call forms (see the usage sketch after this list):
  - `cuda::std::for_each(...)` (no execution policy)
    - Serial implementation, works in both host and device code
  - `cuda::std::for_each(cuda::std::execution::seq, ...)`
    - Equivalent to the above: serial implementation, works in both host and device code
  - `cuda::std::for_each(cuda::__cub_unseq_par, ...)`
    - Synchronous. Runs in parallel on the GPU via `cub::DeviceForEach` on the default stream
    - Because this is synchronous, it will only work in host code. `cuda::std::for_each(cuda::__cub_unseq_par, ...)` should ideally fail to compile when used in device code
    - `cuda::__cub_unseq_par` is an internal-only execution policy, used to avoid bikeshedding on what `cuda::std::execution::par_unseq` should mean (CPU vs. GPU)
    - Longer term, we would like to be able to pass environments as execution-policy arguments, so that parallelism and execution place can be specified separately, since they are technically orthogonal
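A minimal usage sketch of the three proposed call forms, assuming the overloads and the internal `cuda::__cub_unseq_par` policy exist as described above (none of this is shipping API, and the `<cuda/std/algorithm>` header is an assumed location):

```cpp
#include <cuda/std/algorithm>  // assumed home of the proposed cuda::std::for_each
#include <cuda/std/array>

// Serial forms: callable from both host and device code.
__host__ __device__ void double_all(cuda::std::array<int, 4>& a)
{
  // No execution policy.
  cuda::std::for_each(a.begin(), a.end(), [](int& x) { x *= 2; });

  // Explicit seq policy: equivalent to the call above.
  cuda::std::for_each(cuda::std::execution::seq,
                      a.begin(), a.end(), [](int& x) { x *= 2; });
}

// Parallel form: host-only, synchronous, runs on the GPU via
// cub::DeviceForEach on the default stream.
void double_all_on_gpu(int* first, int* last)
{
  cuda::std::for_each(cuda::__cub_unseq_par,
                      first, last, [] __device__ (int& x) { x *= 2; });
}
```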
Benchmarks
- Simple benchmark for `__half`, `int`, and `double` to compare against the existing `cub::DeviceForEach` benchmarks and ensure performance parity. Since `cub::DeviceForEach` is already extensively benchmarked, we are not trying to recreate all of those benchmarks for `cuda::std::for_each`. (A sketch follows below.)
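If and when this benchmark is written, a rough sketch of its shape, using nvbench (the harness the existing CUB benchmarks use) with a type axis over `__half`, `int`, and `double`, might look like the following. The `cuda::std::for_each(cuda::__cub_unseq_par, ...)` overload is the proposed one from this issue, and the benchmark name, axis values, and per-element operation are placeholders:

```cpp
#include <nvbench/nvbench.cuh>
#include <thrust/device_vector.h>
#include <cuda_fp16.h>

// Placeholder benchmark: exercises the proposed GPU-parallel overload so the
// numbers can be compared against the existing cub::DeviceForEach benchmarks.
template <typename T>
void for_each_bench(nvbench::state& state, nvbench::type_list<T>)
{
  const auto n = static_cast<std::size_t>(state.get_int64("Elements"));
  thrust::device_vector<T> data(n);

  // Trivial per-element operation; real benchmarks would mirror the
  // operations used by the cub::DeviceForEach benchmarks.
  auto op = [] __device__ (T& x) { x = T{}; };

  state.add_element_count(n);
  state.exec([&](nvbench::launch&) {
    cuda::std::for_each(cuda::__cub_unseq_par, data.begin(), data.end(), op);
  });
}

using types = nvbench::type_list<__half, int, double>;
NVBENCH_BENCH_TYPES(for_each_bench, NVBENCH_TYPE_AXES(types))
  .add_int64_power_of_two_axis("Elements", nvbench::range(16, 28, 4));
```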
Testing
- Simple `lit`-style functional tests
- Simple Catch2 functional tests (a sketch follows below)
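A minimal Catch2 sketch of a functional test for the serial overload, assuming the proposed API and the `<cuda/std/algorithm>` header location:

```cpp
#include <catch2/catch_test_macros.hpp>

#include <cuda/std/algorithm>  // assumed home of the proposed cuda::std::for_each
#include <cuda/std/array>

TEST_CASE("cuda::std::for_each applies the function to every element (host, serial)")
{
  cuda::std::array<int, 4> data{1, 2, 3, 4};

  // No execution policy: serial host path.
  cuda::std::for_each(data.begin(), data.end(), [](int& x) { x += 10; });

  REQUIRE(data[0] == 11);
  REQUIRE(data[3] == 14);
}
```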
Non-goals
- Asynchrony/streams
- We will postpone benchmarks until we have support for passing memory resources. The current benchmark facilities rely on caching allocators to keep the memory subsystem from interfering with the measurements; without that support, the benchmarks would currently fail and the results would be significantly skewed.