[ENHANCEMENT]: Implement cooperative device bulk operations


## Pitch 
Device bulk operations such as 
```cpp
template <class CG, class InputIt>
__device__ void insert(CG const& group, InputIt first, InputIt last)
```
, i.e., assigning a cooperative group to process `N` input elements cooperatively, allow for adaptive and optimized work distribution.

## Background
Our current device CG APIs look as follows:
```cpp
template <class CG, class Key>
__device__ void insert(CG const& group, Key key)
```
, i.e., they take a single input datum and cooperatively insert it into the container. Each thread in the CG is expected to get the _same_ key as input.

The problem with this approach is that it leads to redundant computation: 
Each thread in the CG needs to compute the hash value of the (exact same) input key, before proceeding to the probing/insertion part. This imposes unnecessary pressure on the compute pipelines, which reduces overall throughput mainly in cases where the kernel is compute bound (which likely happens if the data structure fits (partially) into the L2$). However, experiments show that even for large data structures (>2GB) the performance improvement is still visible (>5%).

The bulk device operation eliminates this drawback as follows: 
Each thread loads one input key at a time, resulting in a coalesced memory acces pattern, and then computes its respective hash value. From here we go into a cooperative loop, iterating over all ranks in the group. At each iteration, we shuffle the computed hash value from the target rank to all threads in the group (broadcast). From here we can proceed with the standard cooperative insertion/probing approach, as each thread holds the same hash value.

Another benefit is in the API: 
The user may give us any CG size as input, which we can then adaptively subdivide into multiple worker groups that insert the given range of input keys as efficient as possible. In that sense, this API becomes adaptive in relation to the given input CG size.

One example of this approach is given in #672 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ENHANCEMENT]: Implement cooperative device bulk operations #674

Pitch

Background

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[ENHANCEMENT]: Implement cooperative device bulk operations #674

Description

Pitch

Background

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions