[ENHANCEMENT]: __device__ for_each that provides a mutable reference rather than a copy of the slot to the callback function #639

@sleeepyjack

Is your feature request related to a problem? Please describe.

Feature request on behalf of @LutzCle, who is implementing a mark join for an internal project.

tl;dr: The algorithm needs to set a "(matching) slot has been seen" bit during probing. This bit is explicitly excluded from hash value generation and key comparison, so setting or clearing it doesn't break the probing sequence.
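To make the "excluded from hashing and comparison" part concrete, here is a minimal host-side sketch. It assumes the mark lives in the top bit of a 64-bit key word; that layout and the names `mark_bit`, `strip_mark`, `set_mark`, `is_marked`, `masked_hash` and `masked_equal` are hypothetical and not part of cuco.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Assumption: the top bit of the key word is reserved as the "seen" mark.
constexpr std::uint64_t mark_bit = std::uint64_t{1} << 63;

constexpr std::uint64_t strip_mark(std::uint64_t k) { return k & ~mark_bit; }
constexpr std::uint64_t set_mark(std::uint64_t k) { return k | mark_bit; }
constexpr bool is_marked(std::uint64_t k) { return (k & mark_bit) != 0; }

// Hasher that masks out the mark bit before hashing, so a marked and an
// unmarked key start at the same position in the probing sequence.
struct masked_hash {
  std::size_t operator()(std::uint64_t k) const
  {
    return std::hash<std::uint64_t>{}(strip_mark(k));
  }
};

// Key equality that ignores the mark bit.
struct masked_equal {
  constexpr bool operator()(std::uint64_t a, std::uint64_t b) const
  {
    return strip_mark(a) == strip_mark(b);
  }
};
```

With functors like these, flipping the bit in a stored slot changes neither where a probe starts nor which slots compare equal to the query key.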

Our initial approach was to use the for_each device API and set the bit for each matching element. Unfortunately, this doesn't work, since the current for_each implementation passes the callback a copy of the matching slot rather than a mutable reference.

Describe the solution you'd like

Provide both copy and mutable reference options for for_each.

Naming ideas: for_each(_reference)/for_each_copy/for_each_mutable, ...
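The difference between the two options can be illustrated with a host-only toy container. This is only a sketch of the callback semantics, not cuco code: `toy_table`, `slot` and `count_seen` are hypothetical, the lookup is a plain linear scan instead of real probing, and the method names are borrowed from the naming ideas above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical slot: a key plus a "seen" mark that key matching ignores.
struct slot {
  int key;
  bool seen;
};

struct toy_table {
  std::vector<slot> slots;

  // Copy variant: the callback receives a copy of each matching slot,
  // so any change it makes is lost when the callback returns.
  template <class Callback>
  void for_each_copy(int key, Callback cb) const
  {
    for (slot s : slots) {  // `s` is a copy of the stored slot
      if (s.key == key) { cb(s); }
    }
  }

  // Mutable variant: the callback receives a reference into the storage,
  // so it can set the mark in place.
  template <class Callback>
  void for_each_mutable(int key, Callback cb)
  {
    for (slot& s : slots) {
      if (s.key == key) { cb(s); }
    }
  }

  // Number of slots whose mark is set.
  std::size_t count_seen() const
  {
    std::size_t n = 0;
    for (slot const& s : slots) {
      if (s.seen) { ++n; }
    }
    return n;
  }
};
```

Calling `for_each_copy(1, [](slot s) { s.seen = true; })` leaves every stored mark unset, while `for_each_mutable(1, [](slot& s) { s.seen = true; })` marks each slot whose key is 1. On the device, concurrent callbacks writing through such a reference would additionally need to consider data races, which this host-side toy does not model.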

Describe alternatives you've considered

The current workaround is essentially to re-implement for_each with the proposed feature in a custom kernel, which requires some of the internal cuco::detail machinery. Not ideal.

Additional context

I'm not sure whether this feature is a footgun, so I would like to spark a discussion around the potential dangers.
