Is your feature request related to a problem? Please describe.
Feature request on behalf of @LutzCle, who is implementing a mark join for an internal project.
tl;dr: The algorithm requires setting a "(matching) slot has been seen" bit during probing. This bit is explicitly excluded from hash computation and key comparison, so setting or clearing it does not break the probing sequence.
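To make the constraint concrete, here is a minimal host-side C++ sketch. It assumes a hypothetical slot that stores a 32-bit key next to a separate seen flag; the hasher and the equality functor look only at the key, so marking a slot changes neither where probing starts nor whether the slot matches. The names slot, slot_hash and slot_equal are illustrative and not part of cuco.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical slot layout for a mark join: the key plus a "seen" flag.
struct slot {
  std::uint32_t key;
  bool seen;  // set during probing; must not affect hashing or equality
};

// Hash only the key, so toggling `seen` never changes the probe start.
struct slot_hash {
  std::size_t operator()(slot const& s) const noexcept {
    return std::hash<std::uint32_t>{}(s.key);
  }
};

// Compare only the key, so a marked slot still matches its probe key.
struct slot_equal {
  bool operator()(slot const& a, slot const& b) const noexcept {
    return a.key == b.key;
  }
};
```

Two slots that differ only in the flag hash identically and compare equal, which is exactly the property the mark join relies on.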
Our initial approach was to use the device-side for_each API and set the bit on each matching element. Unfortunately, this doesn't work: the current for_each implementation yields a copy of the matching slot rather than a mutable reference, so the bit is only ever set on the copy.
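The problem can be reduced to a small host-side C++ sketch. The two helpers below are hypothetical stand-ins for the two callback conventions, not cuco functions: visit_copy mirrors the current behavior by handing the callback a copy of the stored slot, while visit_ref hands it the stored slot itself. Only the latter makes the mark visible afterwards.

```cpp
#include <cstdint>

// Hypothetical slot: key plus a "seen" flag used by the mark join.
struct slot {
  std::uint32_t key;
  bool seen;
};

// Mirrors the current for_each: the callback only ever sees a copy,
// so any write to it is discarded when the copy goes out of scope.
template <typename Callback>
void visit_copy(slot const& stored, Callback callback) {
  slot copy = stored;
  callback(copy);
}

// Proposed behavior: the callback receives the stored slot itself,
// so setting the flag updates the table entry.
template <typename Callback>
void visit_ref(slot& stored, Callback callback) {
  callback(stored);
}

// Callback a mark join would pass: set the "seen" flag on a match.
inline void mark_seen(slot& s) { s.seen = true; }
```

Calling visit_copy(s, mark_seen) leaves s.seen unchanged, while visit_ref(s, mark_seen) sets it.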
Describe the solution you'd like
Provide both copy and mutable-reference variants of for_each.
Naming ideas: for_each(_reference)/for_each_copy/for_each_mutable, ...
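A minimal host-side C++ sketch of the proposed pair, assuming a toy linear-probing multiset on the host as a stand-in for the device container (toy_multiset, slot and count_seen are hypothetical names, not cuco APIs). Here for_each passes each matching slot by mutable reference and for_each_copy keeps today's copy semantics. Because probing compares only the key, setting the flag through the reference leaves the probe sequence untouched.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical slot: key plus a "seen" flag ignored by probing.
struct slot {
  std::uint32_t key;
  bool seen;
};

// Toy linear-probing multiset (host only); stand-in for the device container.
class toy_multiset {
 public:
  explicit toy_multiset(std::size_t capacity) : slots_(capacity) {}

  // Insert into the first empty slot of the probe sequence; false if full.
  bool insert(std::uint32_t key) {
    std::size_t i = home(key);
    for (std::size_t n = 0; n < slots_.size(); ++n, i = next(i)) {
      if (!slots_[i]) {
        slots_[i] = slot{key, false};
        return true;
      }
    }
    return false;
  }

  // Proposed variant: the callback gets a mutable reference to each match.
  template <typename Callback>
  void for_each(std::uint32_t key, Callback callback) {
    probe(key, [&](std::size_t i) { callback(*slots_[i]); });
  }

  // Copy variant (current behavior): the callback gets a copy of each match.
  template <typename Callback>
  void for_each_copy(std::uint32_t key, Callback callback) const {
    probe(key, [&](std::size_t i) {
      slot copy = *slots_[i];
      callback(copy);
    });
  }

  // Number of occupied slots whose flag is set.
  std::size_t count_seen() const {
    std::size_t n = 0;
    for (auto const& s : slots_) {
      if (s && s->seen) ++n;
    }
    return n;
  }

 private:
  std::size_t home(std::uint32_t key) const { return key % slots_.size(); }
  std::size_t next(std::size_t i) const { return (i + 1) % slots_.size(); }

  // Walk the probe sequence from the key's home slot until an empty slot,
  // calling visit(i) for every slot whose key matches (the flag is ignored).
  template <typename Visit>
  void probe(std::uint32_t key, Visit visit) const {
    std::size_t i = home(key);
    for (std::size_t n = 0; n < slots_.size() && slots_[i]; ++n, i = next(i)) {
      if (slots_[i]->key == key) visit(i);
    }
  }

  std::vector<std::optional<slot>> slots_;
};
```

With keys 3, 11 and 3 inserted into 8 slots, marking key 3 through for_each_copy leaves no slot marked, while marking through for_each flags both entries for key 3 and leaves the entry for 11 alone.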
Describe alternatives you've considered
The current workaround is to re-implement for_each with the proposed behavior in a custom kernel, which relies on some of the internal cuco::detail machinery; this is not ideal.
Additional context
I'm not sure whether this feature is a footgun, so I would like to start a discussion about its potential dangers.