Simplify cuda::host_launch API
#6689
base: main
Conversation
Force-pushed from c313a52 to 8b64eb2.
```cpp
//! @param __stream Stream to launch the host function on
//! @param __callable A reference to a host function or callable object to call in stream order
template <class _Callable>
_CCCL_HOST_API void host_launch(stream_ref __stream, ::cuda::std::reference_wrapper<_Callable> __callable)
```
I would prefer to keep the separate overload; it's easier to document this mode. Otherwise, with a single overload, you need to describe the set of conditions that avoid the allocation, whereas here they are expressed directly in code.
I don't agree; I actually think this makes everything much easier. We would have a single function. We can simply document the behaviour without talking about memory allocations, pointing out the option to use `cuda::std::reference_wrapper` for cases when the user wants to pass a reference to a callable or an argument.
Then we usually have a Performance Considerations section, where we would describe that if no parameters are passed and the callable is either a free function or a `cuda::std::reference_wrapper`, we use `cuLaunchHostFunc` without memory allocations, and `cuStreamAddCallback` otherwise.
I think this makes everything much cleaner.
People usually just glance at the documentation; they will immediately notice two overloads, while only a very small subset will read the Performance Considerations section.
I actually think the overload you are removing is more important than the other one and should be used more often; that's why I want it to be as visible as possible.
```cpp
// We use the callback here to have it execute even on stream error, because it needs to free the above allocation
::cuda::__driver::__streamAddCallback(__stream.get(), __stream_callback_launcher<_CallbackData>, __callback_data_ptr);
}
```

```cpp
if constexpr (!__has_args && ::cuda::std::is_function_v<_Callable> && ::cuda::std::is_pointer_v<_Callable>)
```
I like this
🥳 CI Workflow Results — Finished in 1h 34m: Pass: 100%/90 | Total: 12h 57m | Max: 53m 25s | Hits: 99%/213937. See results here.
Previously, we had a special overload for cases when the user passed a `cuda::std::reference_wrapper` as the callable without any arguments. This PR removes that overload and handles the case inside the generic implementation. In addition, functions returning `void` without arguments are now also launched using `cuLaunchHostFunc`, which doesn't require a memory allocation.