-
Notifications
You must be signed in to change notification settings - Fork 294
Implement cuda::sincos
#6742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Implement cuda::sincos
#6742
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
@s-oboyle I've applied the function to even more places, could you confirm it is fine? |
🥳 CI Workflow Results🟩 Finished in 1h 36m: Pass: 100%/90 | Total: 1d 02h | Max: 1h 18m | Hits: 96%/201620See results here. |
| template <class Integral> | ||
| [[nodiscard]] __host__ __device__ | ||
| sincos_result<double> sincos(Integral value) noexcept; // (A) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using 1 and A looks a typo
| sincos_result<double> sincos(Integral value) noexcept; // (A) | |
| sincos_result<double> sincos(Integral value) noexcept; // (2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not, it's how they define these overloads on cppreference. Will (2) be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, using 1. and A. is not very clear. I never noted in cppreference honestly
|
|
||
| **Performance considerations** | ||
|
|
||
| - If available, the functionality is implemented by compiler builtins, otherwise fallbacks to ``cuda::std::sin(value)`` and ``cuda::std::cos(value)``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can be more precise about the types where the optimization applies
| auto neg_result = cuda::sincos(-cuda::std::numeric_limits<T>::quiet_NaN()); | ||
| assert(cuda::std::isnan(neg_result.sin)); | ||
| assert(cuda::std::isnan(neg_result.cos)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add another test case with a generic values, e.g. 0.3 and test if it matches (or approx) separate calls to cos/sin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hahah, for sure we can, I wanted not to deal with the floating point comparisons and just test whether it works 😅
| __ret.sin = ::cuda::std::sin(__v); | ||
| __ret.cos = ::cuda::std::cos(__v); | ||
| return __ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if all compilers like it
| __ret.sin = ::cuda::std::sin(__v); | |
| __ret.cos = ::cuda::std::cos(__v); | |
| return __ret; | |
| return {::cuda::std::sin(__v), ::cuda::std::cos(__v)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is an improvement, the return structure is defined anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is an improvement in the sense the code is shorter 😅. Anyway, up to you
If once wants to compute sin and cos of a number, computing both operations at the same time allows some optimizations.
This PR introduces
cuda::sincosthat tries to use compiler builtins and device intrinsics to provide this functionality and fallbacks to separate sin and cos calculation.