[NVCC] Modify template for FP8 type casting to be compilable in C++20 #1771
Context
As mentioned in #1770, we are having an issue compiling NCCL v2.27.5 with C++20 (it compiles with C++11 through C++17). We traced the root cause to a section of device code in reduce_kernel.h that does not compile in C++20. My understanding is that in C++20 the rules for aggregate initialization and implicit conversions have become stricter. In other words, this line
return toPack(VecB(fromPack<VecA>(a)));
relies on the ability to construct VecB from a VecA (or vice versa) via an implicit or aggregate conversion. In C++20, this is not allowed unless there is an explicit constructor or conversion operator.
Fix Detail
I attempted a localized fix for this particular reduce_kernel.h code snippet by doing an explicit, manual element-wise conversion between the types.
Update:
It seems I was being a bit too manual; the compiler just needs a push to use the correct operator. Alternatively, we can write
return toPack((VecB)(fromPack<VecA>(a)));
which would similarly invoke the explicit operator: https://docs.nvidia.com/cuda/cuda-math-api/cuda_math_api/struct____nv__fp8x2__e5m2.html#_CPPv4NK15__nv_fp8x2_e5m2cv6float2Ev
I have compiled this successfully with C++14, C++17 and C++20.
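For reviewers, here is a minimal host-side sketch of the mechanism the update relies on: a type that exposes only an explicit conversion operator can still be converted when the call site spells out a cast. The names Float2Like, PackedPair, widenStaticCast and widenCStyleCast are hypothetical stand-ins made up for illustration; they are not the NCCL or CUDA types, and this sketch does not reproduce the original C++20 compile error.

```cpp
// Hypothetical stand-in for a plain two-float aggregate such as float2.
struct Float2Like {
  float x;
  float y;
};

// Hypothetical stand-in for a packed two-element type that only offers
// an explicit conversion operator to the wider vector type.
struct PackedPair {
  unsigned char lo;
  unsigned char hi;

  // Explicit: never used for implicit conversions, only for casts
  // and other direct-initialization contexts.
  explicit operator Float2Like() const {
    return Float2Like{static_cast<float>(lo), static_cast<float>(hi)};
  }
};

// Spells out the conversion with static_cast, which selects the
// explicit conversion operator.
inline Float2Like widenStaticCast(PackedPair p) {
  return static_cast<Float2Like>(p);
}

// Same conversion written as a C-style cast, mirroring the (VecB)(...)
// form mentioned in the update.
inline Float2Like widenCStyleCast(PackedPair p) {
  return (Float2Like)(p);
}
```

Both helpers end up calling the same explicit operator, so choosing between them is mostly a matter of style; static_cast makes the intent easier to search for.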
I would appreciate some help verifying whether the fix makes sense for this particular case or, if not, what the correct fix would be in order to compile with C++20.