jushg commented Jul 8, 2025

Context

As mentioned in #1770, we are having trouble compiling NCCL v2.27.5 with C++20 (it compiles with C++11 through C++17). We traced the root cause to this section of device code in reduce_kernel.h, which does not compile under C++20. My understanding is that the rules for aggregate initialization and implicit conversions have become stricter in C++20.

In other words, the line `return toPack(VecB(fromPack<VecA>(a)));` relies on being able to construct a VecB from a VecA (or vice versa) via an implicit or aggregate conversion. As far as I can tell, C++20 does not allow this unless there is an explicit constructor or conversion operator.

Fix Detail

I attempted a localized fix for this particular reduce_kernel.h code snippet by converting between the types explicitly, element by element.

Update:

It seems I was being a bit too manual; the compiler just needs a push to use the correct operator. Alternatively, `return toPack((VecB)(fromPack<VecA>(a)));` works as well, since it would similarly invoke the explicit operator: https://docs.nvidia.com/cuda/cuda-math-api/cuda_math_api/struct____nv__fp8x2__e5m2.html#_CPPv4NK15__nv_fp8x2_e5m2cv6float2Ev
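To illustrate what "invoking the explicit operator" means, here is a minimal host-side sketch in plain C++. It is not the NCCL code: `Float2`, `PackedPair`, `convertExplicit` and `convertCStyle` are hypothetical stand-ins I made up for `float2`, the FP8 pair type and the cast in reduce_kernel.h, and I am assuming the real types behave similarly with respect to explicit conversion operators.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's float2: a plain aggregate with no constructors.
struct Float2 {
  float x, y;
};

// Hypothetical stand-in for a packed two-element type such as __nv_fp8x2_e5m2.
// The real type stores encoded FP8 values; small integers are used here only
// to keep the sketch self-contained.
struct PackedPair {
  signed char lo, hi;

  // Explicit conversion operator: it is never picked up by an implicit
  // conversion, only when the caller asks for a cast.
  explicit operator Float2() const {
    return Float2{static_cast<float>(lo), static_cast<float>(hi)};
  }
};

// Generic helper that requests the conversion with static_cast.
template <typename To, typename From>
To convertExplicit(const From& v) {
  return static_cast<To>(v);
}

// Same conversion written as a C-style cast, mirroring the (VecB)(...) form.
inline Float2 convertCStyle(const PackedPair& v) {
  return (Float2)v;
}

// Note: `Float2 f = somePackedPair;` would not compile, because
// copy-initialization does not consider explicit conversion operators.
```

Calling `convertExplicit<Float2>(p)` or `convertCStyle(p)` on a `PackedPair` both go through `PackedPair::operator Float2()`, which is the behavior the cast in the fix is meant to force.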

I have compiled this successfully with C++14, C++17 and C++20.

I would appreciate help verifying whether the fix makes sense for this particular case or, if not, what the correct fix would be in order to compile with C++20.

jushg added 2 commits July 8, 2025 11:05
Change the casting template for FP8 types to be compatible with c++20
Turn out need much less change