[wip] Support premul_sum #1948
Conversation
Pull Request Overview
This PR adds support for premultiplied sum operations (`premul_sum`) in the XCCL distributed communication backend. The implementation includes a version check to ensure compatibility with oneCCL >= 2021.17, which is required for this feature.
- Adds `XCCLPreMulSumSupplement` struct to handle premul sum factors, both scalar and tensor (sketched below, after this list)
- Implements `makeXCCLPreMulSum` template function to create reduce operations with factors
- Updates the `getXcclReduceOp` function signature and adds PREMUL_SUM case handling
- Adds test coverage for premul sum operations in reduce and reduce_scatter scenarios
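
For orientation, here is a minimal sketch of what the supplement struct and factory from the first two items could look like. The names `XCCLPreMulSumSupplement` and `makeXCCLPreMulSum` come from the PR summary; the field names and the `ReduceOp` wiring are assumptions modeled on the analogous NCCL premul_sum support in c10d, not the PR's exact code.

```cpp
#include <ATen/ATen.h>
#include <torch/csrc/distributed/c10d/Types.hpp>

// Sketch only: holds either a scalar or a single-element tensor factor that
// the backend multiplies into each rank's contribution before summing.
struct XCCLPreMulSumSupplement : c10d::_SupplementBase {
  double double_factor{0.0};
  at::Tensor tensor_factor;
  explicit XCCLPreMulSumSupplement(double f) : double_factor{f} {}
  explicit XCCLPreMulSumSupplement(at::Tensor t) : tensor_factor{std::move(t)} {
    TORCH_CHECK_EQ(tensor_factor.numel(), 1);
  }
};

// Sketch only: wraps the factor in a ReduceOp tagged PREMUL_SUM so that
// getXcclReduceOp can later rebuild a custom oneCCL reduction from it.
template <typename T>
c10d::ReduceOp makeXCCLPreMulSum(const T& factor) {
  c10d::ReduceOp rop;
  rop.op_ = c10d::ReduceOp::PREMUL_SUM;
  rop.supplement_ = c10::make_intrusive<XCCLPreMulSumSupplement>(factor);
  return rop;
}
```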
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/xpu/distributed/test_c10d_ops_xccl.py | Fixes typos in class names and adds test cases for premul_sum operations |
| src/xccl/ProcessGroupXCCL.hpp | Adds XCCLPreMulSumSupplement struct and makeXCCLPreMulSum template function |
| src/xccl/ProcessGroupXCCL.cpp | Implements premul_sum support with version checking and updates function signatures |
```cpp
      case ccl::datatype::float32:
        return unpackPreMulSum<float, ccl::datatype::float32>(reduceOp, comm);
      case ccl::datatype::bfloat16:
        return unpackPreMulSum<float, ccl::datatype::bfloat16>(
```
For the bfloat16 data type, the template should use `at::BFloat16` instead of `float`. Using `float` for bfloat16 data will cause type-mismatch issues when accessing the tensor data.
Suggested change:

```diff
-        return unpackPreMulSum<float, ccl::datatype::bfloat16>(
+        return unpackPreMulSum<at::BFloat16, ccl::datatype::bfloat16>(
```
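
To make the comment concrete: `at::Tensor::const_data_ptr<T>()` verifies that `T` matches the tensor's scalar type, so reading a bfloat16 factor tensor through `float` fails at runtime. A hypothetical helper illustrating the access pattern (not code from the PR):

```cpp
#include <ATen/ATen.h>

// Hypothetical helper: reads the single-element premul factor through the
// template type. For a kBFloat16 tensor this only succeeds with
// T = at::BFloat16; const_data_ptr<float>() raises a type-mismatch error.
template <typename T>
T readPreMulFactor(const at::Tensor& tensor_factor) {
  return *tensor_factor.const_data_ptr<T>();
}
```

With that check in place, dispatching on `ccl::datatype::bfloat16` has to instantiate the helper with `at::BFloat16`, which is what the suggested change does.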
```cpp
      ccl::reduction_destroy(op_, *comm_);
    }
  }
#endif // ENABLE_XCCL_PREMUL_SUM_SUPPORT
```
The destructor is only defined when `ENABLE_XCCL_PREMUL_SUM_SUPPORT` is defined, but the class is used regardless of this macro. This will cause linking errors when the macro is not defined. The destructor should be defined unconditionally, with the conditional logic inside.
Suggested change:

```diff
-#endif // ENABLE_XCCL_PREMUL_SUM_SUPPORT
+  ~xcclRedOpRAII() {
+#if defined(ENABLE_XCCL_PREMUL_SUM_SUPPORT)
+    if (premul_sum_) {
+      ccl::reduction_destroy(op_, *comm_);
+    }
+#endif // ENABLE_XCCL_PREMUL_SUM_SUPPORT
+  }
```
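
For context, a sketch of how the RAII wrapper might look with the suggested fix applied. The members `op_`, `comm_`, and `premul_sum_` and the `ccl::reduction_destroy` call appear in the quoted code; the constructors, the stored `ccl::reduction` handle, and the `xcclComm_t*` member type are assumptions, not the PR's exact class.

```cpp
#include <oneapi/ccl.hpp>

// Sketch: the destructor is compiled in every build, and only the body that
// frees the custom oneCCL reduction is guarded by the feature macro.
class xcclRedOpRAII {
 public:
  xcclRedOpRAII() = default;
  explicit xcclRedOpRAII(ccl::reduction op) : op_(op) {}
#if defined(ENABLE_XCCL_PREMUL_SUM_SUPPORT)
  xcclRedOpRAII(ccl::reduction op, xcclComm_t* comm)
      : op_(op), comm_(comm), premul_sum_(true) {}
#endif
  operator ccl::reduction() const {
    return op_;
  }
  ~xcclRedOpRAII() {
#if defined(ENABLE_XCCL_PREMUL_SUM_SUPPORT)
    if (premul_sum_) {
      ccl::reduction_destroy(op_, *comm_);
    }
#endif
  }

 private:
  ccl::reduction op_{};
  xcclComm_t* comm_{nullptr};
  bool premul_sum_{false};
};
```

A real implementation would also need to disable copying (or define move semantics) so the custom reduction is not destroyed twice.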
Refer to https://github.com/intel-innersource/libraries.performance.communication.oneccl/pull/3449.
Integrates premul_sum into XCCL.