[XLA:GPU][oneAPI] Add support for oneCCL bazel build#42595

Closed
nhatleSummer22 wants to merge 5 commits into
openxla:mainfrom
Intel-tensorflow:nhatle/xla_oneccl_bazel_build

nhatleSummer22 commented May 14, 2026

Contributor

📝 Summary of Changes
This PR enables building oneCCL from source using the XLA Bazel build system. oneCCL provides optimized communication patterns on Intel GPUs.
🎯 Justification
This PR is the first step toward supporting scale-up functionality on Intel GPUs. Subsequent PRs will add full scale-up support.

🚀 Kind of Contribution
✨ New Feature

steeve commented May 14, 2026

Contributor

Hi, thank you for this PR!

I was trying to use it earlier but ran into a problem due to oneCCL's "one process per GPU" model (according to Codex): the collectives would fail to initialize.

Apparently this is only possible with oneAPI 2026.0. Do you have more info?

Also, we are maintaining a fork that has now run llama on a b70: https://github.com/zml/xla/commits/zml/oneapi2/

We found 2 things that needed work:

  • implementing SYCL BLAS GEMM
  • fixing event recording, which was broken

Thank you!
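
For readers unfamiliar with the "one process per GPU" model mentioned in the comment above, the following is a toy sketch of how ranks map to processes and devices in that layout compared with a single process driving several GPUs. It is not oneCCL or XLA code: `RankAssignment`, `one_process_per_gpu`, `single_process_multi_gpu`, and `ranks_per_process` are hypothetical names made up for illustration, and the sketch makes no claim about which layouts a particular oneCCL or oneAPI release accepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankAssignment:
    """Hypothetical record tying a communicator rank to a process and a GPU."""
    process_id: int  # worker process that owns the rank
    device_id: int   # GPU index driven by that rank
    rank: int        # position of the rank within the communicator


def one_process_per_gpu(num_gpus: int) -> list[RankAssignment]:
    # Each process owns exactly one device, so process, device, and rank coincide.
    return [RankAssignment(process_id=i, device_id=i, rank=i) for i in range(num_gpus)]


def single_process_multi_gpu(num_gpus: int) -> list[RankAssignment]:
    # One process drives every device, so all ranks share process 0.
    return [RankAssignment(process_id=0, device_id=i, rank=i) for i in range(num_gpus)]


def ranks_per_process(assignments: list[RankAssignment]) -> dict[int, int]:
    # Count how many ranks each process hosts.
    counts: dict[int, int] = {}
    for a in assignments:
        counts[a.process_id] = counts.get(a.process_id, 0) + 1
    return counts
```

Under this toy model, `ranks_per_process(one_process_per_gpu(4))` reports one rank in each of four processes, while `ranks_per_process(single_process_multi_gpu(4))` reports four ranks in a single process. A library that assumes the first layout may need extra support before it can initialize collectives in the second.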

@dimitar-asenov

Member

@MichaelHudgins Can we simply add a directory in third_party or does this need a bit more work on our side?

@bhavani-subramanian

Contributor

@steeve Thanks for the note. Just a heads-up that I have opened a PR to fix the issue in event recording: #42806

@dimitar-asenov dimitar-asenov requested a review from penpornk May 19, 2026 08:53
@MichaelHudgins

Member

> @MichaelHudgins Can we simply add a directory in third_party or does this need a bit more work on our side?

Apologies, I missed this one. In general we can; let me message you internally with more specifics.

neudinger added a commit to zml/xla that referenced this pull request Jun 9, 2026
copybara-service Bot pushed a commit that referenced this pull request Jun 9, 2026
Imported from GitHub PR #42595

📝 Summary of Changes
This PR enables building oneCCL from source using the XLA Bazel build system. [oneCCL](https://github.com/uxlfoundation/oneCCL) provides optimized communication patterns on Intel GPUs.
🎯 Justification
This PR is the first step toward supporting scale-up functionality on Intel GPUs. Subsequent PRs will add full scale-up support.

🚀 Kind of Contribution
✨ New Feature
Copybara import of the project:

--
8e59923 by Nhat Le <nhat.le@intel.com>:

Add support for oneCCL bazel build

--
6762807 by Nhat Le <nhat.le@intel.com>:

Add dependency to trigger oneCCL build

--
644072a by Nhat Le <nhat.le@intel.com>:

Fix EOF

--
42901eb by Nhat Le <nhat.le@intel.com>:

Make all the necessary headers visible to dependents

--
f4c481a by nhatle <nhat.le@intel.com>:

Fix for file not found errors in XLA Linux X86 GPU ONEAPI CI

Merging this change closes #42595

FUTURE_COPYBARA_INTEGRATE_REVIEW=#42595 from Intel-tensorflow:nhatle/xla_oneccl_bazel_build f4c481a
PiperOrigin-RevId: 927245566
copybara-service Bot pushed a commit that referenced this pull request Jun 10, 2026
PiperOrigin-RevId: 929784497
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jun 10, 2026
Imported from GitHub PR openxla/xla#42595

📝 Summary of Changes
This PR enables building oneCCL from source using the XLA Bazel build system. [oneCCL](https://github.com/uxlfoundation/oneCCL) provides optimized communication patterns on Intel GPUs.
🎯 Justification
This PR is the first step toward supporting scale-up functionality on Intel GPUs. Subsequent PRs will add full scale-up support.

🚀 Kind of Contribution
✨ New Feature
Copybara import of the project:

--
8e599233097ee4da17aea66b2ddf844ee160c471 by Nhat Le <nhat.le@intel.com>:

Add support for oneCCL bazel build

--
6762807a4d7b0b6392fdfa493cdb3270ea50cefa by Nhat Le <nhat.le@intel.com>:

Add dependency to trigger oneCCL build

--
644072af588c031f96342cf15cf6e65967225660 by Nhat Le <nhat.le@intel.com>:

Fix EOF

--
42901eb57429c14c416636f7b5a68ce87f9c895b by Nhat Le <nhat.le@intel.com>:

Make all the necessary headers visible to dependents

--
f4c481a006fbd38f8eca26e04d0f75b108bb8618 by nhatle <nhat.le@intel.com>:

Fix for file not found errors in XLA Linux X86 GPU ONEAPI CI

Merging this change closes #42595

Reverts changelist 929420822

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#42595 from Intel-tensorflow:nhatle/xla_oneccl_bazel_build f4c481a006fbd38f8eca26e04d0f75b108bb8618
PiperOrigin-RevId: 929784497
copybara-service Bot pushed a commit that referenced this pull request Jun 11, 2026
PiperOrigin-RevId: 929784497
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jun 11, 2026
Reverts 028c67b

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#42595 from Intel-tensorflow:nhatle/xla_oneccl_bazel_build f4c481a006fbd38f8eca26e04d0f75b108bb8618
PiperOrigin-RevId: 929784497
@dimitar-asenov

Member

Hi @nhatleSummer22. We are merging this without the change to oneccl_collectives.cc. Hope that's OK.

copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jun 11, 2026
Reverts 028c67b

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#42595 from Intel-tensorflow:nhatle/xla_oneccl_bazel_build f4c481a006fbd38f8eca26e04d0f75b108bb8618
PiperOrigin-RevId: 930185853
copybara-service Bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jun 11, 2026
To add support for other operations like all-gather/collective dots to
the collective kernel thunk we should make it agnostic to custom kernels
first.

Reverts 028c67b

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#42595 from Intel-tensorflow:nhatle/xla_oneccl_bazel_build f4c481a006fbd38f8eca26e04d0f75b108bb8618
PiperOrigin-RevId: 930464574
5 participants