[XLA:GPU][oneAPI] Add support for oneCCL bazel build #42595
nhatleSummer22 wants to merge 5 commits into
Hi, thank you for this PR! I tried to use it earlier but ran into a problem: because of oneCCL's "one process per GPU" model (according to Codex), the collectives would fail to initialize. Apparently this is only possible with oneAPI 2026.0. Do you have more info? Also, we are maintaining a fork, which has now run llama on a b70, at https://github.com/zml/xla/commits/zml/oneapi2/ We found 2 problems:
Thank you!
@MichaelHudgins Can we simply add a directory in third_party, or does this need a bit more work on our side?
Apologies, I missed this one. In general we can; let me message you internally with more specifics.
Imported from GitHub PR #42595 📝 Summary of Changes This PR enables building oneCCL from source using XLA bazel build system. [oneCCL](https://github.com/uxlfoundation/oneCCL) enables optimized communication pattern on Intel's GPUs. 🎯 Justification This PR is first step to support scale-up functionality on Intel's GPUs. Subsequent PRs will add full support for scale-up. 🚀 Kind of Contribution ✨ New Feature Copybara import of the project: -- 8e59923 by Nhat Le <nhat.le@intel.com>: Add support for oneCCL bazel build -- 6762807 by Nhat Le <nhat.le@intel.com>: Add dependency to trigger oneCCL build -- 644072a by Nhat Le <nhat.le@intel.com>: Fix EOF -- 42901eb by Nhat Le <nhat.le@intel.com>: Make all the necessary headers visible to dependents -- f4c481a by nhatle <nhat.le@intel.com>: Fix for file not found errors in XLA Linux X86 GPU ONEAPI CI Merging this change closes #42595 FUTURE_COPYBARA_INTEGRATE_REVIEW=#42595 from Intel-tensorflow:nhatle/xla_oneccl_bazel_build f4c481a PiperOrigin-RevId: 927245566
Imported from GitHub PR #42595 📝 Summary of Changes This PR enables building oneCCL from source using XLA bazel build system. [oneCCL](https://github.com/uxlfoundation/oneCCL) enables optimized communication pattern on Intel's GPUs. 🎯 Justification This PR is first step to support scale-up functionality on Intel's GPUs. Subsequent PRs will add full support for scale-up. 🚀 Kind of Contribution ✨ New Feature Copybara import of the project: -- 8e59923 by Nhat Le <nhat.le@intel.com>: Add support for oneCCL bazel build -- 6762807 by Nhat Le <nhat.le@intel.com>: Add dependency to trigger oneCCL build -- 644072a by Nhat Le <nhat.le@intel.com>: Fix EOF -- 42901eb by Nhat Le <nhat.le@intel.com>: Make all the necessary headers visible to dependents -- f4c481a by nhatle <nhat.le@intel.com>: Fix for file not found errors in XLA Linux X86 GPU ONEAPI CI Merging this change closes #42595 FUTURE_COPYBARA_INTEGRATE_REVIEW=#42595 from Intel-tensorflow:nhatle/xla_oneccl_bazel_build f4c481a PiperOrigin-RevId: 929784497
Imported from GitHub PR openxla/xla#42595 📝 Summary of Changes This PR enables building oneCCL from source using XLA bazel build system. [oneCCL](https://github.com/uxlfoundation/oneCCL) enables optimized communication pattern on Intel's GPUs. 🎯 Justification This PR is first step to support scale-up functionality on Intel's GPUs. Subsequent PRs will add full support for scale-up. 🚀 Kind of Contribution ✨ New Feature Copybara import of the project: -- 8e599233097ee4da17aea66b2ddf844ee160c471 by Nhat Le <nhat.le@intel.com>: Add support for oneCCL bazel build -- 6762807a4d7b0b6392fdfa493cdb3270ea50cefa by Nhat Le <nhat.le@intel.com>: Add dependency to trigger oneCCL build -- 644072af588c031f96342cf15cf6e65967225660 by Nhat Le <nhat.le@intel.com>: Fix EOF -- 42901eb57429c14c416636f7b5a68ce87f9c895b by Nhat Le <nhat.le@intel.com>: Make all the necessary headers visible to dependents -- f4c481a006fbd38f8eca26e04d0f75b108bb8618 by nhatle <nhat.le@intel.com>: Fix for file not found errors in XLA Linux X86 GPU ONEAPI CI Merging this change closes #42595 Reverts changelist 929420822 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#42595 from Intel-tensorflow:nhatle/xla_oneccl_bazel_build f4c481a006fbd38f8eca26e04d0f75b108bb8618 PiperOrigin-RevId: 929784497
Imported from GitHub PR openxla/xla#42595 📝 Summary of Changes This PR enables building oneCCL from source using XLA bazel build system. [oneCCL](https://github.com/uxlfoundation/oneCCL) enables optimized communication pattern on Intel's GPUs. 🎯 Justification This PR is first step to support scale-up functionality on Intel's GPUs. Subsequent PRs will add full support for scale-up. 🚀 Kind of Contribution ✨ New Feature Copybara import of the project: -- 8e599233097ee4da17aea66b2ddf844ee160c471 by Nhat Le <nhat.le@intel.com>: Add support for oneCCL bazel build -- 6762807a4d7b0b6392fdfa493cdb3270ea50cefa by Nhat Le <nhat.le@intel.com>: Add dependency to trigger oneCCL build -- 644072af588c031f96342cf15cf6e65967225660 by Nhat Le <nhat.le@intel.com>: Fix EOF -- 42901eb57429c14c416636f7b5a68ce87f9c895b by Nhat Le <nhat.le@intel.com>: Make all the necessary headers visible to dependents -- f4c481a006fbd38f8eca26e04d0f75b108bb8618 by nhatle <nhat.le@intel.com>: Fix for file not found errors in XLA Linux X86 GPU ONEAPI CI Merging this change closes #42595 Reverts 028c67b FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#42595 from Intel-tensorflow:nhatle/xla_oneccl_bazel_build f4c481a006fbd38f8eca26e04d0f75b108bb8618 PiperOrigin-RevId: 929784497
Hi @nhatleSummer22. We are merging this without the change to
Reverts 028c67b FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#42595 from Intel-tensorflow:nhatle/xla_oneccl_bazel_build f4c481a006fbd38f8eca26e04d0f75b108bb8618 PiperOrigin-RevId: 930185853
To add support for other operations like all-gather/collective dots to the collective kernel thunk we should make it agnostic to custom kernels first. Reverts 028c67b FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#42595 from Intel-tensorflow:nhatle/xla_oneccl_bazel_build f4c481a006fbd38f8eca26e04d0f75b108bb8618 PiperOrigin-RevId: 930464574
📝 Summary of Changes
This PR enables building oneCCL from source using the XLA Bazel build system. oneCCL enables optimized communication patterns on Intel GPUs.
🎯 Justification
This PR is the first step toward supporting scale-up functionality on Intel GPUs. Subsequent PRs will add full support for scale-up.
🚀 Kind of Contribution
✨ New Feature