
add xpu build support #185


Merged

merged 4 commits into from Jul 31, 2025

Conversation

sywangyi
Contributor

@danieldk how about adding XPU support? I also added a relu example.

Signed-off-by: Wang, Yi A <[email protected]>
@sywangyi
Contributor Author

I have only verified this with a local build.

@danieldk
Member

Nice!!! 🎉 I will try to have a look today or Monday.

@sywangyi
Contributor Author

@danieldk if we want to enable the Nix build, do we need to add "xpu" support in https://github.com/huggingface/hf-nix, since we need torch with "xpu" support?

@danieldk
Member

danieldk left a comment

Thanks again, this is really great to have! Added one small comment.

@danieldk
Member

> @danieldk if we want to enable the Nix build, do we need to add "xpu" support in https://github.com/huggingface/hf-nix, since we need torch with "xpu" support?

Yes, indeed. Let me know if you need any help. If you want, I can also have a go at adding XPU support to the Torch Nix derivation.

Signed-off-by: Wang, Yi A <[email protected]>
@IlyasMoutawwakil
Member

@danieldk should we wait for the support in hf-nix / Torch Nix?

@sywangyi
Contributor Author

> @danieldk if we want to enable the Nix build, do we need to add "xpu" support in https://github.com/huggingface/hf-nix, since we need torch with "xpu" support?

> Yes, indeed. Let me know if you need any help. If you want, I can also have a go at adding XPU support to the Torch Nix derivation.

Yes, I tried to enable torch XPU in hf-nix, but ran into some issues:

  1. How to handle Intel oneAPI in Nix: PyTorch needs some components from Intel oneAPI to enable XPU, like SYCL and OpenCL (see https://github.com/pytorch/pytorch/blob/main/cmake/Modules/FindSYCLToolkit.cmake) and PTI (see https://github.com/pytorch/kineto/blob/5e7501833f1021ce6f618572d3baf657b6319658/libkineto/src/plugin/xpupti/CMakeLists.txt#L23). Nix builds are reproducible: they can't install proprietary software that requires registration. Intel oneAPI is unfree: it requires accepting EULAs and has licensing restrictions. I tried a hybrid approach using the oneAPI installed on the host, but there is a fundamental limitation: Nix builds run in isolated sandboxes that don't have access to host system directories like /opt.

  2. Some XPU git repositories need to be downloaded during the torch build, like https://github.com/intel/torch-xpu-ops.git in https://github.com/pytorch/pytorch/blob/main/caffe2/CMakeLists.txt#L1138 and https://github.com/oneapi-src/oneDNN in https://github.com/pytorch/pytorch/blob/7eb5fdb3580e57a6813dce7fbb23f008c3f4c270/cmake/Modules/FindMKLDNN.cmake#L48, which will be blocked by the Nix build sandbox.

Do you have any suggestions for solving these? Thanks very much!

@sywangyi
Contributor Author

@yao-matrix

@danieldk
Member

danieldk commented Jul 30, 2025

> 1. How to handle Intel oneAPI in Nix: PyTorch needs some components from Intel oneAPI to enable XPU, like SYCL and OpenCL (see https://github.com/pytorch/pytorch/blob/main/cmake/Modules/FindSYCLToolkit.cmake) and PTI (see https://github.com/pytorch/kineto/blob/5e7501833f1021ce6f618572d3baf657b6319658/libkineto/src/plugin/xpupti/CMakeLists.txt#L23). Nix builds are reproducible: they can't install proprietary software that requires registration. Intel oneAPI is unfree: it requires accepting EULAs and has licensing restrictions. I tried a hybrid approach using the oneAPI installed on the host, but there is a fundamental limitation: Nix builds run in isolated sandboxes that don't have access to host system directories like /opt.

Yeah, everything needs to be hashed and accessible in the build sandbox, otherwise the builds are not reproducible. We also need to be able to cache build artifacts to avoid downstream users having to rebuild everything.

I am not very familiar with oneAPI, but aren't Ubuntu/RHEL packages available without accepting an EULA?

https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-2/install-using-package-managers.html

Not sure about the licensing terms though.

> 2. Some XPU git repositories need to be downloaded during the torch build, like https://github.com/intel/torch-xpu-ops.git in https://github.com/pytorch/pytorch/blob/main/caffe2/CMakeLists.txt#L1138 and https://github.com/oneapi-src/oneDNN in https://github.com/pytorch/pytorch/blob/7eb5fdb3580e57a6813dce7fbb23f008c3f4c270/cmake/Modules/FindMKLDNN.cmake#L48, which will be blocked by the Nix build sandbox.
  • torch-xpu-ops: you can use fetchFromGitHub and then move it to ${TORCH_ROOT}/third_party/torch-xpu-ops in the post-unpack phase.
  • oneDNN: use fetchFromGitHub and point XPU_MKLDNN_DIR_PREFIX to it (see the sketch below).
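
A minimal sketch of those two fixes, assuming fetchFromGitHub and an overridable torch derivation are in scope (the rev/hash values are placeholders, and whether XPU_MKLDNN_DIR_PREFIX is consumed as a CMake flag or an environment variable should be verified against FindMKLDNN.cmake):

```nix
# Sketch only: pre-fetch the sources the torch build would otherwise
# download, so the sandboxed build never needs network access.
let
  torch-xpu-ops-src = fetchFromGitHub {
    owner = "intel";
    repo = "torch-xpu-ops";
    rev = "...";          # the commit pinned by the torch release (placeholder)
    hash = "sha256-...";  # placeholder
  };
  oneDNN-src = fetchFromGitHub {
    owner = "oneapi-src";
    repo = "oneDNN";
    rev = "...";          # placeholder
    hash = "sha256-...";  # placeholder
  };
in
torch.overrideAttrs (old: {
  # Drop the pinned checkout where caffe2/CMakeLists.txt expects it.
  postUnpack = (old.postUnpack or "") + ''
    cp -r ${torch-xpu-ops-src} $sourceRoot/third_party/torch-xpu-ops
    chmod -R u+w $sourceRoot/third_party/torch-xpu-ops
  '';
  # Point FindMKLDNN.cmake at the pre-fetched oneDNN instead of a git clone.
  cmakeFlags = (old.cmakeFlags or [ ]) ++ [
    "-DXPU_MKLDNN_DIR_PREFIX=${oneDNN-src}"
  ];
})
```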

@sywangyi
Contributor Author

sywangyi commented Jul 30, 2025

> 1. How to handle Intel oneAPI in Nix: PyTorch needs some components from Intel oneAPI to enable XPU, like SYCL and OpenCL (see https://github.com/pytorch/pytorch/blob/main/cmake/Modules/FindSYCLToolkit.cmake) and PTI (see https://github.com/pytorch/kineto/blob/5e7501833f1021ce6f618572d3baf657b6319658/libkineto/src/plugin/xpupti/CMakeLists.txt#L23). Nix builds are reproducible: they can't install proprietary software that requires registration. Intel oneAPI is unfree: it requires accepting EULAs and has licensing restrictions. I tried a hybrid approach using the oneAPI installed on the host, but there is a fundamental limitation: Nix builds run in isolated sandboxes that don't have access to host system directories like /opt.

> Yeah, everything needs to be hashed and accessible in the build sandbox, otherwise the builds are not reproducible.
>
> I am not very familiar with oneAPI, but aren't Ubuntu/RHEL packages available without accepting an EULA?
>
> https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-2/install-using-package-managers.html
>
> Not sure about the licensing terms though.

> 2. Some XPU git repositories need to be downloaded during the torch build, like https://github.com/intel/torch-xpu-ops.git in https://github.com/pytorch/pytorch/blob/main/caffe2/CMakeLists.txt#L1138 and https://github.com/oneapi-src/oneDNN in https://github.com/pytorch/pytorch/blob/7eb5fdb3580e57a6813dce7fbb23f008c3f4c270/cmake/Modules/FindMKLDNN.cmake#L48, which will be blocked by the Nix build sandbox.
>
> • torch-xpu-ops: you can use fetchFromGitHub and then move it to ${TORCH_ROOT}/third_party/torch-xpu-ops in the post-unpack phase.
> • oneDNN: use fetchFromGitHub and point XPU_MKLDNN_DIR_PREFIX to it.

Downloading the key and adding a signed entry is still needed.

Download the key to the system keyring:

```shell
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
  | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
```

Add the signed entry to the APT sources and configure the APT client to use the Intel repository:

```shell
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
```

See https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-2/apt.html#GUID-2CAA766C-0CBA-4056-BD4B-264B6A875854

@danieldk
Member

danieldk commented Jul 30, 2025

But that's only required because APT will not install anything from a repository if it cannot verify the metadata GPG signatures. Importing the GPG keys does not require accepting the EULA.

We should still check if the oneAPI EULA permits our use though -- Nix has to patch binaries (for fixing library paths, etc.), and we have to be able to provide outputs through our binary cache to avoid users having to rebuild everything.

@sywangyi
Contributor Author

sywangyi commented Jul 30, 2025

> But that's only required because APT will not install anything from a repository if it cannot verify the metadata GPG signatures. Importing the GPG keys does not require accepting the EULA.
>
> We should still check if the oneAPI EULA permits our use though -- Nix has to patch binaries (for fixing library paths, etc.), and we have to be able to provide outputs through our binary cache to avoid users having to rebuild everything.

Maybe I could follow the guide at https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html to install it in Nix? See the "Install through a Command Line" section: the installer must be passed "--eula accept".
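
Something like this rough, untested sketch (URL, version, and hash are placeholders; apart from --eula accept, the installer flags come from Intel's command-line install guide and would need to be verified):

```nix
# Hypothetical sketch of the offline-installer route; not a tested derivation.
{ stdenv, fetchurl }:

stdenv.mkDerivation {
  pname = "intel-oneapi-base-toolkit";
  version = "...";  # placeholder

  # The offline installer is a self-extracting shell script.
  src = fetchurl {
    url = "https://registrationcenter-download.intel.com/...";  # placeholder
    hash = "sha256-...";  # placeholder
  };

  dontUnpack = true;

  installPhase = ''
    # --eula accept is the flag the install guide says must be indicated.
    sh $src -a --silent --eula accept --install-dir "$out"
  '';
}
```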

@sywangyi
Contributor Author

sywangyi commented Jul 31, 2025

These are the benefits and constraints of the offline installation.

Key benefits:

  • Offline installation: no network access needed during the build
  • Reproducible: same hash = same result every time
  • Complete oneAPI: you get the full toolkit, not just vendored files
  • Self-contained: everything is in the Nix store

Considerations:

  • Large download: the oneAPI toolkit is several GB
  • Build time: installation takes significant time
  • License: must accept Intel's EULA (unfree package)
  • System resources: requires substantial disk space

I see there's "allowUnfree = true" for CUDA and ROCm, so shall I implement it in the same way?
@danieldk

@danieldk
Member

danieldk commented Jul 31, 2025

> I see there's "allowUnfree = true" for CUDA and ROCm, so shall I implement it in the same way? @danieldk

Note though that allowUnfree = true is just a signal to nixpkgs that unfree packages may be built (nixpkgs does not want proprietary packages in their cache). There is no explicit confirmation of the EULA by the user: the user can just drop a flake.nix into their repo, and at no point do they have to agree to an EULA to build CUDA or ROCm kernels.

If it is necessary to get every user who builds a kernel to accept the oneAPI EULA (since they are, by extension, downloading oneAPI), then marking oneAPI packages as unfree does not add anything; it's just metadata, and not metadata the user will ever see.
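
For illustration, this is roughly all a kernel developer's flake says today to get unfree CUDA/ROCm packages built; nothing in it reads like an EULA confirmation (a minimal sketch, not the exact hf-nix setup):

```nix
# Minimal sketch: allowUnfree is plain nixpkgs configuration,
# not an explicit EULA acceptance the user ever sees.
pkgs = import nixpkgs {
  system = "x86_64-linux";
  config.allowUnfree = true;
};
```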

> These are the benefits and constraints of the offline installation.

I think the best approach would be to automatically generate Nix derivations from RHEL/RPM packages and then overlay package-specific changes like we do for ROCm (and nixpkgs does for CUDA):

https://github.com/huggingface/hf-nix/blob/main/pkgs/rocm-packages/default.nix

Using a single installer/tarball makes build times really large and does not compose well. It makes it hard to do fixes on top, whereas with the package-based approach, we can easily layer package-specific extensions:

https://github.com/huggingface/hf-nix/blob/main/pkgs/rocm-packages/overrides.nix

Most likely there needs to be some compiler/stdenv wrapping, etc., which is much easier with separate packages/derivations.

It also avoids the dependency closures becoming dependent on all of oneAPI, which would probably be unnecessarily large (for ROCm/CUDA we only end up with a small subset of the toolkits).
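
As a rough illustration of that layering (the attribute name oneapi-dpcpp-cpp is made up; the real names would come from the generated package set):

```nix
# Sketch of an overrides.nix-style overlay on top of auto-generated packages.
final: prev: {
  oneapi-dpcpp-cpp = prev.oneapi-dpcpp-cpp.overrideAttrs (old: {
    # Package-specific fixes layer cleanly here, e.g. RPATH patching
    # or wrapping the compiler driver for the Nix stdenv.
    postFixup = (old.postFixup or "") + ''
      echo "package-specific fixups go here"
    '';
  });
}
```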

@danieldk
Member

danieldk commented Jul 31, 2025

Thinking more about it, I am not sure who should accept the EULA: us, as the party that would redistribute oneAPI through our binary cache? Or kernel developers that use kernel-builder? Or both? The implementation of EULA acceptance depends on who needs to accept it.

If it is the kernel developer who should accept it, we could consider adding an attribute to genFlakeOutputs to explicitly accept the EULA. That potentially raises new questions though: what if we want to build kernels automatically in CI (which we might want to do in the future)? And what if other kernel developers contribute to the kernel? Since the genFlakeOutputs attribute is already set at that point, they will never accept the EULA, etc.
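
For concreteness, such an opt-in might look roughly like this in a kernel's flake.nix (acceptOneapiEula is purely hypothetical, not an existing genFlakeOutputs attribute):

```nix
# Hypothetical sketch; acceptOneapiEula does not exist today.
outputs = { self, kernel-builder, ... }:
  kernel-builder.lib.genFlakeOutputs {
    path = ./.;
    acceptOneapiEula = true;  # hypothetical explicit EULA opt-in
  };
```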

danieldk merged commit 1f403d7 into huggingface:main on Jul 31, 2025
9 checks passed
@danieldk
Member

Merged this PR, many thanks for adding XPU support to build2cmake. 🎉

@sywangyi
Contributor Author

sywangyi commented Aug 1, 2025

Thanks for all the suggestions. I will try to automatically generate Nix derivations from RHEL/RPM packages and then overlay package-specific changes, like what you have done for ROCm.
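
For example, one auto-generated derivation per Intel RPM might look roughly like this (package name, URL, version, and hash are placeholders):

```nix
# Hypothetical sketch of a generated per-RPM derivation; untested.
{ stdenv, fetchurl, rpmextract }:

stdenv.mkDerivation {
  pname = "intel-oneapi-dpcpp-cpp";  # illustrative package name
  version = "...";  # placeholder

  src = fetchurl {
    url = "https://yum.repos.intel.com/oneapi/...";  # placeholder
    hash = "sha256-...";  # placeholder
  };

  nativeBuildInputs = [ rpmextract ];

  # Extract the RPM payload; Intel packages typically land under
  # opt/intel/oneapi in the extracted tree.
  unpackPhase = ''
    rpmextract $src
  '';

  installPhase = ''
    mkdir -p $out
    cp -r opt/intel/oneapi/* $out/
  '';
}
```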
