
add xpu build support #185


Merged

merged 4 commits into from Jul 31, 2025

Conversation

sywangyi
Contributor

@danieldk how about adding XPU support? I also added a relu example.

Signed-off-by: Wang, Yi A <[email protected]>
@sywangyi
Contributor Author

I have only verified this with a local build.

@danieldk
Member

Nice!!! 🎉 I will try to have a look today or Monday.

@sywangyi
Contributor Author

@danieldk if we want to enable the Nix build, do we need to add "xpu" support in https://github.com/huggingface/hf-nix, since we need torch with "xpu" support?

@danieldk
Member

danieldk left a comment

Thanks again, this is really great to have! Added one small comment.

@danieldk
Member

> @danieldk if we want to enable the Nix build, do we need to add "xpu" support in https://github.com/huggingface/hf-nix, since we need torch with "xpu" support?

Yes, indeed. Let me know if you need any help. If you want, I can also have a go at adding XPU support to the Torch Nix derivation.

Signed-off-by: Wang, Yi A <[email protected]>
@IlyasMoutawwakil
Member

@danieldk should we wait for the support in hf-nix / Torch Nix?

@sywangyi
Contributor Author

> @danieldk if we want to enable the Nix build, do we need to add "xpu" support in https://github.com/huggingface/hf-nix, since we need torch with "xpu" support?

> Yes, indeed. Let me know if you need any help. If you want, I can also have a go at adding XPU support to the Torch Nix derivation.

Yes, I tried to enable torch XPU in hf-nix, but ran into some issues:

  1. How to handle Intel oneAPI in Nix: PyTorch needs some components from Intel oneAPI to enable XPU, like SYCL and OpenCL (see https://github.com/pytorch/pytorch/blob/main/cmake/Modules/FindSYCLToolkit.cmake) and PTI (see https://github.com/pytorch/kineto/blob/5e7501833f1021ce6f618572d3baf657b6319658/libkineto/src/plugin/xpupti/CMakeLists.txt#L23). Nix builds are reproducible: they can't install proprietary software that requires registration. Intel oneAPI is unfree: it requires accepting EULAs and has licensing restrictions. I tried a hybrid approach using the oneAPI installed on the host, but there is a fundamental limitation: Nix builds run in isolated sandboxes that don't have access to host system directories like /opt.

  2. Some XPU git repositories need to be downloaded during the torch build, like https://github.com/intel/torch-xpu-ops.git in https://github.com/pytorch/pytorch/blob/main/caffe2/CMakeLists.txt#L1138 and https://github.com/oneapi-src/oneDNN in https://github.com/pytorch/pytorch/blob/7eb5fdb3580e57a6813dce7fbb23f008c3f4c270/cmake/Modules/FindMKLDNN.cmake#L48, which will be blocked by the Nix build sandbox.

Do you have any suggestions for solving these? Thanks very much!

@sywangyi
Contributor Author

@yao-matrix

@danieldk
Member

danieldk commented Jul 30, 2025

> 1. How to handle Intel oneAPI in Nix: PyTorch needs some components from Intel oneAPI to enable XPU, like SYCL and OpenCL (see https://github.com/pytorch/pytorch/blob/main/cmake/Modules/FindSYCLToolkit.cmake) and PTI (see https://github.com/pytorch/kineto/blob/5e7501833f1021ce6f618572d3baf657b6319658/libkineto/src/plugin/xpupti/CMakeLists.txt#L23). Nix builds are reproducible: they can't install proprietary software that requires registration. Intel oneAPI is unfree: it requires accepting EULAs and has licensing restrictions. I tried a hybrid approach using the oneAPI installed on the host, but there is a fundamental limitation: Nix builds run in isolated sandboxes that don't have access to host system directories like /opt.

Yeah, everything needs to be hashed and accessible in the build sandbox, otherwise the builds are not reproducible. We also need to be able to cache build artifacts to avoid downstream users having to rebuild everything.

I am not very familiar with oneAPI, but aren't Ubuntu/RHEL packages available without accepting an EULA?

https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-2/install-using-package-managers.html

Not sure about the licensing terms though.

> 2. Some XPU git repositories need to be downloaded during the torch build, like https://github.com/intel/torch-xpu-ops.git in https://github.com/pytorch/pytorch/blob/main/caffe2/CMakeLists.txt#L1138 and https://github.com/oneapi-src/oneDNN in https://github.com/pytorch/pytorch/blob/7eb5fdb3580e57a6813dce7fbb23f008c3f4c270/cmake/Modules/FindMKLDNN.cmake#L48, which will be blocked by the Nix build sandbox.
  • torch-xpu-ops: you can use fetchFromGitHub and then move it to ${TORCH_ROOT}/third_party/torch-xpu-ops in the post-unpack phase.
  • oneDNN: use fetchFromGitHub and point XPU_MKLDNN_DIR_PREFIX to it (see the sketch below).
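
A minimal sketch of those two fixes, assuming fetchFromGitHub and an overridable torch derivation are in scope (the rev/hash values are placeholders, and whether XPU_MKLDNN_DIR_PREFIX is consumed as a CMake flag or an environment variable should be verified against FindMKLDNN.cmake):

```nix
# Sketch only: pre-fetch the sources the torch build would otherwise
# download, so the sandboxed build never needs network access.
let
  torch-xpu-ops-src = fetchFromGitHub {
    owner = "intel";
    repo = "torch-xpu-ops";
    rev = "...";          # the commit pinned by the torch release (placeholder)
    hash = "sha256-...";  # placeholder
  };
  oneDNN-src = fetchFromGitHub {
    owner = "oneapi-src";
    repo = "oneDNN";
    rev = "...";          # placeholder
    hash = "sha256-...";  # placeholder
  };
in
torch.overrideAttrs (old: {
  # Drop the pinned checkout where caffe2/CMakeLists.txt expects it.
  postUnpack = (old.postUnpack or "") + ''
    cp -r ${torch-xpu-ops-src} $sourceRoot/third_party/torch-xpu-ops
    chmod -R u+w $sourceRoot/third_party/torch-xpu-ops
  '';
  # Point FindMKLDNN.cmake at the pre-fetched oneDNN instead of a git clone.
  cmakeFlags = (old.cmakeFlags or [ ]) ++ [
    "-DXPU_MKLDNN_DIR_PREFIX=${oneDNN-src}"
  ];
})
```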

@sywangyi
Contributor Author

sywangyi commented Jul 30, 2025

> 1. How to handle Intel oneAPI in Nix: PyTorch needs some components from Intel oneAPI to enable XPU, like SYCL and OpenCL (see https://github.com/pytorch/pytorch/blob/main/cmake/Modules/FindSYCLToolkit.cmake) and PTI (see https://github.com/pytorch/kineto/blob/5e7501833f1021ce6f618572d3baf657b6319658/libkineto/src/plugin/xpupti/CMakeLists.txt#L23). Nix builds are reproducible: they can't install proprietary software that requires registration. Intel oneAPI is unfree: it requires accepting EULAs and has licensing restrictions. I tried a hybrid approach using the oneAPI installed on the host, but there is a fundamental limitation: Nix builds run in isolated sandboxes that don't have access to host system directories like /opt.

> Yeah, everything needs to be hashed and accessible in the build sandbox, otherwise the builds are not reproducible.
>
> I am not very familiar with oneAPI, but aren't Ubuntu/RHEL packages available without accepting an EULA?
>
> https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-2/install-using-package-managers.html
>
> Not sure about the licensing terms though.

> 2. Some XPU git repositories need to be downloaded during the torch build, like https://github.com/intel/torch-xpu-ops.git in https://github.com/pytorch/pytorch/blob/main/caffe2/CMakeLists.txt#L1138 and https://github.com/oneapi-src/oneDNN in https://github.com/pytorch/pytorch/blob/7eb5fdb3580e57a6813dce7fbb23f008c3f4c270/cmake/Modules/FindMKLDNN.cmake#L48, which will be blocked by the Nix build sandbox.
>
> • torch-xpu-ops: you can use fetchFromGitHub and then move it to ${TORCH_ROOT}/third_party/torch-xpu-ops in the post-unpack phase.
> • oneDNN: use fetchFromGitHub and point XPU_MKLDNN_DIR_PREFIX to it.

Downloading the key and adding a signed entry is still needed.

Download the key to the system keyring:

```shell
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
  | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
```

Add the signed entry to the APT sources and configure the APT client to use the Intel repository:

```shell
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
```

See https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-2/apt.html#GUID-2CAA766C-0CBA-4056-BD4B-264B6A875854

@danieldk
Member

danieldk commented Jul 30, 2025

But that's only required because APT will not install anything from a repository if it cannot verify the metadata GPG signatures. Importing the GPG keys does not require accepting the EULA.

We should still check if the oneAPI EULA permits our use though -- Nix has to patch binaries (for fixing library paths, etc.), and we have to be able to provide outputs through our binary cache to avoid users having to rebuild everything.

@sywangyi
Contributor Author

sywangyi commented Jul 30, 2025

> But that's only required because APT will not install anything from a repository if it cannot verify the metadata GPG signatures. Importing the GPG keys does not require accepting the EULA.
>
> We should still check if the oneAPI EULA permits our use though -- Nix has to patch binaries (for fixing library paths, etc.), and we have to be able to provide outputs through our binary cache to avoid users having to rebuild everything.

Maybe I could follow the guide at https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html to install it in Nix? See the "Install through a Command Line" section: the installer must be passed "--eula accept".
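
Something like this rough, untested sketch (URL, version, and hash are placeholders; apart from --eula accept, the installer flags come from Intel's command-line install guide and would need to be verified):

```nix
# Hypothetical sketch of the offline-installer route; not a tested derivation.
{ stdenv, fetchurl }:

stdenv.mkDerivation {
  pname = "intel-oneapi-base-toolkit";
  version = "...";  # placeholder

  # The offline installer is a self-extracting shell script.
  src = fetchurl {
    url = "https://registrationcenter-download.intel.com/...";  # placeholder
    hash = "sha256-...";  # placeholder
  };

  dontUnpack = true;

  installPhase = ''
    # --eula accept is the flag the install guide says must be indicated.
    sh $src -a --silent --eula accept --install-dir "$out"
  '';
}
```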

@sywangyi
Contributor Author

sywangyi commented Jul 31, 2025

These are the benefits and constraints of the offline installation.

Key benefits:

  • Offline installation: no network access needed during the build
  • Reproducible: same hash = same result every time
  • Complete oneAPI: you get the full toolkit, not just vendored files
  • Self-contained: everything is in the Nix store

Considerations:

  • Large download: the oneAPI toolkit is several GB
  • Build time: installation takes significant time
  • License: must accept Intel's EULA (unfree package)
  • System resources: requires substantial disk space

I see there's "allowUnfree = true" for CUDA and ROCm, so shall I implement it in the same way?
@danieldk

@danieldk
Member

danieldk commented Jul 31, 2025

> I see there's "allowUnfree = true" for CUDA and ROCm, so shall I implement it in the same way? @danieldk

Note though that allowUnfree = true is just a signal to nixpkgs that unfree packages may be built (nixpkgs does not want proprietary packages in their cache). There is no explicit confirmation of the EULA by the user: the user can just drop a flake.nix into their repo, and at no point do they have to agree to an EULA to build CUDA or ROCm kernels.

If it is necessary to get every user who builds a kernel to accept the oneAPI EULA (since they are, by extension, downloading oneAPI), then marking oneAPI packages as unfree does not add anything; it's just metadata, and not metadata the user will ever see.
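
For illustration, this is roughly all a kernel developer's flake says today to get unfree CUDA/ROCm packages built; nothing in it reads like an EULA confirmation (a minimal sketch, not the exact hf-nix setup):

```nix
# Minimal sketch: allowUnfree is plain nixpkgs configuration,
# not an explicit EULA acceptance the user ever sees.
pkgs = import nixpkgs {
  system = "x86_64-linux";
  config.allowUnfree = true;
};
```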

> These are the benefits and constraints of the offline installation.

I think the best approach would be to automatically generate Nix derivations from RHEL/RPM packages and then overlay package-specific changes like we do for ROCm (and nixpkgs does for CUDA):

https://github.com/huggingface/hf-nix/blob/main/pkgs/rocm-packages/default.nix

Using a single installer/tarball makes build times really large and does not compose well. It makes it hard to do fixes on top, whereas with the package-based approach, we can easily layer package-specific extensions:

https://github.com/huggingface/hf-nix/blob/main/pkgs/rocm-packages/overrides.nix

Most likely there needs to be some compiler/stdenv wrapping, etc., which is much easier with separate packages/derivations.

It also avoids the dependency closures becoming dependent on all of oneAPI, which would probably be unnecessarily large (for ROCm/CUDA we only end up with a small subset of the toolkits).
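
As a rough illustration of that layering (the attribute name oneapi-dpcpp-cpp is made up; the real names would come from the generated package set):

```nix
# Sketch of an overrides.nix-style overlay on top of auto-generated packages.
final: prev: {
  oneapi-dpcpp-cpp = prev.oneapi-dpcpp-cpp.overrideAttrs (old: {
    # Package-specific fixes layer cleanly here, e.g. RPATH patching
    # or wrapping the compiler driver for the Nix stdenv.
    postFixup = (old.postFixup or "") + ''
      echo "package-specific fixups go here"
    '';
  });
}
```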

@danieldk
Member

danieldk commented Jul 31, 2025

Thinking more about it, I am not sure who should accept the EULA: us, as the party that would redistribute oneAPI through our binary cache? Or kernel developers that use kernel-builder? Or both? The implementation of EULA acceptance depends on who needs to accept it.

If it is the kernel developer who should accept it, we could consider adding an attribute to genFlakeOutputs to explicitly accept the EULA. That potentially raises new questions though: what if we want to build kernels automatically in CI (which we might want to do in the future)? And what if other kernel developers contribute to the kernel? Since the genFlakeOutputs attribute is already set at that point, they will never accept the EULA, etc.
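
For concreteness, such an opt-in might look roughly like this in a kernel's flake.nix (acceptOneapiEula is purely hypothetical, not an existing genFlakeOutputs attribute):

```nix
# Hypothetical sketch; acceptOneapiEula does not exist today.
outputs = { self, kernel-builder, ... }:
  kernel-builder.lib.genFlakeOutputs {
    path = ./.;
    acceptOneapiEula = true;  # hypothetical explicit EULA opt-in
  };
```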

danieldk merged commit 1f403d7 into huggingface:main on Jul 31, 2025
9 checks passed
@danieldk
Member

Merged this PR, many thanks for adding XPU support to build2cmake. 🎉

@sywangyi
Contributor Author

sywangyi commented Aug 1, 2025

Thanks for all the suggestions. I will try to automatically generate Nix derivations from RHEL/RPM packages and then overlay package-specific changes, like what you have done for ROCm.
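
For example, one auto-generated derivation per Intel RPM might look roughly like this (package name, URL, version, and hash are placeholders):

```nix
# Hypothetical sketch of a generated per-RPM derivation; untested.
{ stdenv, fetchurl, rpmextract }:

stdenv.mkDerivation {
  pname = "intel-oneapi-dpcpp-cpp";  # illustrative package name
  version = "...";  # placeholder

  src = fetchurl {
    url = "https://yum.repos.intel.com/oneapi/...";  # placeholder
    hash = "sha256-...";  # placeholder
  };

  nativeBuildInputs = [ rpmextract ];

  # Extract the RPM payload; Intel packages typically land under
  # opt/intel/oneapi in the extracted tree.
  unpackPhase = ''
    rpmextract $src
  '';

  installPhase = ''
    mkdir -p $out
    cp -r opt/intel/oneapi/* $out/
  '';
}
```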
