Support building container image on aarch64 (arm64) #323

jgehrcke · 2025-04-23T11:14:46Z

deployments/container/Dockerfile currently assumes building on AMD64.

For example, on a system where uname -m returns aarch64 we see:

$ make -f deployments/container/Makefile build
...
 => ERROR [build 2/7] RUN apt-get update &&     apt-get install -y wget make git gcc-aarch64-linux-gnu gcc     &&     rm -rf /var/lib/apt/lists/*  0.6s
------
 > [build 2/7] RUN apt-get update &&     apt-get install -y wget make git gcc-aarch64-linux-gnu gcc     &&     rm -rf /var/lib/apt/lists/*:
0.273 exec /bin/sh: exec format error

This patch allows for building on aarch64.

I manually tested this patch by running make -f deployments/container/Makefile build on the two different platforms.

copy-pr-bot · 2025-04-23T11:14:51Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

elezar · 2025-04-23T13:04:46Z

deployments/container/Dockerfile

-RUN wget -nv -O - https://storage.googleapis.com/golang/go${GOLANG_VERSION}.linux-amd64.tar.gz \
-    | tar -C /usr/local -xz
+
+# Support building on Linux on aarch64 (arm64) and on x86_64 (amd64).


See the logic here https://github.com/NVIDIA/k8s-device-plugin/blob/f666bc3f836a09ae2fda439f3d7a8d8b06b48ac4/deployments/container/Dockerfile#L23-L34

It may be good to be consistent across projects in this case.

The reason that we were building a single architecture here is that we explicitly cross-comile the binaries where we build them later. Does the logic to set cc=aarch64-linux-gnu-gcc as a cross compiler there need to be changed in this case? Is gcc just an alias to this compiler when an aarch64 image is used.

A minor point (that is also applicable to the device plugin) is that we could use TARGETARCH here directly instead of running uname -m.

The reason that we were building a single architecture here is that we explicitly cross-comile the binaries

I understand. We decoupled build architecture from target architecture, which is good (allowing for cross-compilation). That is, we allowed for the target architecture to be flexible. But we did not allow for the build architecture to be flexible: we were explicitly requesting an AMD64 image via --platform=${BUILDOS}/amd64 regardless of the current (build) architecture.

That is, we could do AMD64 -> AMD64 and AMD64 -> ARM64. But e.g. ARM64 -> ARM64 does not work.

Does the logic to set cc=aarch64-linux-gnu-gcc as a cross compiler there need to be changed in this case?

Seen that, wondered the same -- I tested this patch with AMD64 -> AMD64 and ARM64 -> ARM64.

we could use TARGETARCH here directly instead of running uname -m.

that would again couple build arch to target arch.

Does the logic to set cc=aarch64-linux-gnu-gcc as a cross compiler there need to be changed in this case? Is gcc just an alias to this compiler when an aarch64 image is used.

I thought about it a bit. I don't think this needs changing. Running gcc on aarch64 should by default build for aarch64. In any case, when target arch is arm64 we always run aarch64-linux-gnu-gcc and I think on aarch64 this is basically "normal gcc" :).

It may be good to be consistent across projects in this case.

certainly would have been good, because then I would not have run into a problem :). I will look at this, and maybe copy/paste that code instead.

Thanks for clarifying things once more. I think I missed some of the subtleties in my first pass.

That said, does the use of TARGETARCH in the make target below need to be reassessed?

does the use of TARGETARCH in the make target below need to be reassessed?

You mean in

make CC=${cc} GOARCH=${TARGETARCH} PREFIX=/artifacts cmds

?

Hm. It still looks good to me. GOARCH signifies the target architecture:

GOARCH is the running program's architecture target: one of 386, amd64, arm, s390x, and so on.

See the logic here

pushed a commit, used that instead

That is, we could do AMD64 -> AMD64 and AMD64 -> ARM64

this patch probably broke AMD64 -> ARM64 (as observed by @klueska)

Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>

jgehrcke · 2025-04-25T12:37:49Z

deployments/container/Dockerfile

-LABEL org.opencontainers.image.description "NVIDIA DRA Driver for GPUs"
-LABEL org.opencontainers.image.source "https://github.com/NVIDIA/k8s-dra-driver-gpu"
+LABEL org.opencontainers.image.description="NVIDIA DRA Driver for GPUs"
+LABEL org.opencontainers.image.source="https://github.com/NVIDIA/k8s-dra-driver-gpu"


As I needed to update this branch I took in this feedback from a different PR (not a fix, merely a style consistency).

jgehrcke · 2025-05-20T10:04:54Z

deployments/container/Dockerfile

 ARG GOLANG_VERSION=1.23.8
 # We use an ubuntu20.04 base image to allow for a more efficient multi-arch builds.
-FROM --platform=${BUILDOS}/amd64 nvcr.io/nvidia/cuda:12.8.1-base-ubuntu20.04 AS build
+FROM  nvcr.io/nvidia/cuda:12.8.1-base-ubuntu20.04 AS build


My assumption when removing --platform here was that the default is the current (build) platform.

That assumption was wrong. Ref docs are: https://docs.docker.com/reference/dockerfile/#from

Quote:

The optional --platform flag can be used to specify the platform of the image in case FROM references a multi-platform image. For example, linux/amd64, linux/arm64, or windows/amd64. By default, the target platform of the build request is used.

elezar reviewed Apr 23, 2025

View reviewed changes

jgehrcke force-pushed the jp/allow-building-on-aarch64 branch from 2755d85 to 9760ffb Compare April 25, 2025 12:32

Support building container image on aarch64 (arm64)

b38146c

Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>

jgehrcke force-pushed the jp/allow-building-on-aarch64 branch from fc1a563 to b38146c Compare April 25, 2025 12:34

jgehrcke commented Apr 25, 2025

View reviewed changes

elezar approved these changes Apr 25, 2025

View reviewed changes

jgehrcke merged commit ea3e26d into NVIDIA:main Apr 25, 2025
7 checks passed

jgehrcke commented May 20, 2025

View reviewed changes

jgehrcke mentioned this pull request May 20, 2025

Fix slow amd64->arm64 build #373

Merged

klueska added this to Planning Board: k8s-dra-driver-gpu Jun 16, 2025

github-project-automation bot moved this to Backlog in Planning Board: k8s-dra-driver-gpu Jun 16, 2025

klueska added this to the v25.3.0 milestone Aug 13, 2025

Support building container image on aarch64 (arm64) #323

Support building container image on aarch64 (arm64) #323

Uh oh!

Conversation

jgehrcke commented Apr 23, 2025

Uh oh!

copy-pr-bot bot commented Apr 23, 2025

Uh oh!

elezar Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jgehrcke Apr 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

elezar Apr 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

elezar Apr 23, 2025 •

edited

Loading

jgehrcke Apr 24, 2025 •

edited

Loading

elezar Apr 24, 2025 •

edited

Loading