Support building container image on aarch64 (arm64) #323
deployments/container/Dockerfile
Outdated
RUN wget -nv -O - https://storage.googleapis.com/golang/go${GOLANG_VERSION}.linux-amd64.tar.gz \
    | tar -C /usr/local -xz

# Support building on Linux on aarch64 (arm64) and on x86_64 (amd64).
See the logic here https://github.com/NVIDIA/k8s-device-plugin/blob/f666bc3f836a09ae2fda439f3d7a8d8b06b48ac4/deployments/container/Dockerfile#L23-L34
It may be good to be consistent across projects in this case.
The reason that we were building a single architecture here is that we explicitly cross-compile the binaries where we build them later. Does the logic to set cc=aarch64-linux-gnu-gcc as a cross compiler there need to be changed in this case? Is gcc just an alias to this compiler when an aarch64 image is used?
A minor point (that is also applicable to the device plugin) is that we could use TARGETARCH here directly instead of running uname -m.
The reason that we were building a single architecture here is that we explicitly cross-compile the binaries
I understand. We decoupled build architecture from target architecture, which is good (allowing for cross-compilation). That is, we allowed for the target architecture to be flexible. But we did not allow for the build architecture to be flexible: we were explicitly requesting an AMD64 image via --platform=${BUILDOS}/amd64 regardless of the current (build) architecture.
That is, we could do AMD64 -> AMD64 and AMD64 -> ARM64. But e.g. ARM64 -> ARM64 does not work.
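To make the build-arch side of this concrete, here is a minimal sketch of detecting the build architecture with `uname -m` and deriving the Go toolchain URL from it. The helper names `go_arch_for_machine` and `go_tarball_url` are hypothetical and not taken from either repository; the URL pattern follows the `wget` line in the diff, and only amd64 and arm64 are handled here.

```shell
# Hypothetical helper: map `uname -m` output to the arch suffix used in
# Go release tarball names.
go_arch_for_machine() {
    case "$1" in
        x86_64 | amd64) echo "amd64" ;;
        aarch64 | arm64) echo "arm64" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: build the toolchain URL from a Go version ($1)
# and a machine name ($2), following the URL pattern in the diff.
go_tarball_url() {
    arch="$(go_arch_for_machine "$2")" || return 1
    echo "https://storage.googleapis.com/golang/go$1.linux-${arch}.tar.gz"
}
```

Inside the build stage this could be used as `wget -nv -O - "$(go_tarball_url "${GOLANG_VERSION}" "$(uname -m)")" | tar -C /usr/local -xz`, so the toolchain always matches the machine the build stage runs on, independent of the target architecture.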
Does the logic to set cc=aarch64-linux-gnu-gcc as a cross compiler there need to be changed in this case?
Seen that, wondered the same -- I tested this patch with AMD64 -> AMD64 and ARM64 -> ARM64.
we could use TARGETARCH here directly instead of running uname -m.
that would again couple build arch to target arch.
Does the logic to set cc=aarch64-linux-gnu-gcc as a cross compiler there need to be changed in this case? Is gcc just an alias to this compiler when an aarch64 image is used?
I thought about it a bit. I don't think this needs changing. Running gcc on aarch64 should by default build for aarch64. In any case, when target arch is arm64 we always run aarch64-linux-gnu-gcc and I think on aarch64 this is basically "normal gcc" :).
It may be good to be consistent across projects in this case.
certainly would have been good, because then I would not have run into a problem :). I will look at this, and maybe copy/paste that code instead.
Thanks for clarifying things once more. I think I missed some of the subtleties in my first pass.
That said, does the use of TARGETARCH in the make target below need to be reassessed?
does the use of TARGETARCH in the make target below need to be reassessed?
You mean in
make CC=${cc} GOARCH=${TARGETARCH} PREFIX=/artifacts cmds
?
Hm. It still looks good to me. GOARCH signifies the target architecture:
GOARCH is the running program's architecture target: one of 386, amd64, arm, s390x, and so on.
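A small sketch of how the target architecture feeds both `CC` and `GOARCH` in that make call. The helper names `cc_for_target` and `make_cmd_for_target` are hypothetical. Only the arm64 branch is stated in this thread; the amd64 branch is an assumption that the build host is itself amd64, so this sketch does not cover ARM64 -> AMD64.

```shell
# Hypothetical helper: pick the C compiler for a target architecture.
# arm64 -> aarch64-linux-gnu-gcc is stated in this thread; amd64 -> gcc
# is an assumption that only holds when the build host is amd64.
cc_for_target() {
    case "$1" in
        arm64) echo "aarch64-linux-gnu-gcc" ;;
        amd64) echo "gcc" ;;
        *) echo "unsupported target architecture: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the make invocation for a target arch.
# GOARCH is set to the target, never to the build architecture.
make_cmd_for_target() {
    cc="$(cc_for_target "$1")" || return 1
    echo "make CC=${cc} GOARCH=$1 PREFIX=/artifacts cmds"
}
```

For example, `make_cmd_for_target "${TARGETARCH}"` prints the command that would be run for the requested target, regardless of what `uname -m` reports on the build machine.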
See the logic here
pushed a commit, used that instead
That is, we could do AMD64 -> AMD64 and AMD64 -> ARM64
this patch probably broke AMD64 -> ARM64 (as observed by @klueska)
2755d85 to 9760ffb
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
fc1a563 to b38146c
-LABEL org.opencontainers.image.description "NVIDIA DRA Driver for GPUs"
-LABEL org.opencontainers.image.source "https://github.com/NVIDIA/k8s-dra-driver-gpu"
+LABEL org.opencontainers.image.description="NVIDIA DRA Driver for GPUs"
+LABEL org.opencontainers.image.source="https://github.com/NVIDIA/k8s-dra-driver-gpu"
As I needed to update this branch I took in this feedback from a different PR (not a fix, merely a style consistency).
 ARG GOLANG_VERSION=1.23.8
 # We use an ubuntu20.04 base image to allow for a more efficient multi-arch builds.
-FROM --platform=${BUILDOS}/amd64 nvcr.io/nvidia/cuda:12.8.1-base-ubuntu20.04 AS build
+FROM nvcr.io/nvidia/cuda:12.8.1-base-ubuntu20.04 AS build
My assumption when removing --platform here was that the default is the current (build) platform.
That assumption was wrong. Ref docs are: https://docs.docker.com/reference/dockerfile/#from
Quote:
The optional --platform flag can be used to specify the platform of the image in case FROM references a multi-platform image. For example, linux/amd64, linux/arm64, or windows/amd64. By default, the target platform of the build request is used.
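To spell out why the assumption fails, here is a tiny model of the rule quoted from the docs. `from_platform` is a hypothetical helper that only restates the documented default; it is not Docker's implementation.

```shell
# Hypothetical model of the documented FROM behavior.
# $1: value of --platform on the FROM line (empty string if absent)
# $2: target platform of the build request
from_platform() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}
```

With no flag, `from_platform "" linux/arm64` yields `linux/arm64` even when the build runs on an amd64 machine. Dropping `--platform` therefore ties the build stage to the target platform, not to the build platform.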
deployments/container/Dockerfile currently assumes building on AMD64.
For example, on a system where uname -m returns aarch64 we see:
This patch allows for building on aarch64.
I manually tested this patch by running make -f deployments/container/Makefile build on the two different platforms.