Handle --gpus flag using CDI #4617
Thanks, does this need any change on https://github.com/containerd/nerdctl/blob/main/docs/gpu.md ?
I have updated this as well in the latest revision.
Yes, most likely. I will look at updating the docs too.
**Update**: Updated.
This change switches to using CDI to handle the --gpus flag. It removes the custom implementation that invoked nvidia-container-cli directly, a mechanism that does not align with existing implementations.

Signed-off-by: Evan Lezar <[email protected]>
88c37fa to ccdb3e6
ChengyuZhu6 left a comment
LGTM. I have tested on my machine, and it works.
```
nerdctl run -it --rm --gpus 'device=GPU-3a23c669-1f69-c64e-cf85-44e9b07e7a2a' nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
```

Note that although `capabilities` options may be provided, these are ignored when processing the GPU request.
Suggested change:
Note that although `capabilities` options may be provided, these are ignored when processing the GPU request since nerdctl v2.3.
- `nvidia-container-cli`
  - containerd relies on this CLI for setting up GPUs inside container. You can install this via [`libnvidia-container` package](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/arch-overview.html#libnvidia-container).
- The NVIDIA Container Toolkit
  - containerd relies on the NVIDIA Container Toolkit to make GPUs usable inside a container. You can install the NVIDIA Container Toolkit by following the [official installation instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
Add something like

  - containerd relies on the NVIDIA Container Toolkit to make GPUs usable inside a container. You can install the NVIDIA Container Toolkit by following the [official installation instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).

> [!NOTE]
> The description in this section applies to nerdctl v2.3 or later.
> Users of prior releases of nerdctl should refer to <https://github.com/containerd/nerdctl/blob/v2.2.0/docs/gpu.md>
Maybe the note should rather be moved to the top of the documentation.
This PR can fix issue #4621.
This change switches to using CDI to handle the --gpus flag. It removes the custom implementation that invoked nvidia-container-cli directly, a mechanism that does not align with existing implementations.
See also:
- ctr: Map ctr --gpus requests to NVIDIA CDI device requests containerd#12537
- docker: Use cdi device driver to handle nvidia --gpus requests moby/moby#50228
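
For readers unfamiliar with CDI, the idea of mapping a `--gpus` value to CDI device requests can be sketched roughly as below. This is an illustration only and not the code in this PR: the helper name `gpuFlagToCDIDevices` is hypothetical, the parsing is simplified (one device ID per `device=` field, no quoted CSV values), and the mapping of a bare count `N` to device indices `0..N-1` is an assumption. The `nvidia.com/gpu=<name>` form follows the CDI qualified device name convention, and `capabilities` fields are accepted but ignored, as the updated docs describe.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cdiKind is the CDI vendor/class prefix used for NVIDIA GPUs.
const cdiKind = "nvidia.com/gpu"

// gpuFlagToCDIDevices is a hypothetical helper (not nerdctl's actual code).
// It turns a simplified --gpus value into CDI qualified device names.
func gpuFlagToCDIDevices(value string) ([]string, error) {
	// "--gpus all" requests every GPU.
	if value == "all" {
		return []string{cdiKind + "=all"}, nil
	}

	// A bare integer is treated as a count. Mapping a count of N to the
	// indices 0..N-1 is an assumption made for this sketch.
	if n, err := strconv.Atoi(value); err == nil {
		if n <= 0 {
			return nil, fmt.Errorf("invalid GPU count %d", n)
		}
		devices := make([]string, 0, n)
		for i := 0; i < n; i++ {
			devices = append(devices, cdiKind+"="+strconv.Itoa(i))
		}
		return devices, nil
	}

	// Otherwise expect comma-separated key=value fields.
	var devices []string
	for _, field := range strings.Split(value, ",") {
		key, val, ok := strings.Cut(field, "=")
		if !ok || val == "" {
			return nil, fmt.Errorf("invalid --gpus field %q", field)
		}
		switch key {
		case "device":
			// The value may be an index or a GPU UUID.
			devices = append(devices, cdiKind+"="+val)
		case "capabilities":
			// Accepted for compatibility, but ignored.
		default:
			return nil, fmt.Errorf("unsupported --gpus key %q", key)
		}
	}
	if len(devices) == 0 {
		// Rejecting a request that names no device is a choice of this sketch.
		return nil, fmt.Errorf("no GPU devices requested in %q", value)
	}
	return devices, nil
}

func main() {
	values := []string{
		"all",
		"2",
		"device=GPU-3a23c669-1f69-c64e-cf85-44e9b07e7a2a,capabilities=utility",
	}
	for _, v := range values {
		devices, err := gpuFlagToCDIDevices(v)
		fmt.Println(v, "->", devices, err)
	}
}
```

Once a request is expressed as CDI device names, the runtime can resolve them against the CDI specifications generated by the NVIDIA Container Toolkit, which is why the toolkit replaces `nvidia-container-cli` as the stated requirement.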