Contributor

@elezar elezar commented Nov 25, 2025

This change switches the `--gpus` flag to use CDI. It removes the custom implementation that invoked `nvidia-container-cli` directly, a mechanism that does not align with existing implementations.
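
As a rough illustration of what handling `--gpus` through CDI implies, the sketch below maps a simplified `--gpus` value to a fully qualified CDI device name. It assumes the `nvidia.com/gpu=<name>` naming used by CDI specifications generated by the NVIDIA Container Toolkit. The helper `gpus_to_cdi` is hypothetical, handles only `all` and a single `device=<id>`, and is not the code in this PR.

```shell
# Hypothetical helper (not nerdctl's implementation): translate a simplified
# --gpus value into a CDI device name.
# Assumption: devices are named nvidia.com/gpu=<all|index|UUID>.
gpus_to_cdi() {
    case "$1" in
        all)
            printf '%s\n' "nvidia.com/gpu=all"
            ;;
        device=?*)
            # Strip the "device=" prefix and keep the index or UUID.
            printf '%s\n' "nvidia.com/gpu=${1#device=}"
            ;;
        *)
            # Anything else (for example a capabilities option) is rejected
            # by this sketch.
            printf 'unsupported --gpus value: %s\n' "$1" >&2
            return 1
            ;;
    esac
}

gpus_to_cdi all
gpus_to_cdi device=0
```

The resulting name is the kind of identifier that a CDI-aware runtime resolves against a CDI specification, instead of the runtime invoking a vendor CLI itself.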

See also:

Member

Thanks, does this need any change on https://github.com/containerd/nerdctl/blob/main/docs/gpu.md ?

Contributor Author

I have updated this as well in the latest revision.

@AkihiroSuda AkihiroSuda added this to the v2.3.0 milestone Nov 26, 2025
Contributor Author

elezar commented Nov 26, 2025 via email

This change switches the --gpus flag to use CDI.
This removes the custom implementation that invoked nvidia-container-cli
directly, a mechanism that does not align with existing implementations.

Signed-off-by: Evan Lezar <[email protected]>
Member

@ChengyuZhu6 ChengyuZhu6 left a comment

LGTM. I have tested on my machine, and it works.

```
nerdctl run -it --rm --gpus 'device=GPU-3a23c669-1f69-c64e-cf85-44e9b07e7a2a' nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
```

Note that although `capabilities` options may be provided, these are ignored when processing the GPU request.
Member

Suggested change
Note that although `capabilities` options may be provided, these are ignored when processing the GPU request since nerdctl v2.3.

- `nvidia-container-cli`
  - containerd relies on this CLI for setting up GPUs inside a container. You can install this via the [`libnvidia-container` package](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/arch-overview.html#libnvidia-container).
- The NVIDIA Container Toolkit
  - containerd relies on the NVIDIA Container Toolkit to make GPUs usable inside a container. You can install the NVIDIA Container Toolkit by following the [official installation instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
Member

@AkihiroSuda AkihiroSuda Nov 27, 2025

Add something like

Suggested change
- containerd relies on the NVIDIA Container Toolkit to make GPUs usable inside a container. You can install the NVIDIA Container Toolkit by following the [official installation instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
> [!NOTE]
> The description in this section applies to nerdctl v2.3 or later.
> Users of prior releases of nerdctl should refer to <https://github.com/containerd/nerdctl/blob/v2.2.0/docs/gpu.md>

Member

Maybe the note should rather be moved to the top of the documentation.

@ChengyuZhu6
Member

This PR can fix issue #4621.
