How to configure nvidia-container-runtime to expose only certain GPUs from the host to docker #836

@lordofire

Description

Hi there,

In our use case, we have one k8s node (custom-built hardware) with 3 GPUs: 2 GPUs are used for container workloads and 1 GPU is used for display purposes. In the current setup, all three GPUs are exposed to containers by default. I would like to know how to configure nvidia-container-runtime and Docker to expose only 2 GPUs by default to any pods scheduled on this node. Specifically:

  1. When the nvidia-device-plugin reports nvidia.com/gpu capacity for the k8s node, it should be 2 instead of 3.
  2. When a pod on the node uses NVIDIA_VISIBLE_DEVICES=all, it should see only 2 GPUs instead of 3.

Note that we cannot simply drain that one GPU, since it is still needed for non-container GPU workloads.
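To illustrate what we are after: per-container selection already works when the GPU UUIDs are passed explicitly, and I believe the same variable can be set on the device plugin's DaemonSet to limit what it enumerates. A minimal sketch follows (the UUIDs, namespace, and DaemonSet name are placeholders for our setup; adjust for yours). The open question is making this the node-wide default rather than per-container/per-DaemonSet configuration:

```shell
# List the GPU UUIDs on the node (requires the NVIDIA driver / nvidia-smi).
nvidia-smi -L

# Per-container workaround: expose only the two compute GPUs by UUID
# instead of relying on the default of "all".
docker run --rm --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=GPU-aaaaaaaa,GPU-bbbbbbbb \
  nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Device-plugin side (assumption, not verified): setting the same variable
# on the plugin's DaemonSet should restrict which GPUs it sees and thus
# advertises as nvidia.com/gpu capacity. Namespace and DaemonSet name are
# placeholders.
kubectl -n kube-system set env daemonset/nvidia-device-plugin-daemonset \
  NVIDIA_VISIBLE_DEVICES=GPU-aaaaaaaa,GPU-bbbbbbbb
```

This pins devices per container and per plugin instance, but it still does not change what a pod requesting NVIDIA_VISIBLE_DEVICES=all would see, which is the second point above.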

I searched a bit both in this repo and elsewhere online, but have not found a good solution so far. Thanks in advance for the help.
Jianan.

Metadata

Labels

    lifecycle/stale: Denotes an issue or PR that has remained open with no activity and has become stale.
