Installation

Dr. Jan-Philip Gehrcke edited this page Oct 21, 2025 · 2 revisions

Prerequisites

  • Kubernetes v1.32 or newer.
  • DRA and corresponding API groups must be enabled (see Kubernetes docs).
  • CDI must be enabled in the underlying container runtime (such as containerd or CRI-O).
  • NVIDIA GPU Driver 565 or later.

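Whether the DRA API groups are served can be spot-checked from the client side. A sketch, assuming `kubectl` access to the cluster (resource names are those of the `resource.k8s.io` API group in Kubernetes v1.32+):

```shell
# Confirm the cluster version meets the minimum requirement (v1.32+).
kubectl version

# When DRA and its API groups are enabled, the DRA resource types are served.
# Expect entries such as resourceclaims, resourceclaimtemplates, deviceclasses.
kubectl api-resources --api-group=resource.k8s.io
```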
We recommend installing GPU Operator v25.3.x or newer, which covers the last two items on the list above (among many other benefits!). Soon, the preferred method for installing this DRA driver will be through the GPU Operator, at which point it will no longer require installation as a separate Helm chart. The state of this integration work can be followed here.

If you want to use ComputeDomains and a pre-installed ("host-provided") GPU Driver:

  • Make sure to have the corresponding nvidia-imex-* packages installed (via your Linux distribution's package manager).
  • Disable the IMEX systemd service before installing the GPU Operator (on all GPU nodes), with e.g. systemctl disable --now nvidia-imex.service && systemctl mask nvidia-imex.service.
  • Refer to the docs on installing the GPU Operator with a pre-installed GPU driver.
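
On each GPU node, the IMEX service state can be double-checked before installing the GPU Operator. A sketch; the package query is a Debian/Ubuntu example and the exact package names may differ per distribution:

```shell
# The service should report "masked" after the disable/mask step above.
systemctl is-enabled nvidia-imex.service || true

# Confirm the nvidia-imex-* packages are installed
# (Debian/Ubuntu example; use e.g. rpm -qa on RPM-based distributions).
dpkg -l | grep nvidia-imex
```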

Configure and install the DRA driver with Helm

Release artifacts of this DRA driver are published to NVIDIA's container registry NGC (Helm chart, container images). The instructions below consume those artifacts. If you're interested in testing nightly development builds at your own risk, head over to this page.

  1. Add the NVIDIA NGC Catalog's Helm chart repository:

    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
    
  2. Install the DRA driver, providing install-time configuration parameters.

    Example for Operator-provided GPU driver:

    helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
        --version="25.8.0" \
        --create-namespace \
        --namespace nvidia-dra-driver-gpu \
        --set resources.gpus.enabled=false \
        --set nvidiaDriverRoot=/run/nvidia/driver
    

    Example for host-provided GPU driver:

    helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
        --version="25.8.0" \
        --create-namespace \
        --namespace nvidia-dra-driver-gpu \
        --set resources.gpus.enabled=false
    

All install-time configuration parameters can be listed by running helm show values nvidia/nvidia-dra-driver-gpu.
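Instead of repeating `--set` flags, the same configuration can be captured in a values file. A sketch; the file path /tmp/dra-values.yaml is arbitrary, and the keys simply mirror the flags used in the examples above:

```shell
# Write a values file equivalent to the --set flags from the
# Operator-provided-driver example above.
cat > /tmp/dra-values.yaml <<'EOF'
resources:
  gpus:
    enabled: false
nvidiaDriverRoot: /run/nvidia/driver
EOF

# Then install with the file instead of individual --set flags:
#   helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
#       --version="25.8.0" \
#       --create-namespace \
#       --namespace nvidia-dra-driver-gpu \
#       -f /tmp/dra-values.yaml
```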

Notes on selecting ComputeDomain and/or GPU support

  • By default, both resource types are enabled.
  • A common mode of operation for now is to enable only the ComputeDomain subsystem (with GPUs allocated via the traditional device plugin). The examples above achieve that with --set resources.gpus.enabled=false.
  • Likewise, to enable only GPU support, use --set resources.computeDomains.enabled=false.

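For completeness, a GPU-only installation would mirror the examples above with the opposite flag. A sketch, reusing the chart version and namespace from above:

```shell
helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
    --version="25.8.0" \
    --create-namespace \
    --namespace nvidia-dra-driver-gpu \
    --set resources.computeDomains.enabled=false
```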
Notes on nvidiaDriverRoot

  • Setting this parameter incorrectly is a common source of error.
  • The examples above use nvidiaDriverRoot=/run/nvidia/driver -- that location is where a GPU Operator-provided GPU driver is placed.
  • That parameter must be changed if the GPU driver is installed directly on the host (typically at /, which is the default value for nvidiaDriverRoot).
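
A quick way to determine which setting applies is to look for the driver installation on a GPU node. A sketch, using the two locations discussed above:

```shell
# Operator-provided driver: files live under /run/nvidia/driver on the node.
ls /run/nvidia/driver/usr/bin/nvidia-smi 2>/dev/null \
  && echo "use nvidiaDriverRoot=/run/nvidia/driver"

# Host-provided driver: nvidia-smi is on the host's regular PATH.
ls /usr/bin/nvidia-smi 2>/dev/null \
  && echo "keep the default nvidiaDriverRoot=/"
```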

Validate installation

A lot can go wrong, depending on your Kubernetes environment, hardware, driver choices, and configuration options. NVIDIA therefore recommends running a set of validation tests to confirm the basic functionality of your setup. To that end, NVIDIA has prepared separate documentation:
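
As a first smoke test before following that documentation, the driver's pods and registered DeviceClasses can be inspected. A sketch, assuming the namespace used in the installation examples above:

```shell
# All driver pods should reach the Running state.
kubectl get pods -n nvidia-dra-driver-gpu

# The driver registers DeviceClasses for the enabled subsystems.
kubectl get deviceclasses
```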

Special topics

Upgrading

TBD

Downgrading

TBD

Feature gates

TBD

Installing a Validating Admission Webhook

This component is disabled by default.

The webhook inspects user-given ResourceClaim and ResourceClaimTemplate specifications and validates their so-called opaque configuration parameters (user-given data otherwise not validated by the Kubernetes API server). This is helpful to enforce invariants and to provide actionable feedback to the user early in the information flow.

Also see Kubernetes' docs for admission control.

To enable the webhook, first install cert-manager and its CRDs (see below); then set webhook.enabled=true upon Helm chart installation.

helm install \
  --repo https://charts.jetstack.io \
  --version v1.16.3 \
  --create-namespace \
  --namespace cert-manager \
  --wait \
  --set crds.enabled=true \
  cert-manager \
  cert-manager
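
With cert-manager in place, the webhook can then be enabled at (re)installation time of the DRA driver. A sketch, reusing the chart version, namespace, and webhook.enabled parameter from above:

```shell
helm upgrade --install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
    --version="25.8.0" \
    --create-namespace \
    --namespace nvidia-dra-driver-gpu \
    --set resources.gpus.enabled=false \
    --set webhook.enabled=true
```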