
Troubleshooting

Dr. Jan-Philip Gehrcke edited this page Oct 11, 2025 · 6 revisions

Collecting data

Kubelet plugin logs

Collect all kubelet plugin logs into a single file:

kubectl logs \
    -n nvidia-dra-driver-gpu \
    -l nvidia-dra-driver-gpu-component=kubelet-plugin \
    --prefix \
    --all-containers \
    --timestamps \
    --tail=-1 \
    > dra-driver-dbg_plugins_$(date -u +"%Y-%m-%dT%H:%M:%SZ").log

Notes:

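  • In a large cluster, this can fetch a lot of data: --tail=-1 requests the complete log of every matching container.
  • --prefix labels each line with its source pod and container, and --timestamps adds a timestamp to each line. Both are critical for making sense of the merged output.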
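
In a large cluster it can help to narrow the collection to a single node and a recent time window. The following is a minimal sketch, not part of the driver tooling: the function name, the default window of one hour, and the node name in the usage line are hypothetical; the namespace and label selector are taken from the command above.

```shell
# Hypothetical helper: print kubelet plugin logs for one node only.
# Namespace and label come from the command above; the function name and
# the default time window are assumptions made for this sketch.
collect_plugin_logs_for_node() {
    node="$1"           # Kubernetes node name
    since="${2:-1h}"    # only fetch log lines newer than this duration
    ns="nvidia-dra-driver-gpu"
    selector="nvidia-dra-driver-gpu-component=kubelet-plugin"

    # List only the plugin pods scheduled on the given node (as pod/<name>).
    pods=$(kubectl get pods -n "$ns" -l "$selector" \
        --field-selector "spec.nodeName=${node}" -o name) || return 1

    # Fetch the logs pod by pod, keeping prefixes and timestamps.
    for pod in $pods; do
        kubectl logs -n "$ns" "$pod" \
            --prefix --all-containers --timestamps --since="$since"
    done
}

# Usage (my-gpu-node is a placeholder):
# collect_plugin_logs_for_node my-gpu-node 2h > dra-driver-dbg_plugins_my-gpu-node.log
```

Because the function writes to standard output, the result can be redirected into a file in the same way as the cluster-wide command.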

Controlling log verbosity

During helm install or upgrade

Set the log verbosity for all components by passing --set logVerbosity=<V> to helm install ... or helm upgrade -i ....
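
As an illustration, the parameter could be wrapped in a small helper. This is a sketch under assumptions: the function name is hypothetical, the release name and chart reference are left as arguments because they depend on how the driver was installed, and the namespace is the one used by the kubectl commands on this page.

```shell
# Hypothetical helper: install or upgrade the driver with a given log
# verbosity. Release name and chart reference are supplied by the caller.
set_driver_log_verbosity() {
    release="$1"    # Helm release name (installation-specific)
    chart="$2"      # chart reference (installation-specific)
    verbosity="$3"  # value for the logVerbosity chart parameter

    helm upgrade -i "$release" "$chart" \
        --namespace nvidia-dra-driver-gpu \
        --set "logVerbosity=${verbosity}"
}

# Usage (placeholders in angle brackets):
# set_driver_log_verbosity <release> <chart> 6
```

When new values are passed, helm upgrade does not carry over values from an earlier install by default, so other --set options used at install time may need to be repeated.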

Post-install

The verbosity can be changed after deployment and per-component, using various finer-grained mechanisms. Some examples are shown below.

Note that, for now, none of the components can change their log verbosity at runtime: a pod restart is always required to pick up the changed configuration.

Controller

Set log verbosity of just the controller pod:

kubectl set env deployment nvidia-dra-driver-gpu-controller -n nvidia-dra-driver-gpu LOG_VERBOSITY=6

This command restarts the controller pod.

Kubelet plugins

Set log verbosity across kubelet plugin instances:

kubectl set env ds nvidia-dra-driver-gpu-kubelet-plugin -n nvidia-dra-driver-gpu LOG_VERBOSITY=6

This command triggers a restart for all plugin pods.
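
Since the new verbosity only takes effect once the plugin pods have restarted, it can be useful to wait for the rollout to finish before collecting logs. A minimal sketch, assuming the DaemonSet name and namespace from the command above; the function name and the default timeout are hypothetical.

```shell
# Hypothetical helper: block until the kubelet plugin DaemonSet has
# finished rolling out, or until the timeout expires.
wait_for_plugin_rollout() {
    wait_timeout="${1:-5m}"   # assumed default; adjust to the cluster size
    kubectl rollout status ds/nvidia-dra-driver-gpu-kubelet-plugin \
        -n nvidia-dra-driver-gpu --timeout="$wait_timeout"
}

# Usage:
# wait_for_plugin_rollout 10m
```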

ComputeDomain daemons

Set the log verbosity of ComputeDomain (CD) daemons started in the future; daemons that are already running are not affected (this restarts the controller pod):

kubectl set env deployment nvidia-dra-driver-gpu-controller -n nvidia-dra-driver-gpu LOG_VERBOSITY_CD_DAEMON=6
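
To check which verbosity settings are currently configured on the controller Deployment, the environment can be listed with kubectl set env --list. A minimal sketch, assuming the Deployment name and namespace from the command above; the function name is hypothetical, and the variables only show up if they are set on the Deployment.

```shell
# Hypothetical helper: show the log verbosity variables that are set on
# the controller Deployment (LOG_VERBOSITY and LOG_VERBOSITY_CD_DAEMON).
show_controller_log_verbosity() {
    kubectl set env deployment/nvidia-dra-driver-gpu-controller \
        -n nvidia-dra-driver-gpu --list | grep LOG_VERBOSITY
}

# Usage:
# show_controller_log_verbosity
```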
