Troubleshooting
Collect all kubelet plugin logs into a single file:
kubectl logs \
    -n nvidia-dra-driver-gpu \
    -l nvidia-dra-driver-gpu-component=kubelet-plugin \
    --prefix \
    --all-containers \
    --timestamps \
    --tail=-1 \
    > dra-driver-dbg_plugins_$(date -u +"%Y-%m-%dT%H%M%SZ").log
Notes:
- In a larger-scale environment, this may fetch a lot of data.
- Adding --prefix and --timestamps is critical for debuggability.
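The value of --prefix and --timestamps shows when working with the combined file: one makes it possible to isolate a single pod, the other to interleave all pods chronologically. The following is a minimal sketch, not part of the driver tooling. It assumes that each line starts with `[pod/<pod>/<container>]` followed by a fixed-width RFC 3339 UTC timestamp, so that a plain byte-wise sort is also a chronological sort. The function names `plugin_logs_for_pod` and `plugin_logs_by_time` are hypothetical.

```shell
# Assumed line layout (produced by --prefix and --timestamps):
#   [pod/<pod>/<container>] <RFC 3339 timestamp> <message>

# Hypothetical helper: print only the lines that belong to one pod.
plugin_logs_for_pod() {
    if [ "$#" -ne 2 ]; then echo "usage: plugin_logs_for_pod <logfile> <pod>"; return 1; fi
    # Keep lines whose very first characters are the pod's prefix.
    awk -v p="[pod/$2/" 'index($0, p) == 1' "$1"
}

# Hypothetical helper: interleave the lines of all pods in time order.
plugin_logs_by_time() {
    if [ -z "$1" ]; then echo "missing arg: log file"; return 1; fi
    # Field 2 is the timestamp; -s keeps the original order for equal timestamps.
    LC_ALL=C sort -s -k2,2 "$1"
}

# Usage (file and pod names are placeholders):
#   plugin_logs_by_time dra-driver-dbg_plugins_<timestamp>.log | less
#   plugin_logs_for_pod dra-driver-dbg_plugins_<timestamp>.log <pod-name>
```

If the timestamps in a given environment are not fixed-width or carry differing UTC offsets, the sort order is no longer guaranteed to be chronological.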
To collect the logs of all daemons that belong to one ComputeDomain (CD), use this shell function (paste it into your terminal):
get_all_cd_daemon_logs_for_cd_name() {
    if [ -z "$*" ]; then echo "missing arg: CD name"; return 1; fi
    CD_NAME="$1"
    # Read the UID from the object metadata; grepping `kubectl describe` output can match more than one line.
    CD_UID=$(kubectl get computedomains.resource.nvidia.com "${CD_NAME}" -o jsonpath='{.metadata.uid}')
    if [ -z "${CD_UID}" ]; then echo "could not determine UID of CD: ${CD_NAME}"; return 1; fi
    # The CD daemon pods are selected by a label that carries the UID of their CD.
    CD_LABEL_KV="resource.nvidia.com/computeDomain=${CD_UID}"
    _filename="dra-driver-dbg_cd-daemons_$(date -u +"%Y-%m-%dT%H%M%SZ").log.gz"
    echo "fetching CD daemon logs for CD: $CD_LABEL_KV ($CD_NAME), creating $_filename"
    kubectl logs \
        -n nvidia-dra-driver-gpu \
        -l "${CD_LABEL_KV}" \
        --all-containers \
        --timestamps \
        --tail=-1 \
        --prefix | gzip > "${_filename}"
}
Run it for a specific CD. Example:
$ get_all_cd_daemon_logs_for_cd_name imex-channel-injection
fetching CD daemon logs for CD: resource.nvidia.com/computeDomain=a97f19b1-b41e-4266-8ecd-d2730f96dbb2 (imex-channel-injection), creating dra-driver-dbg_cd-daemons_2025-11-18T144249Z.log.gz
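Because the result is gzip-compressed, a quick per-pod line count is a convenient way to confirm that every CD daemon contributed output. The following is a minimal sketch under the assumption that --prefix places `[pod/<pod>/<container>]` as the first whitespace-delimited field of each line. The helper name `cd_log_line_counts` is hypothetical.

```shell
# Hypothetical helper: count log lines per pod/container prefix
# in a gzip-compressed log file.
cd_log_line_counts() {
    if [ -z "$1" ]; then echo "missing arg: log file"; return 1; fi
    # Decompress to stdout, tally the first field of each line,
    # then list "<count> <prefix>" sorted by prefix.
    gzip -dc "$1" \
        | awk '{ n[$1]++ } END { for (p in n) print n[p], p }' \
        | LC_ALL=C sort -k2
}

# Usage (file name is a placeholder):
#   cd_log_line_counts dra-driver-dbg_cd-daemons_<timestamp>.log.gz
```

A daemon pod that is missing from the list, or that has a conspicuously small count, is a natural place to start looking.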
Log verbosity can be set for all components using the --set logVerbosity=<V> parameter during helm install ... or helm upgrade -i ....
The verbosity can be changed after deployment and per-component, using various finer-grained mechanisms. Some examples are shown below.
Note that, for now, no component can change its log verbosity at runtime: a pod restart is always required to pick up the changed configuration.
Set log verbosity of just the controller pod:
kubectl set env deployment nvidia-dra-driver-gpu-controller -n nvidia-dra-driver-gpu LOG_VERBOSITY=6
This command restarts the controller pod.
Set log verbosity across kubelet plugin instances:
kubectl set env ds nvidia-dra-driver-gpu-kubelet-plugin -n nvidia-dra-driver-gpu LOG_VERBOSITY=6
This command triggers a restart for all plugin pods.
Set log verbosity of CD daemons started in the future (this restarts the controller pod):
kubectl set env deployment nvidia-dra-driver-gpu-controller -n nvidia-dra-driver-gpu LOG_VERBOSITY_CD_DAEMON=6
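Since each of these commands takes effect only after the affected pods have restarted, it can help to wait for the rollout before reproducing an issue. The following is a minimal sketch that combines the controller and kubelet plugin commands shown above with `kubectl rollout status`. The function name `set_dra_log_verbosity` and the five-minute timeout are assumptions, and the sketch assumes that both workloads restart through a regular rolling update.

```shell
# Hypothetical helper: set LOG_VERBOSITY on the controller and on the
# kubelet plugins, then wait until the restarted pods are ready.
set_dra_log_verbosity() {
    # Accept only a non-negative integer as verbosity level.
    case "$1" in
        ''|*[!0-9]*) echo "usage: set_dra_log_verbosity <level>" >&2; return 1 ;;
    esac
    _ns="nvidia-dra-driver-gpu"
    for _workload in \
        deployment/nvidia-dra-driver-gpu-controller \
        daemonset/nvidia-dra-driver-gpu-kubelet-plugin
    do
        # Changing the pod template environment triggers the restart.
        kubectl set env "${_workload}" -n "${_ns}" "LOG_VERBOSITY=$1" || return 1
        # Block until the rollout has finished (timeout is an assumption).
        kubectl rollout status "${_workload}" -n "${_ns}" --timeout=5m || return 1
    done
}

# Usage:
#   set_dra_log_verbosity 6
```

To confirm what is currently set, `kubectl set env deployment/nvidia-dra-driver-gpu-controller -n nvidia-dra-driver-gpu --list` prints the environment defined on the workload.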