Troubleshooting
Collect all kubelet plugin logs into a single file:
kubectl logs \
-n nvidia-dra-driver-gpu \
-l nvidia-dra-driver-gpu-component=kubelet-plugin \
--prefix \
--all-containers \
--timestamps \
--tail=-1 \
> dra-driver-dbg_plugins_$(date -u +"%Y-%m-%dT%H:%M:%SZ").log
Notes:
- In a larger-scale environment, this may fetch a lot of data.
- Adding --prefix and --timestamps is critical for debuggability.
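Because --prefix and --timestamps tag every line with its source pod and a time, the combined file can be sorted and filtered after collection. The following is a minimal sketch, assuming each line has the shape `[pod/<pod>/<container>] <timestamp> <message>` and that timestamps are fixed-width so they sort lexicographically. The function names are illustrative and not part of the driver.

```shell
# Sketch: helpers for a log file collected with --prefix and --timestamps.
# Assumed line shape: [pod/<pod>/<container>] <timestamp> <message>

# Print all lines in chronological order across pods.
# The timestamp is the second whitespace-separated field; -s keeps the
# original order for lines with identical timestamps.
sort_by_time() {
  LC_ALL=C sort -s -k2,2 "$1"
}

# Print only the lines that came from one pod.
# $1 is the log file, $2 is the pod name (fixed-string match on the prefix).
lines_for_pod() {
  grep -F "[pod/$2/" "$1"
}

# Usage (file and pod names are placeholders):
#   sort_by_time dra-driver-dbg_plugins_<timestamp>.log > merged-sorted.log
#   lines_for_pod dra-driver-dbg_plugins_<timestamp>.log <pod-name>
```

Without the prefix and timestamp, neither operation is possible on the merged file, which is why both flags matter when several plugin pods write into one log.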
Log verbosity can be set for all components using the --set logVerbosity=<V> parameter during helm install ... or helm upgrade -i ....
The verbosity can be changed after deployment and per-component, using various finer-grained mechanisms. Some examples are shown below.
Note that, for now, no component can change its log verbosity at runtime: a pod restart is always required to pick up the changed configuration.
Set log verbosity of just the controller pod:
kubectl set env deployment nvidia-dra-driver-gpu-controller -n nvidia-dra-driver-gpu LOG_VERBOSITY=6
This command restarts the controller pod.
Set log verbosity across kubelet plugin instances:
kubectl set env ds nvidia-dra-driver-gpu-kubelet-plugin -n nvidia-dra-driver-gpu LOG_VERBOSITY=6
This command triggers a restart for all plugin pods.
Set log verbosity of CD daemons started in the future (this restarts the controller pod):
kubectl set env deployment nvidia-dra-driver-gpu-controller -n nvidia-dra-driver-gpu LOG_VERBOSITY_CD_DAEMON=6
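Since each of these commands triggers a restart, it can help to wait for the rollout to finish and then confirm the variable is present on the pod template. The following is a sketch using `kubectl rollout status` and `kubectl set env --list`. The `check_verbosity` helper name and the 120s timeout are assumptions chosen for illustration.

```shell
# Sketch: confirm that a verbosity change has been rolled out.
# Namespace and resource names are the ones used in the commands above.
NS=nvidia-dra-driver-gpu

# $1 is the resource, e.g. deployment/nvidia-dra-driver-gpu-controller
check_verbosity() {
  # Wait for the restarted pods to become ready, then list the
  # LOG_VERBOSITY* variables currently set on the pod template.
  kubectl rollout status "$1" -n "$NS" --timeout=120s &&
    kubectl set env "$1" -n "$NS" --list | grep LOG_VERBOSITY
}

# Usage:
#   check_verbosity deployment/nvidia-dra-driver-gpu-controller
#   check_verbosity ds/nvidia-dra-driver-gpu-kubelet-plugin
```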