Closed
Labels: lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.)
Description
Hello there, I ran into what looks like a fatal error with nvidia-ctk.
After running the configuration command as the README describes:
nvidia-ctk runtime configure --runtime=crio --set-as-default --config=/etc/crio/crio.conf.d/99-nvidia.conf
I got the same config file as in the example.
Then I restarted the CRI-O service:
systemctl restart crio
Everything looked fine.
But when I restart deployments or delete pods in Kubernetes, the old pods get stuck in the Terminating state:
NAME READY STATUS RESTARTS AGE
...
backend-6b7945bc64-jqwl7 1/1 Running 0 19m
backend-ccfff5ccc-ktngm 1/1 Terminating 0 22m <---------- still not terminated after 19 minutes
...
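To spot stuck pods like the one above without eyeballing the list, a small filter over the plain `kubectl get pods` output is enough. This is a minimal sketch that assumes the default column layout shown above (NAME READY STATUS RESTARTS AGE); the function name `terminating_pods` is hypothetical.

```python
# Minimal sketch: flag pods stuck in "Terminating" from plain `kubectl get pods` output.
# Assumes the default column layout NAME READY STATUS RESTARTS AGE (hypothetical helper).

def terminating_pods(kubectl_output: str) -> list[str]:
    """Return the names of pods whose STATUS column is 'Terminating'."""
    names = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip the header row, blank lines, and anything that is not a pod row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        # STATUS is the third column in the default layout.
        if fields[2] == "Terminating":
            names.append(fields[0])
    return names


if __name__ == "__main__":
    sample = """NAME READY STATUS RESTARTS AGE
backend-6b7945bc64-jqwl7 1/1 Running 0 19m
backend-ccfff5ccc-ktngm 1/1 Terminating 0 22m"""
    print(terminating_pods(sample))
```

Run against the output above, only `backend-ccfff5ccc-ktngm` is reported.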
After a lot of digging, I finally found that this setting causes the problem:
...
[crio.runtime]
default_runtime = "nvidia"
...
As far as I can tell, this setting changes CRI-O's default user to "nvidia" instead of root, so permission checks block everything CRI-O tries to do.
After I removed this setting, CRI-O went back to normal; however, new containers can no longer use the NVIDIA plugin.
Therefore, I have these questions:
- Why is the nvidia user necessary?
- Why can't the root user use the NVIDIA driver in a container?
- Is there another way to configure CRI-O so that it works correctly?
Any suggestions would be really helpful.
Thank you very much! <3