After applying the nvidia-ctk config to crio, crio can no longer stop containers. #840

Hello there, I found a fatal error in nvidia-ctk.
I ran the configure command as the **```README```** says:
```
nvidia-ctk runtime configure --runtime=crio --set-as-default --config=/etc/crio/crio.conf.d/99-nvidia.conf
```
The generated config file matches the example.
Then I restarted the crio service:
```
systemctl restart crio
```
Everything looked fine at first.
But when I restart some deployments or delete some pods in k8s, the pods get stuck in **```Terminating```**:
```
NAME                      READY  STATUS       RESTARTS  AGE
...
backend-6b7945bc64-jqwl7  1/1    Running      0         19m
backend-ccfff5ccc-ktngm   1/1    Terminating  0         22m             <---------- After 19 minutes still running
...
```
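For anyone trying to reproduce this, here is a minimal sketch for picking the stuck pods out of the `kubectl get pods` table. The helper name `list_terminating` is my own (hypothetical), and it assumes the default column layout shown above, where STATUS is the third column.

```shell
# list_terminating (hypothetical helper): read `kubectl get pods` output on
# stdin and print the NAME of every pod whose STATUS column is "Terminating".
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
list_terminating() {
  awk 'NR > 1 && $3 == "Terminating" { print $1 }'
}
```

Usage would be something like `kubectl get pods | list_terminating`.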
After a lot of digging, I finally found that this setting causes the problem:
```
...
  [crio.runtime]
    default_runtime = "nvidia"
...
```
As far as I can tell, this setting changes the crio default user to "nvidia" instead of root, so permissions block every action that crio tries to perform.
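To check which value a given drop-in file actually sets, a small sketch like the following may help. The function name `crio_default_runtime` is hypothetical, and it assumes the setting sits on a single `default_runtime = "..."` line as in the snippet above; it reads only the one file passed to it, not the merged crio configuration.

```shell
# crio_default_runtime (hypothetical helper): print the value of
# default_runtime from one crio config file, with quotes and spaces stripped.
# Assumes a single-line `default_runtime = "<name>"` entry.
crio_default_runtime() {
  awk -F'=' '/^[[:space:]]*default_runtime[[:space:]]*=/ {
    gsub(/[[:space:]"]/, "", $2)   # drop whitespace and double quotes
    print $2
  }' "$1"
}
```

For example: `crio_default_runtime /etc/crio/crio.conf.d/99-nvidia.conf`.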

After deleting this setting, crio returns to normal; however, new containers can no longer use the nvidia plugin.

Therefore, I have these questions:
- Why is the nvidia user necessary?
- Why can't the root user use the nvidia driver in a container?
- Is there any other way to configure crio so that it works correctly?

Any suggestions would be really helpful.

Thank you very much! <3
