Closed
Labels
lifecycle/stale: Denotes an issue or PR that has remained open with no activity and has become stale.
Description
Hi, on one of my GPU servers, containers that use the NVIDIA container runtime fail to terminate because of a permission error. What could be causing this? The containers start up and run fine.
The error appears when trying to shut down a container:
sudo ctr -n k8s.io task kill fddedcb271ff4df58b5e539fb246ca86700db730ecde0ae7c38be0d1c77d39e1
ctr: unknown error after kill: /usr/bin/nvidia-container-runtime did not terminate successfully: exit status 1: unable to signal init: permission denied
: unknown
The NVIDIA Container Toolkit version is 1.17.1 and the containerd version is 1.7.12.
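For anyone trying to reproduce this, it may help to record the version of every component on the kill path in one go. The following is only a minimal sketch: `report_version` is a hypothetical helper, not part of any of these tools, and the inclusion of `runc` assumes it is the low-level runtime behind `nvidia-container-runtime`, which the report above does not state.

```shell
#!/bin/sh
# Hypothetical helper (not part of ctr or the NVIDIA toolkit): print the first
# line of a tool's --version output, or a placeholder if it is not on PATH.
report_version() {
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        # Keep only the first line so the report stays compact.
        printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
    else
        printf '%s: not found\n' "$tool"
    fi
}

# Assumption: runc is the underlying low-level runtime on this host.
for tool in containerd ctr runc nvidia-container-runtime nvidia-ctk; do
    report_version "$tool"
done
```

Pasting the output next to the `ctr` error above gives a complete picture of the runtime stack on the affected server.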
Thanks much.