nvidia-container-runtime unable to signal init: permission denied #796

@sense-amid-madness

Hi, on one of my GPU servers, GPU containers that use the NVIDIA container runtime fail to terminate because of a permission error. What could be causing this? The containers start up and run fine.

The error appears when trying to shut down a container:

sudo ctr -n k8s.io task kill fddedcb271ff4df58b5e539fb246ca86700db730ecde0ae7c38be0d1c77d39e1
ctr: unknown error after kill: /usr/bin/nvidia-container-runtime did not terminate successfully: exit status 1: unable to signal init: permission denied
: unknown

The NVIDIA Container Toolkit version is 1.17.1, and the containerd version is 1.7.12.

Thanks much.

Labels: lifecycle/stale (denotes an issue or PR that has remained open with no activity and has become stale)
