Description
In #307 we introduced a change that mounts the host's /dev over the DRA driver's /dev regardless of whether we are using a host-managed or an operator-managed GPU driver.
As a result, we inadvertently create device nodes for all NVIDIA devices in the host's /dev whenever we run any NVML-related code.
Unfortunately, this causes problems with cri-o when it tries to run privileged containers with CDI-injected GPU devices.
The root of the problem is the following check that cri-o makes:
```go
for _, device := range c.Config().GetDevices() {
	// If we are privileged, we have access to devices on the host.
	// If the requested container path already exists on the host, the container won't see the expected host path.
	// Therefore, we must error out if the container path already exists
	if c.Privileged() && device.GetContainerPath() != device.GetHostPath() {
		// we expect this to not exist
		_, err := os.Stat(device.GetContainerPath())
		if err == nil {
			return errors.New("privileged container was configured with a device container path that already exists on the host")
		}
		if !os.IsNotExist(err) {
			return fmt.Errorf("error checking if container path exists on host: %w", err)
		}
	}
}
```
When a container is started as privileged, it sees all of the NVIDIA device nodes in the host's /dev, and they are injected into it because of the privileged setting. However, when CDI then tries to inject the same device nodes from /run/nvidia/driver/dev/, the check above fails and the container does not start, with the following error:
privileged container was configured with a device container path that already exists on the host
This happens because /dev/nvidia* paths already exist on the host, due to our inadvertent creation of these device nodes, and the container is running privileged.
We need to either:
- Stop mounting the host's /dev over our own /dev in all cases; or
- Prevent our calls to NVML from inadvertently creating device nodes in our local /dev directory (which is still mounted from the host).