Description
I built the container-toolkit version 1.16 RPMs for centos8-x86-64 so that they could be installed on a FIPS-enabled system.
We were running GPU driver 555.42.02.
After installing the toolkit on a server with M40 GPUs and running the command to generate the CDI config file, running podman like so:
podman run - --device nvidia.com/gpu=0 --security-opt=label=disable nvcr.io/nvidia/k8s/cuda-sample:devicequery
returns the expected information. nvidia-smi in the container also worked.
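For anyone trying to reproduce this, the CDI config file step mentioned above is typically done with `nvidia-ctk`. The exact invocation is not shown in this report, so the sketch below is an assumption: the function name `generate_cdi_spec` is hypothetical, and `/etc/cdi/nvidia.yaml` is an assumed output path.

```shell
# Sketch only: the report does not show the exact command that was used.
# /etc/cdi/nvidia.yaml is an assumed output path; pass another path as $1.
generate_cdi_spec() {
    out="${1:-/etc/cdi/nvidia.yaml}"
    if ! command -v nvidia-ctk >/dev/null 2>&1; then
        echo "nvidia-ctk not found on PATH" >&2
        return 127
    fi
    # Write the CDI specification for the GPUs visible to the driver.
    nvidia-ctk cdi generate --output="$out" || return 1
    # List the device names (for example nvidia.com/gpu=0) that podman can request.
    nvidia-ctk cdi list
}
```

Calling `generate_cdi_spec` as root and then checking that `nvidia.com/gpu=0` appears in the listing confirms that the device name passed to `podman run --device` exists.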
However, running,
podman run - --device nvidia.com/gpu=0 --security-opt=label=disable nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8
returns
Failed to allocate device vector A (error code CUDA-capable device(s) is/are busy or unavailable)!
[Vector addition of 50000 elements]
Installing driver version 550.90.07 gave different results. The devicequery container and nvidia-smi in the container both worked, but the vectoradd container reported:
WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 550.90.07 which has support for CUDA 12.4. This container
was built with CUDA 12.5 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[Invalid argument (CUDA_ERROR_INVALID_VALUE) cuDevicePrimaryCtxRetain()=1]]
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
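To make the warning above concrete, here is a rough sketch of the version comparison it describes: a container built with a newer minor release of the same CUDA major version than the driver supports runs in Minor Version Compatibility mode. The function name `classify_cuda_compat` and its output labels are hypothetical, and the logic is a simplification of the rules in the linked compatibility guide, not NVIDIA's actual check.

```shell
# Classify how a container's CUDA version relates to the CUDA version
# supported by the installed driver. Both arguments must be MAJOR.MINOR
# (extra components such as 12.5.0 are ignored).
# Usage: classify_cuda_compat <driver_cuda> <container_cuda>
classify_cuda_compat() {
    driver_major=${1%%.*}; driver_minor=${1#*.}; driver_minor=${driver_minor%%.*}
    ctr_major=${2%%.*};    ctr_minor=${2#*.};    ctr_minor=${ctr_minor%%.*}
    if [ "$ctr_major" -gt "$driver_major" ]; then
        # Newer CUDA major than the driver supports: needs the forward-compat package.
        echo "forward-compat-required"
    elif [ "$ctr_major" -lt "$driver_major" ]; then
        # Older CUDA major: newer drivers remain backward compatible.
        echo "native"
    elif [ "$ctr_minor" -gt "$driver_minor" ]; then
        # Same major, newer minor: the case reported in the warning above.
        echo "minor-version-compat"
    else
        echo "native"
    fi
}

# Driver 550.90.07 supports CUDA 12.4; the vectoradd image was built with CUDA 12.5.
classify_cuda_compat 12.4 12.5.0
```

For the 550.90.07 case this prints `minor-version-compat`, which matches the mode named in the warning.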
Installing GPU driver 535 gave the same results as the 555 driver.
Installing GPU driver version 525.147.05 worked for all GPU calculations as expected.
Are these results due to the older M40 GPU, even though it is supported by all of the drivers that we tried, or to something else?
Any ideas?