GPU recognized inside of podman container, but GPU calculations fail.


I built the container-toolkit version 1.16 RPMs for centos8-x86-64 so that they could be installed on a FIPS-enabled system.
We were running GPU driver 555.42.02.

After installing on a server with M40 GPUs and running the command to generate the CDI config file, running podman like so:
podman run - --device nvidia.com/gpu=0 --security-opt=label=disable nvcr.io/nvidia/k8s/cuda-sample:devicequery
returns the expected information. nvidia-smi also worked in the container.
However, running
podman run - --device nvidia.com/gpu=0 --security-opt=label=disable  nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8 

returns
Failed to allocate device vector A (error code CUDA-capable device(s) is/are busy or unavailable)!
[Vector addition of 50000 elements]
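For anyone trying to reproduce this, the steps above can be collected into one script. This is a minimal sketch under stated assumptions: the report does not say which command generated the CDI config, so it uses `nvidia-ctk cdi generate` with the output path from NVIDIA's CDI documentation; it adds `--rm` so test containers are cleaned up; and it assumes each sample exits non-zero when it fails. Setting `DRY_RUN=1` (a hypothetical switch defined here) prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the reproduction described in this issue. Assumptions, not from
# the report: the CDI spec path /etc/cdi/nvidia.yaml, the added --rm flag,
# and that each sample exits non-zero on failure.
# DRY_RUN=1 (hypothetical) prints each command instead of executing it.
set -eu

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reproduce() {
  # Record which GPU and driver version this run used.
  run nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
  # Generate the CDI specification so podman can resolve nvidia.com/gpu=0.
  run sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
  # Run both CUDA samples from the report against GPU 0.
  for tag in devicequery vectoradd-cuda12.5.0-ubi8; do
    if run podman run --rm --device nvidia.com/gpu=0 \
        --security-opt=label=disable "nvcr.io/nvidia/k8s/cuda-sample:$tag"; then
      echo "PASS $tag"
    else
      echo "FAIL $tag"
    fi
  done
}
```

Running `reproduce` once per installed driver gives the PASS/FAIL pair for each driver alongside the driver version, which is the comparison made in the rest of this report.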

Installing driver version 550.90.07 gave different results. The devicequery container worked and nvidia-smi worked in the container, but the vectoradd container reported:
WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 550.90.07 which has support for CUDA 12.4.  This container
  was built with CUDA 12.5 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[Invalid argument (CUDA_ERROR_INVALID_VALUE) cuDevicePrimaryCtxRetain()=1]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
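The warning reflects a version comparison: per the log, driver 550.90.07 supports CUDA 12.4, while the image was built with CUDA 12.5, so the container falls back to Minor Version Compatibility mode. The helper below is a hypothetical, simplified sketch of that comparison; it is not the check the container actually performs, and it assumes GNU `sort -V` for version ordering.

```shell
#!/bin/sh
# Hypothetical helper, not part of the toolkit or the container entrypoint.
# Simplified classification of how an image built for CUDA X.Y relates to a
# driver whose maximum supported CUDA version is A.B. Assumes GNU sort -V.
set -eu

# version_ge A B: succeed if version A >= version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

cuda_compat_mode() {
  driver_cuda=$1     # max CUDA version of the driver, e.g. 12.4
  container_cuda=$2  # CUDA version the image was built with, e.g. 12.5
  if version_ge "$driver_cuda" "$container_cuda"; then
    echo "native"
  elif [ "${driver_cuda%%.*}" = "${container_cuda%%.*}" ]; then
    # Same major version, older minor: minor version compatibility applies.
    echo "minor-version-compat"
  else
    # Older major version: only forward compatibility could bridge the gap.
    echo "forward-compat-required"
  fi
}
```

With the versions from the log, `cuda_compat_mode 12.4 12.5` prints `minor-version-compat`, matching the warning above.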

Installing GPU driver 535 gave the same results as the 555 driver.

Installing GPU driver version 525.147.05 worked for all GPU calculations as expected.

Are these results due to the older M40 GPU, even though it is supported by all of the drivers we tried, or to something else?
Any ideas?