GPU recognized inside of podman container, but GPU calculations fail. #614

@jgforbes

I built the container-toolkit version 1.16 RPMs for centos8-x86-64 so that they could be installed on a FIPS-enabled system.
We were running GPU driver 555.42.02.

After installing the toolkit on a server with M40 GPUs and running the command to generate the CDI config file, running podman like so:
podman run - --device nvidia.com/gpu=0 --security-opt=label=disable nvcr.io/nvidia/k8s/cuda-sample:devicequery
returns the expected information. nvidia-smi in the container also worked.
However, running
podman run - --device nvidia.com/gpu=0 --security-opt=label=disable nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubi8

returns
Failed to allocate device vector A (error code CUDA-capable device(s) is/are busy or unavailable)!
[Vector addition of 50000 elements]

Installing driver version 550.90.07 gave different results. The devicequery container worked and nvidia-smi worked in the container, but the vectoradd container reported:
WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 550.90.07 which has support for CUDA 12.4. This container
was built with CUDA 12.5 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[Invalid argument (CUDA_ERROR_INVALID_VALUE) cuDevicePrimaryCtxRetain()=1]]
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
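
To illustrate the version comparison that this warning describes, here is a minimal shell sketch. The helper name `cuda_version_ge` is hypothetical, and the two version numbers are copied from the warning text above. It only compares version strings; it does not query the driver and is not how the container's entrypoint is necessarily implemented.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Relies on `sort -V` (GNU coreutils) for version-aware ordering.
cuda_version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)
  [ "$lowest" = "$2" ]
}

# Values taken from the warning text, not detected from the system.
driver_cuda="12.4"     # CUDA version supported by driver 550.90.07
container_cuda="12.5"  # CUDA version the vectoradd container was built with

if cuda_version_ge "$driver_cuda" "$container_cuda"; then
  echo "driver CUDA $driver_cuda covers container CUDA $container_cuda"
else
  echo "driver CUDA $driver_cuda is older than container CUDA $container_cuda: a compatibility mode is needed"
fi
```

With the values from the warning, the check takes the second branch, which matches the container falling back to Minor Version Compatibility mode.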

Installing GPU driver 535 gave the same results as the 555 driver.

Installing GPU driver version 525.147.05 worked for all GPU calculations as expected.

Are these results due to the older M40 GPU, even though it is supported by all of the drivers that we tried, or to something else?
Any ideas?

Labels: lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.)