Local Testing (with GTX1080) - error creating driver: failed to create device library: failed to locate driver libraries: error locating "libnvidia-ml.so.1" #204

@wenzel-felix

Hi,

I was following the tutorial to get DRA running, and everything worked as expected until I installed the driver.

The kubelet plugin fails immediately with:

Error: error creating driver: failed to create device library: failed to locate driver libraries: error locating "libnvidia-ml.so.1"
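
To narrow this down, it can help to check where libnvidia-ml.so.1 actually lives on the node. The sketch below is only a diagnostic and not the driver's own lookup logic. The directory list is an assumption (on WSL2 the driver libraries are typically exposed under /usr/lib/wsl/lib), and `find_nvml` is a hypothetical helper name.

```shell
# Hypothetical helper: print the first directory that contains
# libnvidia-ml.so.1, searching the directories given as arguments.
# Returns 1 if none of them contains the library.
find_nvml() {
    for dir in "$@"; do
        if [ -e "$dir/libnvidia-ml.so.1" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Assumed candidate locations; adjust for your distribution.
# /usr/lib/wsl/lib is where WSL2 usually exposes the Windows driver libraries.
find_nvml /usr/lib/wsl/lib /usr/lib/x86_64-linux-gnu /usr/lib64 \
    || echo "libnvidia-ml.so.1 not found in the candidate directories"
```

Running this once on the host and once inside the worker node shows whether the library is present on the host but missing from the node's filesystem.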

Some extra information:
To let the cluster mount the CDI device, I had to change the name/path from runtime.nvidia.com/gpu/all to nvidia.com/gpu/all, because that is the only CDI device I see:

➜  k8s-dra-driver git:(main) ✗ sudo nvidia-ctk cdi list
INFO[0000] Found 1 CDI devices
nvidia.com/gpu=all
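
To confirm which CDI name is available before editing the cluster configuration, the list output can be checked in a script. This is a minimal sketch, assuming the output format shown above (one fully qualified device name per line); `has_cdi_device` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the CDI device name given as $1 appears
# as a whole line in the `nvidia-ctk cdi list` output read from stdin.
has_cdi_device() {
    grep -qxF -- "$1"
}

# Example use on a machine with the NVIDIA Container Toolkit installed:
#   sudo nvidia-ctk cdi list | has_cdi_device "nvidia.com/gpu=all"
```

The whole-line match (`-x`) matters here: a substring match would make a search for nvidia.com/gpu=all succeed against a runtime.nvidia.com/gpu=all entry as well.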

Here are some details about my setup:

GPU: GTX 1080
OS: Windows/WSL

nvidia-container-toolkit version:

NVIDIA Container Toolkit CLI version 1.17.0
commit: 5bc031544833253e3ab6a36daec376dc13a4f479

runtime config:

➜  k8s-dra-driver git:(main) ✗ nvidia-ctk runtime configure --dry-run
INFO[0000] Loading config from /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

nvidia-smi command on host:

➜  k8s-dra-driver git:(main) ✗ nvidia-smi
Tue Nov 12 20:54:32 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.02              Driver Version: 566.03         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080        On  |   00000000:01:00.0  On |                  N/A |
|  0%   58C    P0             45W /  210W |    1907MiB /   8192MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

nvidia-smi command on node:

root@k8s-dra-driver-cluster-worker:/# nvidia-smi
Tue Nov 12 19:55:46 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.02              Driver Version: 566.03         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080        On  |   00000000:01:00.0  On |                  N/A |
|  0%   61C    P0             46W /  210W |    1903MiB /   8192MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
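
Both outputs report the same driver version, which suggests the GPU is reachable from inside the node. A small sketch for comparing the two mechanically, assuming the banner format shown above; `driver_version` is a hypothetical helper that parses nvidia-smi output from stdin, and the `docker exec` line assumes the node is a kind container named as in the prompt above.

```shell
# Hypothetical helper: extract the "Driver Version" value from nvidia-smi
# banner output read on stdin (prints nothing if the field is absent).
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example: compare output captured on the host and on the node.
# Assumption: the node runs as a docker container with this name.
#   host=$(nvidia-smi | driver_version)
#   node=$(docker exec k8s-dra-driver-cluster-worker nvidia-smi | driver_version)
#   [ "$host" = "$node" ] && echo "driver versions match: $host"
```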

Any help would be highly appreciated!

I saw similar behavior in issue #65, but it is not the same problem.

Labels

bug: Issue/PR to expose/discuss/fix a bug
