Describe the bug
We're using the GPU-PV functionality of AKS-Edge, and it has been working great for us so far. However, after updating one of our machines from Windows 11 23H2 to 24H2, the nvidia-device-plugin no longer works. It fails to start with the error:
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: nvml error: unable to load the nvml library: unknown
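One way to tell whether the failure is specific to the device plugin or affects the Linux node as a whole is to try loading NVML directly on the node. Below is a minimal Python sketch of that check, assuming Python 3.9+ is available on the node and that the driver exposes NVML under the soname `libnvidia-ml.so.1`. Where that library lives under GPU-PV is an assumption and is not confirmed by this report. `check_nvml` is a hypothetical helper, not part of AKS Edge Essentials or the NVIDIA tooling. It uses only the standard `ctypes` module and the documented NVML entry points `nvmlInit_v2` and `nvmlShutdown`.

```python
import ctypes

# NVML_SUCCESS is value 0 of NVML's nvmlReturn_t enum.
NVML_SUCCESS = 0


def check_nvml(lib_name: str = "libnvidia-ml.so.1") -> tuple[bool, str]:
    """Try to load the NVML shared library and initialize it.

    Returns (ok, message) instead of raising, so it can be used as a quick
    diagnostic. lib_name is an assumed soname; adjust it for your node.
    """
    try:
        nvml = ctypes.CDLL(lib_name)
    except OSError as exc:
        # Roughly the condition nvidia-container-cli reports as
        # "unable to load the nvml library".
        return False, f"could not load {lib_name}: {exc}"

    try:
        init = nvml.nvmlInit_v2
        shutdown = nvml.nvmlShutdown
    except AttributeError as exc:
        return False, f"{lib_name} loaded but NVML symbols are missing: {exc}"

    # Both functions take no arguments and return an nvmlReturn_t (an int).
    init.restype = ctypes.c_int
    shutdown.restype = ctypes.c_int

    rc = init()
    if rc != NVML_SUCCESS:
        return False, f"{lib_name} loaded, but nvmlInit_v2 returned {rc}"

    shutdown()
    return True, f"{lib_name} loaded and NVML initialized"


if __name__ == "__main__":
    ok, msg = check_nvml()
    print(("OK: " if ok else "FAIL: ") + msg)
```

If the load fails on the node itself, the problem most likely lies below the device plugin, in how the driver libraries are exposed to the VM. If NVML initializes on the node but the plugin still fails, the container runtime hook is the more likely suspect.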
To Reproduce
Steps to reproduce the behavior:
- Install Windows 11 24H2
- Follow the GPU acceleration guide from: https://learn.microsoft.com/en-us/azure/aks/aksarc/aks-edge-gpu
- Observe the error "unable to load the nvml library" when the nvidia-device-plugin starts
Environment (please complete the following information):
- AKS Edge Essentials Version: 1.9.262.0
- Kubernetes Version: 1.29.6
- Windows Host OS
  - Edition: Professional
  - Version: 24H2 build 26100.3476
- GPU: NVIDIA RTX A5000
- NVIDIA driver: 572.83