[BUG] GPU-PV broken for Windows 11 24H2 #247

@dj-vandijk

Describe the bug
We're using the GPU-PV functionality of AKS-Edge, and it has been working great for us so far. However, after updating one of our machines from Windows 11 23H2 to 24H2, the nvidia-device-plugin no longer works. It fails to start with the error:

Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: nvml error: unable to load the nvml library: unknown

To Reproduce
Steps to reproduce the behavior:

  1. Install Windows 11 24H2
  2. Follow the GPU acceleration guide from: https://learn.microsoft.com/en-us/azure/aks/aksarc/aks-edge-gpu
  3. Observe the error "unable to load the nvml library" when the nvidia-device-plugin starts
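
To narrow down where the failure sits, a check along the following lines could be run inside the Linux node. This is only a rough sketch, not a confirmed diagnostic: it assumes Python 3 is available on the node, and the library path `/usr/lib/wsl/lib/libnvidia-ml.so.1` and the device node `/dev/dxg` are assumptions based on how GPU-PV guests commonly expose the driver, not paths verified for AKS Edge Essentials. The function names are illustrative.

```python
import ctypes
import os

# Assumed locations: GPU-PV guests commonly expose the NVIDIA user-mode
# library and the paravirtualized device here, but these paths are NOT
# confirmed for the AKS Edge Essentials Linux node.
CANDIDATE_LIBS = ("libnvidia-ml.so.1", "/usr/lib/wsl/lib/libnvidia-ml.so.1")
CANDIDATE_DEVICE = "/dev/dxg"


def try_load(name):
    """Return (True, "") if the shared library loads, else (False, reason)."""
    try:
        ctypes.CDLL(name)
        return True, ""
    except OSError as exc:
        return False, str(exc)


def diagnose(libs=CANDIDATE_LIBS, device=CANDIDATE_DEVICE):
    """Check that the device node exists and try to load each candidate library."""
    report = {"device_present": os.path.exists(device), "libs": {}}
    for name in libs:
        report["libs"][name] = try_load(name)
    return report


if __name__ == "__main__":
    result = diagnose()
    print(f"{CANDIDATE_DEVICE} present: {result['device_present']}")
    for name, (ok, reason) in result["libs"].items():
        status = "loaded" if ok else f"FAILED ({reason})"
        print(f"{name}: {status}")
```

If the device node is missing, the problem is likely on the host or VM side; if the node exists but the library fails to load, the loader message may show whether the file is absent or has an unmet dependency.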

Environment (please complete the following information):

  • AKS Edge Essentials Version: 1.9.262.0
  • Kubernetes version: 1.29.6
  • Windows Host OS
    • Edition: Professional
    • Version: 24H2 build 26100.3476
  • NVIDIA RTX A5000
  • NVIDIA driver 572.83
