Open
Labels
bug (Issue/PR to expose/discuss/fix a bug)
Description
1. Quick Debug Information
- OS/Version(e.g. RHEL8.6, Ubuntu22.04): Ubuntu 22.04
- Kernel Version: 6.2.0-37-generic
- Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): Containerd, 1.7.7
- K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): Rancher/RKE2, 1.27.8
- GPU Operator Version: 23.9.1
2. Issue or feature description
When MIG is enabled, both the MIG resources and the nvidia.com/gpu resource are reported as allocatable on the node:
Allocatable:
  cerit.io/gpu-count:       2
  cerit.io/gpu-mem:         0
  cpu:                      64
  ephemeral-storage:        7104643354787
  hugepages-1Gi:            0
  hugepages-2Mi:            0
  memory:                   519659388Ki
  nvidia.com/gpu:           2
  nvidia.com/mig-1g.10gb:   6
  nvidia.com/mig-2g.20gb:   4
  nvidia.com/mig-3g.40gb:   0
  pods:                     160
This means that pods requesting either nvidia.com/gpu or nvidia.com/mig-1g.10gb can be scheduled onto the node. However, a pod that requests nvidia.com/gpu does not get a GPU injected.
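For anyone who wants to check whether their own nodes are in the same state, here is a minimal sketch that flags a node advertising both full-GPU and MIG resources. The function names are hypothetical, and the dictionary is just the Allocatable output above copied by hand; on a real cluster the same mapping should be available under `.status.allocatable` in the output of `kubectl get node <name> -o json`.

```python
FULL_GPU = "nvidia.com/gpu"
MIG_PREFIX = "nvidia.com/mig-"


def gpu_resources(allocatable):
    """Return (full_gpu_count, {mig_resource: count}) keeping only nonzero MIG entries.

    Only the GPU-related keys are parsed; device counts are whole numbers.
    """
    full = int(allocatable.get(FULL_GPU, 0))
    mig = {
        name: int(value)
        for name, value in allocatable.items()
        if name.startswith(MIG_PREFIX) and int(value) > 0
    }
    return full, mig


def advertises_both(allocatable):
    """True when the node offers nvidia.com/gpu and at least one MIG profile."""
    full, mig = gpu_resources(allocatable)
    return full > 0 and bool(mig)


# Values copied from the Allocatable section of this report.
allocatable = {
    "cerit.io/gpu-count": "2",
    "cerit.io/gpu-mem": "0",
    "cpu": "64",
    "ephemeral-storage": "7104643354787",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "519659388Ki",
    "nvidia.com/gpu": "2",
    "nvidia.com/mig-1g.10gb": "6",
    "nvidia.com/mig-2g.20gb": "4",
    "nvidia.com/mig-3g.40gb": "0",
    "pods": "160",
}

if __name__ == "__main__":
    full, mig = gpu_resources(allocatable)
    print("full GPUs:", full)
    print("MIG devices:", mig)
    print("advertises both:", advertises_both(allocatable))
```

With the values from this node the check reports that both kinds of resource are advertised, which is the situation described above.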
3. Steps to reproduce the issue
Enable MIG on an A100 GPU.
This may be just a bug in Kubernetes rather than in the GPU Operator itself.