Hello,
I have a question about the MIG labels on a node.
When MIG is configured on a node, the k8s-device-plugin adds MIG labels to the node. However, the nvidia.com/gpu.count label is not updated to reflect the MIG configuration. Examples:
- Non-MIG configuration
...
nvidia.com/gpu.count=8
nvidia.com/gpu.deploy.container-toolkit=true
nvidia.com/gpu.deploy.dcgm=true
nvidia.com/gpu.deploy.dcgm-exporter=true
nvidia.com/gpu.deploy.device-plugin=true
nvidia.com/gpu.deploy.driver=true
nvidia.com/gpu.deploy.gpu-feature-discovery=true
nvidia.com/gpu.deploy.mig-manager=true
nvidia.com/gpu.deploy.node-status-exporter=true
nvidia.com/gpu.deploy.operator-validator=true
nvidia.com/gpu.family=hopper
nvidia.com/gpu.machine=PowerEdge-XE9680
nvidia.com/gpu.memory=81559
nvidia.com/gpu.mode=compute
nvidia.com/gpu.present=true
nvidia.com/gpu.product=NVIDIA-H100-80GB-HBM3
nvidia.com/gpu.replicas=1
nvidia.com/mig.capable=true
nvidia.com/mig.config=all-disabled
nvidia.com/mig.config.state=success
nvidia.com/mig.strategy=mixed
...
Capacity:
...
nvidia.com/gpu: 8
...
Allocatable:
...
nvidia.com/gpu: 8
...
...
- MIG configuration
...
nvidia.com/gpu.count=8
nvidia.com/gpu.deploy.container-toolkit=true
nvidia.com/gpu.deploy.dcgm=true
nvidia.com/gpu.deploy.dcgm-exporter=true
nvidia.com/gpu.deploy.device-plugin=true
nvidia.com/gpu.deploy.driver=true
nvidia.com/gpu.deploy.gpu-feature-discovery=true
nvidia.com/gpu.deploy.mig-manager=true
nvidia.com/gpu.deploy.node-status-exporter=true
nvidia.com/gpu.deploy.nvsm=true
nvidia.com/gpu.deploy.operator-validator=true
nvidia.com/gpu.family=hopper
nvidia.com/gpu.machine=PowerEdge-XE9680
nvidia.com/gpu.memory=81559
nvidia.com/gpu.present=true
nvidia.com/gpu.product=NVIDIA-H100-80GB-HBM3
nvidia.com/gpu.replicas=1
nvidia.com/mig-1g.10gb.count=14
nvidia.com/mig-1g.10gb.engines.copy=1
nvidia.com/mig-1g.10gb.engines.decoder=1
nvidia.com/mig-1g.10gb.engines.encoder=0
nvidia.com/mig-1g.10gb.engines.jpeg=1
nvidia.com/mig-1g.10gb.engines.ofa=0
nvidia.com/mig-1g.10gb.memory=9984
nvidia.com/mig-1g.10gb.multiprocessors=16
nvidia.com/mig-1g.10gb.product=NVIDIA-H100-80GB-HBM3-MIG-1g.10gb
nvidia.com/mig-1g.10gb.replicas=1
nvidia.com/mig-1g.10gb.slices.ci=1
nvidia.com/mig-1g.10gb.slices.gi=1
nvidia.com/mig-3g.40gb.count=5
nvidia.com/mig-3g.40gb.engines.copy=3
nvidia.com/mig-3g.40gb.engines.decoder=3
nvidia.com/mig-3g.40gb.engines.encoder=0
nvidia.com/mig-3g.40gb.engines.jpeg=3
nvidia.com/mig-3g.40gb.engines.ofa=0
nvidia.com/mig-3g.40gb.memory=40320
nvidia.com/mig-3g.40gb.multiprocessors=60
nvidia.com/mig-3g.40gb.product=NVIDIA-H100-80GB-HBM3-MIG-3g.40gb
nvidia.com/mig-3g.40gb.replicas=1
nvidia.com/mig-3g.40gb.slices.ci=3
nvidia.com/mig-3g.40gb.slices.gi=3
nvidia.com/mig-4g.40gb.count=5
nvidia.com/mig-4g.40gb.engines.copy=4
nvidia.com/mig-4g.40gb.engines.decoder=4
nvidia.com/mig-4g.40gb.engines.encoder=0
nvidia.com/mig-4g.40gb.engines.jpeg=4
nvidia.com/mig-4g.40gb.engines.ofa=0
nvidia.com/mig-4g.40gb.memory=40320
nvidia.com/mig-4g.40gb.multiprocessors=64
nvidia.com/mig-4g.40gb.product=NVIDIA-H100-80GB-HBM3-MIG-4g.40gb
nvidia.com/mig-4g.40gb.replicas=1
nvidia.com/mig-4g.40gb.slices.ci=4
nvidia.com/mig-4g.40gb.slices.gi=4
nvidia.com/mig.capable=true
nvidia.com/mig.config=mig-config-26
nvidia.com/mig.config.state=success
nvidia.com/mig.strategy=mixed
...
Capacity:
...
nvidia.com/gpu: 1
nvidia.com/mig-1g.10gb: 14
nvidia.com/mig-3g.40gb: 5
nvidia.com/mig-4g.40gb: 5
...
Allocatable:
...
nvidia.com/gpu: 1
nvidia.com/mig-1g.10gb: 14
nvidia.com/mig-3g.40gb: 5
nvidia.com/mig-4g.40gb: 5
...
...
Is there a way to make the nvidia.com/gpu.count node label match the nvidia.com/gpu count shown in Capacity, or the GPU count defined in the MIG config?
(In the MIG example above, Capacity shows nvidia.com/gpu: 1, but the node label still shows 8 instead of 1.)
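For anyone who wants to check for this discrepancy on their own nodes, here is a minimal Python sketch. It is not a fix and not part of the device plugin; it only compares the nvidia.com/gpu.count label with the nvidia.com/gpu values under Capacity and Allocatable. It assumes the input is a node object in the shape returned by `kubectl get node <node-name> -o json`. The function name `compare_gpu_count` and the `example_node` data are hypothetical, with values copied from the MIG example above.

```python
GPU_COUNT_LABEL = "nvidia.com/gpu.count"
GPU_RESOURCE = "nvidia.com/gpu"


def _to_int(value):
    """Convert a label or resource string such as "8" to int; keep None as None."""
    return None if value is None else int(value)


def compare_gpu_count(node: dict) -> dict:
    """Compare the gpu.count label with the nvidia.com/gpu capacity and allocatable.

    `node` is assumed to look like the output of
    `kubectl get node <node-name> -o json` parsed with json.loads.
    """
    labels = node.get("metadata", {}).get("labels", {})
    status = node.get("status", {})

    label = _to_int(labels.get(GPU_COUNT_LABEL))
    capacity = _to_int(status.get("capacity", {}).get(GPU_RESOURCE))
    allocatable = _to_int(status.get("allocatable", {}).get(GPU_RESOURCE))

    return {
        "label": label,
        "capacity": capacity,
        "allocatable": allocatable,
        # True only when the label exists and all three values agree.
        "matches": label is not None and label == capacity == allocatable,
    }


# Hypothetical node object using the values from the MIG example above.
example_node = {
    "metadata": {"labels": {"nvidia.com/gpu.count": "8"}},
    "status": {
        "capacity": {"nvidia.com/gpu": "1", "nvidia.com/mig-1g.10gb": "14"},
        "allocatable": {"nvidia.com/gpu": "1", "nvidia.com/mig-1g.10gb": "14"},
    },
}

result = compare_gpu_count(example_node)
print(result)
# {'label': 8, 'capacity': 1, 'allocatable': 1, 'matches': False}
```

With the non-MIG example (label 8, Capacity 8, Allocatable 8), the same function would report "matches": True.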