
Unexpected GPU Allocation with NVIDIA_VISIBLE_DEVICES in Kubernetes #951

@qiangyupei

Description


1. Quick Debug Information

  • Kubernetes Version: v1.28
  • GPU Operator Version: v24.6.1

2. Issue Description

The Kubernetes cluster has two worker nodes, each with four A100 GPUs. When deploying a pod, I use the NVIDIA_VISIBLE_DEVICES environment variable to specify which GPU it should use (e.g., "3"), following the instructions in the link. However, when I run kubectl exec -it [pod_name] -- nvidia-smi, it sometimes shows only the specified GPU, but at other times it displays an additional GPU alongside the specified one. The picture below illustrates the result. This is causing problems for me, and I'm wondering whether there might be an issue.

[Screenshot: nvidia-smi output inside the pod, showing an extra GPU in addition to the specified one]
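For context, a minimal sketch of the kind of pod spec I mean is below. The pod name and image tag are illustrative placeholders, not my exact manifest, and it assumes the nodes' NVIDIA container toolkit is configured to honor NVIDIA_VISIBLE_DEVICES for pods that do not request nvidia.com/gpu resources, as described in the linked instructions:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-env-test                 # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    # Any CUDA base image works here; this tag is just an example.
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["sleep", "infinity"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "3"                     # expose only GPU index 3 to the container
EOF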

I deploy the GPU Operator with the following command:

helm install gpu-operator \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set driver.enabled=false \
    --set mig.strategy=mixed \
    -f gpu-operator-values.yaml \
    --set dcgmExporter.config.name=custom-dcgm-metrics

All of the GPU Operator pods are in the Running state:

[Screenshot: all GPU Operator pods in the Running state]
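The pod status in the screenshot can be checked with a command along these lines (namespace taken from the install command above):

kubectl get pods -n gpu-operator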
