1. Quick Debug Information
- Kubernetes Version: v1.28
- GPU Operator Version: v24.6.1
2. Issue Description
The Kubernetes cluster has two worker nodes, each with four A100 GPUs. When deploying a pod, I set the NVIDIA_VISIBLE_DEVICES environment variable to specify which GPU the container should use (e.g., "3"), following the instructions in the link. However, when I run kubectl exec -it [pod_name] -- nvidia-smi, it sometimes shows only the specified GPU, but at other times it shows an additional GPU alongside the specified one. The attached screenshot illustrates the result. This intermittent behavior is causing problems for my workloads, and I'm wondering whether it indicates a bug.
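For context, the pods are deployed with a spec along these lines. This is a minimal sketch: the pod name and image are placeholders, and the relevant part is only the NVIDIA_VISIBLE_DEVICES value:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
    command: ["sleep", "infinity"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES   # expose only GPU index 3 to this container
      value: "3"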
I deploy the GPU Operator with the following command:
helm install gpu-operator \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --set driver.enabled=false \
  --set mig.strategy=mixed \
  -f gpu-operator-values.yaml \
  --set dcgmExporter.config.name=custom-dcgm-metrics
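The custom-dcgm-metrics value above refers to a ConfigMap that supplies the DCGM exporter's metric list. The exact contents of mine are not shown here; a minimal sketch of such a ConfigMap, with an illustrative metric selection and the dcgm-metrics.csv key name used in the GPU Operator documentation, would look like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-dcgm-metrics
  namespace: gpu-operator
data:
  dcgm-metrics.csv: |
    # Format: counter, type, description
    DCGM_FI_DEV_GPU_UTIL, gauge, GPU utilization (in %).
    DCGM_FI_DEV_FB_USED, gauge, Framebuffer memory used (in MiB).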
All of the GPU Operator pods are in the Running state.
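For reference, the status of the operator pods can be confirmed with the namespace used in the helm install above:

kubectl get pods -n gpu-operator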