Describe the bug
The following ccManager-related section of the ClusterPolicy for NVIDIA GPU Operator 25.3.4 does not enable confidential computing for H200 GPUs (VBIOS 96.00.d9.00.02, device ID 0x2335) on OpenShift 4.19. The same configuration works with H100 GPUs in a similar environment on another node.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: "gpu-cluster-policy"
spec:
  ccManager:
    defaultMode: "on"
    enabled: true
    env:
      - name: CC_CAPABLE_DEVICE_IDS
        value: 0x2335,0x2330,0x2331,0x2322
    image: k8s-cc-manager
    imagePullPolicy: IfNotPresent
    imagePullSecrets: []
    repository: nvcr.io/nvidia/cloud-native
    resources: {}
    version: v0.1.1
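As a quick sanity check on the `CC_CAPABLE_DEVICE_IDS` value, the sketch below parses the comma-separated list and confirms that the H200 device ID 0x2335 is present. It only validates the string itself; how k8s-cc-manager actually reads and compares this variable is not shown in this report, and the helper names `parse_device_ids` and `is_listed` are hypothetical.

```python
def parse_device_ids(raw: str) -> set[int]:
    """Parse a comma-separated list of hex device IDs into integers.

    Assumption: entries look like "0x2335"; surrounding whitespace and
    empty entries are ignored. This is not k8s-cc-manager's own parser.
    """
    return {int(tok.strip(), 16) for tok in raw.split(",") if tok.strip()}


def is_listed(device_id: int, raw: str) -> bool:
    """Return True if device_id appears in the comma-separated list."""
    return device_id in parse_device_ids(raw)


# Value copied from the ClusterPolicy above.
CC_CAPABLE_DEVICE_IDS = "0x2335,0x2330,0x2331,0x2322"

if __name__ == "__main__":
    # 0x2335 is the H200 device ID given in this report.
    print(is_listed(0x2335, CC_CAPABLE_DEVICE_IDS))
```

If this check passes, a malformed ID list is an unlikely cause, which points the investigation toward how the manager handles the device rather than toward the policy text.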
To Reproduce
After installing the GPU Operator, apply the cluster policy with the ccManager section shown above. The output of nvidia_gpu_tools.py --devices gpus --query-cc-mode then reports "CC mode is off" for all the GPUs.
Expected behavior
The output of nvidia_gpu_tools.py --devices gpus --query-cc-mode should report "CC mode is on" for all the GPUs.
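The expected-versus-actual check can be automated over captured tool output. The following is a minimal sketch that assumes each GPU's status appears as a phrase of the form "CC mode is on" or "CC mode is off", as quoted in this report; the rest of the output format is not known here, the sample text is made up for illustration, and the function names are hypothetical.

```python
import re

# Matches the status phrase quoted in this report; the surrounding
# output format of nvidia_gpu_tools.py is an assumption.
_CC_MODE = re.compile(r"CC mode is (on|off)", re.IGNORECASE)


def cc_modes(output: str) -> list[str]:
    """Return the reported CC mode ("on" or "off") for each match, in order."""
    return [m.group(1).lower() for m in _CC_MODE.finditer(output)]


def all_cc_on(output: str) -> bool:
    """True only if at least one GPU is reported and every one is "on"."""
    modes = cc_modes(output)
    return bool(modes) and all(mode == "on" for mode in modes)


if __name__ == "__main__":
    # Hypothetical captured output, not a real log from the affected node.
    sample = "GPU 0: CC mode is off\nGPU 1: CC mode is off\n"
    print(cc_modes(sample), all_cc_on(sample))
```

Treating an empty result as a failure is deliberate: if the tool prints nothing recognizable, the check should not pass silently.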