ccManager v0.1.1 does not enable confidential computing for H200 GPUs #1851

@niteeshkd

Describe the bug
The following ccManager-related section of the ClusterPolicy for NVIDIA GPU Operator 25.3.4 does not enable confidential computing on H200 GPUs (VBIOS 96.00.d9.00.02, device ID 0x2335) on OpenShift 4.19. It works fine with H100 GPUs in a similar environment on another node.

apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: "gpu-cluster-policy"
spec:
  ccManager:
    defaultMode: "on"
    enabled: true
    env:
    - name: CC_CAPABLE_DEVICE_IDS
      value: 0x2335,0x2330,0x2331,0x2322
    image: k8s-cc-manager
    imagePullPolicy: IfNotPresent
    imagePullSecrets: []
    repository: nvcr.io/nvidia/cloud-native
    resources: {}
    version: v0.1.1
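
One quick thing to rule out is a typo in the CC_CAPABLE_DEVICE_IDS value. The sketch below is a hypothetical sanity check, not part of k8s-cc-manager: it assumes the value is a comma-separated list of hexadecimal PCI device IDs, and `parse_device_ids` is a name made up for illustration. How ccManager itself parses the variable is not confirmed here.

```python
# Hypothetical helper: checks that the H200 device ID from this report
# appears in the CC_CAPABLE_DEVICE_IDS value copied from the ClusterPolicy.
# Assumption: the value is a comma-separated list of hex device IDs.

def parse_device_ids(value: str) -> set[int]:
    """Parse a comma-separated list of hex device IDs into integers."""
    return {int(item.strip(), 16) for item in value.split(",") if item.strip()}


cc_capable = parse_device_ids("0x2335,0x2330,0x2331,0x2322")
h200_id = 0x2335  # device ID reported for the H200 GPUs in this issue

print(h200_id in cc_capable)  # prints True
```

Since the ID is present in the list, the configured value itself does not look like the cause.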

To Reproduce
After installing the GPU Operator, apply the cluster policy with the above ccManager section. The output of nvidia_gpu_tools.py --devices gpus --query-cc-mode then reports "CC mode is off" for all the GPUs.

Expected behavior
The output of nvidia_gpu_tools.py --devices gpus --query-cc-mode should say "CC mode is on" for all the GPUs.
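
To compare actual and expected behavior across many GPUs, the tool's output can be summarized with a small script. This is a minimal sketch that assumes only the two phrases quoted above ("CC mode is on" / "CC mode is off") appear once per GPU in the captured output. The function names and the sample text are hypothetical, and the real output of nvidia_gpu_tools.py contains more than is shown here.

```python
import re
from collections import Counter

# Matches the phrases quoted in this issue, e.g. "CC mode is off".
CC_MODE_RE = re.compile(r"CC mode is (\w+)")


def summarize_cc_modes(output: str) -> Counter:
    """Count how many times each CC mode is reported in the captured output."""
    return Counter(m.group(1).lower() for m in CC_MODE_RE.finditer(output))


def all_gpus_cc_on(output: str) -> bool:
    """True only if at least one mode line was found and every one says 'on'."""
    modes = summarize_cc_modes(output)
    return bool(modes) and set(modes) == {"on"}


# Made-up sample text for illustration, not real tool output.
sample = "GPU A: CC mode is off\nGPU B: CC mode is off\n"
print(summarize_cc_modes(sample))  # Counter({'off': 2})
print(all_gpus_cc_on(sample))      # False
```

The real output could be captured with `subprocess.run(..., capture_output=True, text=True)` and passed to `all_gpus_cc_on`.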
