Commit 8db72b7

deployments: Add Helm chart support for CDI mode

Enhance Helm chart to support both default and CDI operating modes.

Changes:
- values.yaml: Add CDI configuration section
  - cdi.enabled: Enable CDI spec mode (default: false)
  - cdi.configMapName: Reference external CDI spec ConfigMap
  - cdi.inlineSpec: Embed CDI spec in Helm values
  - cdi.architectureOverride: Manual architecture selection
  - mockDriver.architecture: Default mode architecture (dgxa100)
- mock-driver-daemonset.yaml: Wire CDI configuration
  - Mount CDI spec ConfigMap when enabled
  - Set environment variables (MOCK_GPU_ARCH, MOCK_NVML_NUM_DEVICES)
  - Use entrypoint.sh instead of direct gpu-mockctl call
  - Pass CDI_SPEC_PATH when CDI mode enabled
- cdi-configmap.yaml: New template for inline CDI specs
  - Creates ConfigMap from Helm values when cdi.inlineSpec set

Deployment scenarios:

1. Default mode (zero-config):
   helm install gpu-mock ./helm/gpu-mock
   → 8 A100 GPUs, no configuration needed

2. CDI from ConfigMap:
   kubectl create configmap my-spec --from-file=spec.yaml=my-cdi.yaml
   helm install gpu-mock ./helm/gpu-mock --set cdi.enabled=true \
     --set cdi.configMapName=my-spec
   → Custom GPU topology from CDI spec

3. CDI inline:
   helm install gpu-mock ./helm/gpu-mock --set cdi.enabled=true \
     --set-file cdi.inlineSpec=my-cdi.yaml
   → CDI spec embedded in Helm release

This provides flexibility while maintaining backward compatibility with
existing zero-config deployments.

1 parent cfa5aef commit 8db72b7
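
Scenario 2 from the commit message can also be written as a values file instead of repeated `--set` flags. This is a minimal sketch, assuming the `my-spec` ConfigMap from that scenario already exists in the release namespace; the file name `values-cdi.yaml` is hypothetical.

```yaml
# values-cdi.yaml (hypothetical file name)
# Equivalent to: --set cdi.enabled=true --set cdi.configMapName=my-spec
cdi:
  enabled: true
  # Must name an existing ConfigMap that carries the CDI spec
  # under the key "spec.yaml"
  configMapName: my-spec
```

It would be passed with `helm install gpu-mock ./helm/gpu-mock -f values-cdi.yaml`.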

3 files changed, +65 -3 lines changed

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+{{- if and .Values.cdi.enabled .Values.cdi.inlineSpec -}}
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: {{ include "gpu-mock.fullname" . }}-cdi-spec
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "gpu-mock.labels" . | nindent 4 }}
+data:
+  spec.yaml: |
+{{ .Values.cdi.inlineSpec | indent 4 }}
+{{- end }}
+
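
To make the effect of this template concrete, here is roughly what it might render to when `cdi.enabled=true` and `cdi.inlineSpec` is set. This is a sketch under several assumptions: the release is named `gpu-mock` and installed in `default`, the `gpu-mock.fullname` helper resolves to `gpu-mock`, and the CDI spec body is an illustrative example that does not come from the repository.

```yaml
# Approximate `helm template` output (a sketch, not captured output)
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpu-mock-cdi-spec   # assumes the fullname helper resolves to "gpu-mock"
  namespace: default        # assumes the release namespace is "default"
  # labels omitted: they come from the chart's "gpu-mock.labels" helper,
  # which is not part of this commit
data:
  spec.yaml: |
    # illustrative CDI spec, not taken from the repository
    cdiVersion: "0.6.0"
    kind: nvidia.com/gpu
    devices:
      - name: "0"
        containerEdits:
          deviceNodes:
            - path: /dev/nvidia0
```

Because `indent 4` prefixes every line of the inline spec with four spaces, the whole spec stays inside the `spec.yaml` block scalar.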

deployments/devel/gpu-mock/helm/gpu-mock/templates/mock-driver-daemonset.yaml

Lines changed: 29 additions & 1 deletion
@@ -41,6 +41,12 @@ spec:
         - name: host-dev
           hostPath:
             path: /dev
+        {{- if .Values.cdi.enabled }}
+        - name: cdi-spec
+          configMap:
+            name: {{ .Values.cdi.configMapName | default (printf "%s-cdi-spec" (include "gpu-mock.fullname" .)) }}
+            optional: true
+        {{- end }}
 
       initContainers:
         # Create mock driver filesystem and device nodes
@@ -55,15 +61,37 @@ spec:
             - name: DEBUG
               value: "true"
             {{- end }}
+            # Mock GPU architecture configuration
+            - name: MOCK_GPU_ARCH
+              value: {{ .Values.cdi.architectureOverride | default .Values.mockDriver.architecture | quote }}
+            # Number of GPUs (for default mode)
+            - name: MOCK_NVML_NUM_DEVICES
+              value: {{ .Values.mockDriver.gpuCount | quote }}
+            # Driver root and device paths
+            - name: DRIVER_ROOT
+              value: "/host/var/lib/nvidia-mock/driver"
+            - name: HOST_DEV
+              value: "/host/dev"
+            {{- if .Values.cdi.enabled }}
+            # CDI spec path (will be present if ConfigMap is mounted)
+            - name: CDI_SPEC_PATH
+              value: "/config/cdi-spec.yaml"
+            {{- end }}
           securityContext:
             {{- toYaml .Values.global.securityContext | nindent 12 }}
           volumeMounts:
             - name: host-driver
               mountPath: /host/var/lib/nvidia-mock/driver
             - name: host-dev
               mountPath: /host/dev
-          command: ["/usr/local/bin/gpu-mockctl"]
+            {{- if .Values.cdi.enabled }}
+            - name: cdi-spec
+              mountPath: /config
+              readOnly: true
+            {{- end }}
+          command: ["/usr/local/bin/entrypoint.sh"]
           args:
+            - "/usr/local/bin/gpu-mockctl"
             - "driver"
             - "--driver-root"
             - "/host/var/lib/nvidia-mock/driver"
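
As an illustration of how these template changes resolve, the fragment below sketches the rendered fields for `cdi.enabled=true` and `cdi.configMapName=my-spec`, with every other value left at its default. It is flattened for readability (the three keys live at different levels of the real pod spec) and is an assumption about the rendered output, not captured `helm template` output.

```yaml
# Flattened sketch of the rendered DaemonSet fields; not a complete pod spec
volumes:
  - name: cdi-spec
    configMap:
      name: my-spec          # configMapName wins over the generated "<fullname>-cdi-spec"
      optional: true         # the pod still starts if the ConfigMap is missing
env:
  - name: MOCK_GPU_ARCH
    value: "dgxa100"         # architectureOverride is empty, so mockDriver.architecture applies
  - name: MOCK_NVML_NUM_DEVICES
    value: "8"               # mockDriver.gpuCount, quoted
  - name: CDI_SPEC_PATH
    value: "/config/cdi-spec.yaml"
volumeMounts:
  - name: cdi-spec
    mountPath: /config
    readOnly: true
```

Kubernetes projects each ConfigMap key as a file named after that key, so a ConfigMap with the key `spec.yaml` would appear as `/config/spec.yaml`. Whether `entrypoint.sh` reconciles that with the `CDI_SPEC_PATH` value is not visible in this commit.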

deployments/devel/gpu-mock/helm/gpu-mock/values.yaml

Lines changed: 23 additions & 2 deletions
@@ -15,12 +15,16 @@ mockDriver:
     tag: dev
     pullPolicy: IfNotPresent
 
-  # Number of mock GPUs to create
+  # Number of mock GPUs to create (used in default mode)
   gpuCount: 8
 
-  # Mock GPU model
+  # Mock GPU model (used in default mode)
   gpuModel: "NVIDIA A100-SXM4-40GB"
 
+  # Mock GPU architecture (used in default mode)
+  # Options: dgxa100 (default), h100, h200, b200 (when available)
+  architecture: dgxa100
+
   # Resources for the mock driver pod
   resources:
     requests:
@@ -97,6 +101,23 @@ nodeLabeling:
   # Apply feature.node.kubernetes.io/pci-10de.present=true label
   pciPresent: true
 
+# CDI (Container Device Interface) configuration
+cdi:
+  # Enable CDI spec mode (if disabled, uses default dgxa100 mode)
+  enabled: false
+
+  # Option 1: Reference existing ConfigMap with CDI spec
+  # The ConfigMap should have a key "spec.yaml" containing the CDI spec
+  configMapName: ""
+
+  # Option 2: Inline CDI spec (embedded in Helm values)
+  # Leave empty to use ConfigMap or default mode
+  inlineSpec: ""
+
+  # Override architecture detection from CDI spec
+  # Leave empty to auto-detect from CDI spec annotations
+  architectureOverride: ""
+
 # Development/debugging options
 debug:
   # Enable debug logging
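
The new `mockDriver.architecture` key can be overridden while staying in default mode. The values file below is a hedged sketch: the file name is hypothetical, and whether the image actually supports `h100` is an assumption, since the comment in values.yaml lists the non-default options as "when available".

```yaml
# values-h100.yaml (hypothetical file name)
mockDriver:
  gpuCount: 4            # exported to the init container as MOCK_NVML_NUM_DEVICES
  architecture: h100     # exported as MOCK_GPU_ARCH; support depends on the image
cdi:
  enabled: false         # stay in default mode, no CDI spec is mounted
```

Note that the DaemonSet template reads `cdi.architectureOverride` before `mockDriver.architecture` regardless of `cdi.enabled`, so a non-empty override would take precedence over the value set here.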
