Commit 1e83a87

deployments: Add CDI spec examples

Add example CDI specifications for testing and reference.

Files:
- cdi-spec-a100-2gpu.yaml: Minimal 2-GPU configuration
  - Lightweight setup for development/testing
  - Includes environment variable examples
  - Demonstrates basic CDI structure
- cdi-spec-a100-8gpu.yaml: DGX A100 simulation
  - 8x NVIDIA A100-SXM4-40GB GPUs
  - Matches default mode topology
  - Shows complete device enumeration
  - Includes usage instructions in comments

Both specs follow CDI v0.5.0 specification and include:
- Device nodes (/dev/nvidia0-7, nvidiactl, nvidia-uvm*)
- GPU model annotations for architecture detection
- UUID and index metadata
- Proper device major/minor numbers

Usage:
  # From file:
  kubectl create configmap my-spec \
    --from-file=spec.yaml=cdi-spec-a100-2gpu.yaml
  helm install gpu-mock ./helm/gpu-mock \
    --set cdi.enabled=true --set cdi.configMapName=my-spec

  # Inline:
  helm install gpu-mock ./helm/gpu-mock \
    --set cdi.enabled=true \
    --set-file cdi.inlineSpec=cdi-spec-a100-8gpu.yaml

These examples serve as templates for creating custom GPU topologies.
1 parent 8db72b7 commit 1e83a87


3 files changed: 174 additions, 178 deletions

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# Example CDI Specification for 2x NVIDIA A100 GPUs
+# Simpler configuration for testing and development
+# Compatible with CDI specification version 0.5.0
+#
+# Usage:
+# kubectl create configmap gpu-cdi-spec --from-file=spec.yaml=cdi-spec-a100-2gpu.yaml -n gpu-mock
+# helm upgrade --install gpu-mock ../../helm/gpu-mock \
+# --set cdi.enabled=true \
+# --set cdi.configMapName=gpu-cdi-spec \
+# --namespace gpu-mock
+
+cdiVersion: "0.5.0"
+kind: nvidia.com/gpu
+
+devices:
+  # GPU 0
+  - name: "gpu0"
+    annotations:
+      nvidia.com/gpu.model: "NVIDIA A100-SXM4-40GB"
+      nvidia.com/gpu.uuid: "GPU-00000000-0000-0000-0000-000000000000"
+      nvidia.com/gpu.index: "0"
+    containerEdits:
+      deviceNodes:
+        - path: /dev/nvidia0
+          type: c
+          major: 195
+          minor: 0
+          fileMode: 0666
+      env:
+        - NVIDIA_VISIBLE_DEVICES=0
+
+  # GPU 1
+  - name: "gpu1"
+    annotations:
+      nvidia.com/gpu.model: "NVIDIA A100-SXM4-40GB"
+      nvidia.com/gpu.uuid: "GPU-00000000-0000-0000-0000-000000000001"
+      nvidia.com/gpu.index: "1"
+    containerEdits:
+      deviceNodes:
+        - path: /dev/nvidia1
+          type: c
+          major: 195
+          minor: 1
+          fileMode: 0666
+      env:
+        - NVIDIA_VISIBLE_DEVICES=1
+
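Each device entry in the 2-GPU spec repeats the same index in four places: the device name, the `nvidia.com/gpu.index` annotation, the `/dev/nvidiaN` path, and the minor number. When the file is used as a template, these are easy to get out of step. The following is a minimal sketch of a consistency check, assuming the spec has already been parsed into a Python dict by whatever YAML loader you use. The function name `check_device_conventions` is hypothetical, and the rules it enforces are conventions taken from the example files in this commit, not requirements of the CDI specification.

```python
import re

# Major number used for /dev/nvidiaN in the example specs (a convention of
# these files, treated here as an assumption).
EXPECTED_MAJOR = 195


def check_device_conventions(spec):
    """Return a list of problems found in a parsed CDI spec dict.

    Hypothetical helper: it checks only the naming conventions visible in
    the example files, not full CDI validity.
    """
    problems = []
    seen_names = set()
    for dev in spec.get("devices", []):
        name = dev.get("name", "")
        if name in seen_names:
            problems.append(f"{name}: duplicate device name")
        seen_names.add(name)

        index = dev.get("annotations", {}).get("nvidia.com/gpu.index")
        if name != f"gpu{index}":
            problems.append(f"{name}: name does not match index annotation {index!r}")

        for node in dev.get("containerEdits", {}).get("deviceNodes", []):
            path = node.get("path", "")
            match = re.fullmatch(r"/dev/nvidia(\d+)", path)
            if match is None:
                continue  # other nodes (for example control devices) are not checked
            expected_minor = int(match.group(1))
            if node.get("minor") != expected_minor:
                problems.append(
                    f"{name}: {path} has minor {node.get('minor')}, expected {expected_minor}"
                )
            if node.get("major") != EXPECTED_MAJOR:
                problems.append(
                    f"{name}: {path} has major {node.get('major')}, expected {EXPECTED_MAJOR}"
                )
    return problems


# One device from the 2-GPU example, written as the dict a YAML loader
# would be expected to produce.
example_spec = {
    "cdiVersion": "0.5.0",
    "kind": "nvidia.com/gpu",
    "devices": [
        {
            "name": "gpu0",
            "annotations": {"nvidia.com/gpu.index": "0"},
            "containerEdits": {
                "deviceNodes": [
                    {"path": "/dev/nvidia0", "type": "c", "major": 195, "minor": 0}
                ],
                "env": ["NVIDIA_VISIBLE_DEVICES=0"],
            },
        }
    ],
}

print(check_device_conventions(example_spec))
```

An empty list means the entry follows the conventions; each string in a non-empty list names the device and the field that is out of step.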
Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
+# Example CDI Specification for 8x NVIDIA A100 GPUs
+# This mimics the NVIDIA DGX A100 configuration
+# Compatible with CDI specification version 0.5.0
+#
+# Usage:
+# kubectl create configmap gpu-cdi-spec --from-file=spec.yaml=cdi-spec-a100-8gpu.yaml -n gpu-mock
+# helm upgrade --install gpu-mock ../../helm/gpu-mock \
+# --set cdi.enabled=true \
+# --set cdi.configMapName=gpu-cdi-spec \
+# --namespace gpu-mock
+
+cdiVersion: "0.5.0"
+kind: nvidia.com/gpu
+
+devices:
+  # GPU 0
+  - name: "gpu0"
+    annotations:
+      nvidia.com/gpu.model: "NVIDIA A100-SXM4-40GB"
+      nvidia.com/gpu.uuid: "GPU-00000000-0000-0000-0000-000000000000"
+      nvidia.com/gpu.index: "0"
+    containerEdits:
+      deviceNodes:
+        - path: /dev/nvidia0
+          type: c
+          major: 195
+          minor: 0
+          fileMode: 0666
+
+  # GPU 1
+  - name: "gpu1"
+    annotations:
+      nvidia.com/gpu.model: "NVIDIA A100-SXM4-40GB"
+      nvidia.com/gpu.uuid: "GPU-00000000-0000-0000-0000-000000000001"
+      nvidia.com/gpu.index: "1"
+    containerEdits:
+      deviceNodes:
+        - path: /dev/nvidia1
+          type: c
+          major: 195
+          minor: 1
+          fileMode: 0666
+
+  # GPU 2
+  - name: "gpu2"
+    annotations:
+      nvidia.com/gpu.model: "NVIDIA A100-SXM4-40GB"
+      nvidia.com/gpu.uuid: "GPU-00000000-0000-0000-0000-000000000002"
+      nvidia.com/gpu.index: "2"
+    containerEdits:
+      deviceNodes:
+        - path: /dev/nvidia2
+          type: c
+          major: 195
+          minor: 2
+          fileMode: 0666
+
+  # GPU 3
+  - name: "gpu3"
+    annotations:
+      nvidia.com/gpu.model: "NVIDIA A100-SXM4-40GB"
+      nvidia.com/gpu.uuid: "GPU-00000000-0000-0000-0000-000000000003"
+      nvidia.com/gpu.index: "3"
+    containerEdits:
+      deviceNodes:
+        - path: /dev/nvidia3
+          type: c
+          major: 195
+          minor: 3
+          fileMode: 0666
+
+  # GPU 4
+  - name: "gpu4"
+    annotations:
+      nvidia.com/gpu.model: "NVIDIA A100-SXM4-40GB"
+      nvidia.com/gpu.uuid: "GPU-00000000-0000-0000-0000-000000000004"
+      nvidia.com/gpu.index: "4"
+    containerEdits:
+      deviceNodes:
+        - path: /dev/nvidia4
+          type: c
+          major: 195
+          minor: 4
+          fileMode: 0666
+
+  # GPU 5
+  - name: "gpu5"
+    annotations:
+      nvidia.com/gpu.model: "NVIDIA A100-SXM4-40GB"
+      nvidia.com/gpu.uuid: "GPU-00000000-0000-0000-0000-000000000005"
+      nvidia.com/gpu.index: "5"
+    containerEdits:
+      deviceNodes:
+        - path: /dev/nvidia5
+          type: c
+          major: 195
+          minor: 5
+          fileMode: 0666
+
+  # GPU 6
+  - name: "gpu6"
+    annotations:
+      nvidia.com/gpu.model: "NVIDIA A100-SXM4-40GB"
+      nvidia.com/gpu.uuid: "GPU-00000000-0000-0000-0000-000000000006"
+      nvidia.com/gpu.index: "6"
+    containerEdits:
+      deviceNodes:
+        - path: /dev/nvidia6
+          type: c
+          major: 195
+          minor: 6
+          fileMode: 0666
+
+  # GPU 7
+  - name: "gpu7"
+    annotations:
+      nvidia.com/gpu.model: "NVIDIA A100-SXM4-40GB"
+      nvidia.com/gpu.uuid: "GPU-00000000-0000-0000-0000-000000000007"
+      nvidia.com/gpu.index: "7"
+    containerEdits:
+      deviceNodes:
+        - path: /dev/nvidia7
+          type: c
+          major: 195
+          minor: 7
+          fileMode: 0666
+
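The eight device entries above differ only in the GPU index, so a custom topology can be generated rather than hand-edited. The sketch below renders a spec in the same layout for an arbitrary GPU count. It is an illustration under stated assumptions, not part of this commit: `render_cdi_spec` and its parameters are hypothetical names, the two-space indentation is a choice, and the zero-padded decimal UUID suffix simply extends the pattern seen for indices 0 to 7.

```python
def render_cdi_spec(gpu_count, model="NVIDIA A100-SXM4-40GB",
                    cdi_version="0.5.0", with_env=False):
    """Render a CDI spec as YAML text, mirroring the example files.

    Hypothetical helper. Major 195 and minor == GPU index follow the
    example specs; with_env adds the NVIDIA_VISIBLE_DEVICES entry that
    only the 2-GPU example includes.
    """
    lines = [
        f'cdiVersion: "{cdi_version}"',
        "kind: nvidia.com/gpu",
        "",
        "devices:",
    ]
    for i in range(gpu_count):
        lines += [
            f"  # GPU {i}",
            f'  - name: "gpu{i}"',
            "    annotations:",
            f'      nvidia.com/gpu.model: "{model}"',
            # Assumption: the UUID suffix is the index, zero-padded to 12 digits.
            f'      nvidia.com/gpu.uuid: "GPU-00000000-0000-0000-0000-{i:012d}"',
            f'      nvidia.com/gpu.index: "{i}"',
            "    containerEdits:",
            "      deviceNodes:",
            f"        - path: /dev/nvidia{i}",
            "          type: c",
            "          major: 195",
            f"          minor: {i}",
            "          fileMode: 0666",
        ]
        if with_env:
            lines += [
                "      env:",
                f"        - NVIDIA_VISIBLE_DEVICES={i}",
            ]
        lines.append("")  # blank line between device entries
    return "\n".join(lines)


if __name__ == "__main__":
    # Print a 4-GPU spec; redirect to a file to use it with the Helm chart.
    print(render_cdi_spec(4))
```

The output can be saved to a file and passed to the chart the same way as the checked-in examples, for instance through the `cdi.configMapName` or `cdi.inlineSpec` values named in the commit message.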

pkg/gpu/mocknvml/NVML_SYMBOLS_ANALYSIS.md

Lines changed: 0 additions & 178 deletions
This file was deleted.
