
Commit 5326261

Enable NFD rule for GPU resource driver Helm chart (#68)
* add gpu nfd rule
1 parent 0020d7e commit 5326261


7 files changed, 137 additions and 7 deletions

charts/intel-gpu-resource-driver/Chart.yaml

Lines changed: 10 additions & 2 deletions
```diff
@@ -3,5 +3,13 @@ name: intel-gpu-resource-driver
 description: A Helm chart for a Dynamic Resource Allocation (DRA) Intel GPU Resource Driver
 
 type: application
-version: 0.6.0
-appVersion: "v0.6.0"
+version: 0.7.0
+appVersion: "v0.7.0"
+home: https://github.com/intel/helm-charts
+
+dependencies:
+  - name: node-feature-discovery
+    alias: nfd
+    version: "0.16.6"
+    condition: nfd.enabled
+    repository: https://kubernetes-sigs.github.io/node-feature-discovery/charts
```
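The new `dependencies` entry does two things. Through `condition`, it ties installation of the NFD subchart to `nfd.enabled`. Through `alias`, it exposes the subchart's settings under the `nfd` key in this chart's values. The sketch below is a simplified model of those two Helm rules. It ignores `tags`, `global` values and the subchart's own defaults, and the function names are illustrative only, not part of Helm.

```python
from typing import Any, Dict, Optional


def resolve_path(values: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path such as "nfd.enabled"; return None if a key is missing."""
    node: Any = values
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def dependency_enabled(values: Dict[str, Any], condition: str) -> bool:
    """Use the first comma-separated condition path that resolves to a bool.

    If no path resolves to a bool, the condition has no effect and the
    dependency stays enabled.
    """
    for path in condition.split(","):
        value = resolve_path(values, path.strip())
        if isinstance(value, bool):
            return value
    return True


def subchart_values(values: Dict[str, Any], alias: str) -> Dict[str, Any]:
    """The block stored under the alias key (here `nfd`) is what the subchart sees."""
    return dict(values.get(alias, {}))


# Defaults from this commit's values.yaml.
defaults = {
    "nfd": {"enabled": False, "nameOverride": "intel-gpu-nfd", "enableNodeFeatureApi": True}
}
# Equivalent of passing --set nfd.enabled=true at install time.
enabled = {"nfd": {**defaults["nfd"], "enabled": True}}

print(dependency_enabled(defaults, "nfd.enabled"))       # False
print(dependency_enabled(enabled, "nfd.enabled"))        # True
print(subchart_values(enabled, "nfd")["nameOverride"])   # intel-gpu-nfd
```

With the shipped defaults, NFD is therefore not installed. Enabling it also hands `nameOverride` and `enableNodeFeatureApi` to the NFD subchart.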

charts/intel-gpu-resource-driver/README.md

Lines changed: 3 additions & 1 deletion
````diff
@@ -16,7 +16,9 @@ helm repo update
 You can execute `helm search repo intel` command to see pulled charts [optional].
 
 ## Install Helm Chart
+When installing, update the dependencies:
 ```
+helm dependency update
 helm install intel-gpu-resource-driver intel/intel-gpu-resource-driver
 ```
 ## Upgrade Chart
@@ -43,7 +45,7 @@ You may also run `helm show values` on this chart's dependencies for additional
 | image.repository | string | `intel` |
 | image.name | string | `"intel-gpu-resource-driver"` |
 | image.pullPolicy | string | `"IfNotPresent"` |
-| image.tag | string | `"v0.6.0"` |
+| image.tag | string | `"v0.7.0"` |
 
 > [!Note]
 > When upgrading, CRDs from previous version need to be removed manually because Helm supports neither upgrading nor deleting CRDs, see: https://github.com/helm/community/blob/main/hips/hip-0011.md
````
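This commit bumps one release number in four places: `version` and `appVersion` in Chart.yaml, `image.tag` in values.yaml, and the `image.tag` row of the README table. A hypothetical pre-release check like the one below could catch a missed spot. The rule `appVersion == "v" + version` is an assumption inferred only from the 0.6.0 and 0.7.0 values seen here. The dictionaries stand in for the parsed YAML files.

```python
from typing import Dict, List


def release_version_problems(chart: Dict[str, str], values: Dict[str, dict]) -> List[str]:
    """Return human-readable mismatches between Chart.yaml and values.yaml."""
    problems: List[str] = []
    app_version = chart["appVersion"]
    image_tag = values["image"]["tag"]
    if image_tag != app_version:
        problems.append(f"image.tag {image_tag!r} does not match appVersion {app_version!r}")
    # Assumed convention, inferred from 0.6.0/v0.6.0 and 0.7.0/v0.7.0 in this commit.
    if app_version != "v" + chart["version"]:
        problems.append(f"appVersion {app_version!r} is not 'v' + version {chart['version']!r}")
    return problems


# Stand-ins for the parsed files after this commit.
chart_after = {"version": "0.7.0", "appVersion": "v0.7.0"}
values_after = {"image": {"tag": "v0.7.0"}}
print(release_version_problems(chart_after, values_after))  # []

# A forgotten image.tag bump is reported.
print(release_version_problems(chart_after, {"image": {"tag": "v0.6.0"}}))
```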

charts/intel-gpu-resource-driver/templates/device-class.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-apiVersion: resource.k8s.io/v1alpha3
+apiVersion: resource.k8s.io/v1beta1
 kind: DeviceClass
 metadata:
   name: gpu.intel.com
```
Lines changed: 96 additions & 0 deletions
```diff
@@ -0,0 +1,96 @@
+{{- if .Values.nfd.enabled }}
+apiVersion: nfd.k8s-sigs.io/v1alpha1
+kind: NodeFeatureRule
+metadata:
+  name: intel-gpu-device-rule
+spec:
+  rules:
+    - name: "intel.gpu"
+      labels:
+        "intel.feature.node.kubernetes.io/gpu": "true"
+      matchFeatures:
+        - feature: pci.device
+          matchExpressions:
+            vendor: {op: In, value: ["8086"]}
+            class: {op: In, value: ["0300", "0380"]}
+      matchAny:
+        - matchFeatures:
+            - feature: kernel.loadedmodule
+              matchExpressions:
+                i915: {op: Exists}
+        - matchFeatures:
+            - feature: kernel.enabledmodule
+              matchExpressions:
+                i915: {op: Exists}
+---
+apiVersion: nfd.k8s-sigs.io/v1alpha1
+kind: NodeFeatureRule
+metadata:
+  name: intel-gpu-platform-labeling
+spec:
+  rules:
+    # A_Series (Alchemist)
+    - labels:
+        gpu.intel.com/family: "A_Series"
+      matchFeatures:
+        - feature: pci.device
+          matchExpressions:
+            class: {op: In, value: ["0300"]}
+            vendor: {op: In, value: ["8086"]}
+            device:
+              op: In
+              value:
+                - "56a6"
+                - "56a5"
+                - "56a1"
+                - "56a0"
+                - "5694"
+                - "5693"
+                - "5692"
+                - "5691"
+                - "5690"
+                - "56b3"
+                - "56b2"
+                - "56a4"
+                - "56a3"
+                - "5697"
+                - "5696"
+                - "5695"
+                - "56b1"
+                - "56b0"
+      name: intel.gpu.a.series
+    # Max_Series
+    - labels:
+        gpu.intel.com/family: "Max_Series"
+      matchFeatures:
+        - feature: pci.device
+          matchExpressions:
+            class: {op: In, value: ["0380"]}
+            vendor: {op: In, value: ["8086"]}
+            device:
+              op: In
+              value:
+                - "0bda"
+                - "0bd5"
+                - "0bd9"
+                - "0bdb"
+                - "0bd7"
+                - "0bd6"
+                - "0bd0"
+      name: intel.gpu.max.series
+    # Flex_Series
+    - labels:
+        gpu.intel.com/family: "Flex_Series"
+      matchFeatures:
+        - feature: pci.device
+          matchExpressions:
+            class: {op: In, value: ["0300", "0380"]}
+            vendor: {op: In, value: ["8086"]}
+            device:
+              op: In
+              value:
+                - "0f00"
+                - "0f01"
+                - "0f02"
+      name: intel.gpu.flex.series
+{{- end }}
```
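The two NodeFeatureRules above are evaluated as follows:

* `intel-gpu-device-rule` requires its `matchFeatures` block to match and at least one `matchAny` entry to match.
* `pci.device` is a per-device feature, so a single PCI device must satisfy every expression in the block.
* The family rules check PCI attributes only.

The sketch below is a simplified model for reasoning about which labels a node would get. It is not NFD's matcher: only the `In` and `Exists` cases used here are modelled. The node description (a list of device dicts plus sets of module names) is a hypothetical input format.

```python
from typing import Dict, List, Set, Tuple

INTEL_VENDOR = "8086"
GPU_CLASSES = {"0300", "0380"}

# Family label -> (allowed PCI classes, allowed device IDs), copied from the rules above.
FAMILIES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "A_Series": ({"0300"}, {
        "56a6", "56a5", "56a1", "56a0", "5694", "5693", "5692", "5691", "5690",
        "56b3", "56b2", "56a4", "56a3", "5697", "5696", "5695", "56b1", "56b0",
    }),
    "Max_Series": ({"0380"}, {"0bda", "0bd5", "0bd9", "0bdb", "0bd7", "0bd6", "0bd0"}),
    "Flex_Series": ({"0300", "0380"}, {"0f00", "0f01", "0f02"}),
}


def gpu_rule_matches(pci_devices: List[Dict[str, str]],
                     loaded_modules: Set[str],
                     enabled_modules: Set[str]) -> bool:
    """intel-gpu-device-rule: matchFeatures AND at least one matchAny entry."""
    # One device must satisfy the vendor and class expressions together.
    has_intel_gpu = any(
        d["vendor"] == INTEL_VENDOR and d["class"] in GPU_CLASSES for d in pci_devices
    )
    # matchAny: i915 reported by kernel.loadedmodule or by kernel.enabledmodule.
    has_i915 = "i915" in loaded_modules or "i915" in enabled_modules
    return has_intel_gpu and has_i915


def node_labels(pci_devices: List[Dict[str, str]],
                loaded_modules: Set[str],
                enabled_modules: Set[str]) -> Dict[str, str]:
    labels: Dict[str, str] = {}
    if gpu_rule_matches(pci_devices, loaded_modules, enabled_modules):
        labels["intel.feature.node.kubernetes.io/gpu"] = "true"
    # intel-gpu-platform-labeling: PCI attributes only, no kernel-module check.
    # The device-ID sets are disjoint, so one GPU model yields one family label.
    for family, (classes, device_ids) in FAMILIES.items():
        if any(d["vendor"] == INTEL_VENDOR and d["class"] in classes
               and d["device"] in device_ids for d in pci_devices):
            labels["gpu.intel.com/family"] = family
    return labels


# Hypothetical node with one device whose ID is in the A_Series list.
node = [{"vendor": "8086", "class": "0300", "device": "56a0"}]
print(node_labels(node, {"i915"}, set()))
# {'intel.feature.node.kubernetes.io/gpu': 'true', 'gpu.intel.com/family': 'A_Series'}
print(node_labels(node, set(), set()))
# {'gpu.intel.com/family': 'A_Series'}
```

The second call shows an asymmetry in the rules. A node whose i915 module is absent still receives a family label, but it does not receive the `intel.feature.node.kubernetes.io/gpu` label. That missing label is the one the kubelet plugin's nodeSelector uses when NFD is enabled.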

charts/intel-gpu-resource-driver/templates/resource-driver.yaml

Lines changed: 5 additions & 0 deletions
```diff
@@ -73,10 +73,15 @@ spec:
       tolerations:
         {{- toYaml . | nindent 8 }}
       {{- end }}
+      {{- if .Values.nfd.enabled }}
+      nodeSelector:
+        intel.feature.node.kubernetes.io/gpu: "true"
+      {{- else }}
       {{- with .Values.kubeletPlugin.nodeSelector }}
       nodeSelector:
         {{- toYaml . | nindent 8 }}
       {{- end }}
+      {{- end }}
       {{- with .Values.kubeletPlugin.affinity }}
       affinity:
         {{- toYaml . | nindent 8 }}
```
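The template change selects the kubelet plugin's nodeSelector in one of two ways. With `nfd.enabled`, it is fixed to the NFD-generated GPU label. Otherwise, `kubeletPlugin.nodeSelector` is used, and because `with` skips empty values, an empty map renders no nodeSelector at all. The Python model below mirrors that branch and the Kubernetes rule that every selector pair must appear among the node's labels. The `example.com/pool` label is hypothetical.

```python
from typing import Dict, Optional

NFD_GPU_LABEL = "intel.feature.node.kubernetes.io/gpu"


def rendered_node_selector(values: dict) -> Optional[Dict[str, str]]:
    """Mirror the if/else/with logic added to resource-driver.yaml."""
    if values.get("nfd", {}).get("enabled"):
        # NFD branch: a fixed selector; kubeletPlugin.nodeSelector is not consulted.
        return {NFD_GPU_LABEL: "true"}
    # `with` skips its body for an empty map, so {} renders no nodeSelector.
    user_selector = values.get("kubeletPlugin", {}).get("nodeSelector") or {}
    return dict(user_selector) or None


def selector_matches(node_labels: Dict[str, str], selector: Optional[Dict[str, str]]) -> bool:
    """Every nodeSelector key/value pair must be present on the node."""
    if not selector:
        return True
    return all(node_labels.get(key) == value for key, value in selector.items())


# Hypothetical user selector that is ignored once NFD is enabled.
with_nfd = {"nfd": {"enabled": True}, "kubeletPlugin": {"nodeSelector": {"example.com/pool": "gpu"}}}
print(rendered_node_selector(with_nfd))  # {'intel.feature.node.kubernetes.io/gpu': 'true'}
print(selector_matches({}, rendered_node_selector(with_nfd)))  # False
```

A consequence of this design is that enabling NFD silently overrides any custom `kubeletPlugin.nodeSelector`. To add selection constraints while NFD is on, use `kubeletPlugin.affinity`, which is still rendered in both branches.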

charts/intel-gpu-resource-driver/templates/validating-admission-policy.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,7 +7,7 @@ spec:
   matchConstraints:
     resourceRules:
     - apiGroups: ["resource.k8s.io"]
-      apiVersions: ["v1alpha3"]
+      apiVersions: ["v1beta1"]
       operations: ["CREATE", "UPDATE", "DELETE"]
       resources: ["resourceslices"]
   matchConditions:
```
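Both `device-class.yaml` and `validating-admission-policy.yaml` move from `resource.k8s.io/v1alpha3` to `v1beta1`. Kubernetes documents a version-priority convention for CRD versions that makes "beta ranks above alpha" precise: GA first, then beta, then alpha; within each level, higher major and then higher minor first; non-Kubernetes-style strings last, alphabetically. The sort key below is a sketch of that convention. Applying it to built-in API versions here is only for illustration.

```python
import re
from typing import Tuple

# "v" + major, optionally followed by alpha/beta + minor, e.g. v1, v1beta1, v1alpha3.
_K8S_VERSION = re.compile(r"^v(\d+)(?:(alpha|beta)(\d+))?$")
# Lower rank sorts first: GA, then beta, then alpha.
_STABILITY_RANK = {None: 0, "beta": 1, "alpha": 2}


def version_priority_key(version: str) -> Tuple[int, int, int, str]:
    match = _K8S_VERSION.match(version)
    if match is None:
        # Not a Kubernetes-style version: after all others, alphabetically.
        return (3, 0, 0, version)
    major, stability, minor = match.groups()
    # Negate the numbers so that higher major/minor versions sort first.
    return (_STABILITY_RANK[stability], -int(major), -int(minor or 0), "")


print(sorted(["v1alpha3", "v1beta1"], key=version_priority_key))  # ['v1beta1', 'v1alpha3']
```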

charts/intel-gpu-resource-driver/values.yaml

Lines changed: 21 additions & 2 deletions
```diff
@@ -9,7 +9,7 @@ image:
   repository: intel
   name: intel-gpu-resource-driver
   pullPolicy: IfNotPresent
-  tag: "v0.6.0"
+  tag: "v0.7.0"
 
 serviceAccount:
   create: true
@@ -19,6 +19,25 @@ serviceAccount:
 
 kubeletPlugin:
   podAnnotations: {}
-  tolerations: []
   nodeSelector: {}
+    # label used when nfd.enabled is true
+    #intel.feature.node.kubernetes.io/gpu: "true"
+  tolerations:
+    - key: node-role.kubernetes.io/master
+      operator: Exists
+      effect: NoSchedule
+    - key: node-role.kubernetes.io/control-plane
+      operator: Exists
+      effect: NoSchedule
+    # Refer to the official documentation for Node Feature Discovery (NFD)
+    # regarding node tainting:
+    # https://nfd.sigs.k8s.io/usage/customization-guide#node-tainting
+    - key: "node.kubernetes.io/gpu"
+      operator: "Exists"
+      effect: "NoSchedule"
   affinity: {}
+
+nfd:
+  enabled: false # change to true to install NFD to the cluster
+  nameOverride: intel-gpu-nfd
+  enableNodeFeatureApi: true
```
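The new default `kubeletPlugin.tolerations` let the plugin pods schedule onto two kinds of node that previously repelled them: control-plane nodes and nodes carrying a `node.kubernetes.io/gpu` NoSchedule taint (the taint the values comment relates to NFD's node-tainting feature). Below is a simplified model of Kubernetes taint/toleration matching applied to those defaults. It ignores `tolerationSeconds` and any tolerations a controller may add automatically. The taint values `"true"` and `dedicated=ml` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"
    effect: str = ""
    value: str = ""


def tolerates(tol: Toleration, taint: Taint) -> bool:
    # An empty effect in the toleration matches every effect.
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key (only valid with Exists) matches every key.
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value  # operator Equal


def blocking_taints(taints: List[Taint], tolerations: List[Toleration]) -> List[Taint]:
    """Hard taints (NoSchedule, NoExecute) that no toleration covers."""
    return [
        t for t in taints
        if t.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


# kubeletPlugin.tolerations defaults introduced by this commit.
DEFAULT_TOLERATIONS = [
    Toleration("node-role.kubernetes.io/master", "Exists", "NoSchedule"),
    Toleration("node-role.kubernetes.io/control-plane", "Exists", "NoSchedule"),
    Toleration("node.kubernetes.io/gpu", "Exists", "NoSchedule"),
]

control_plane = [Taint("node-role.kubernetes.io/control-plane", "NoSchedule")]
gpu_node = [Taint("node.kubernetes.io/gpu", "NoSchedule", value="true")]  # hypothetical value
dedicated = [Taint("dedicated", "NoSchedule", value="ml")]  # hypothetical taint

print(blocking_taints(control_plane, DEFAULT_TOLERATIONS))   # []
print(blocking_taints(gpu_node, DEFAULT_TOLERATIONS))        # []
print(len(blocking_taints(dedicated, DEFAULT_TOLERATIONS)))  # 1
```

Because the GPU toleration uses `operator: Exists`, it matches the `node.kubernetes.io/gpu` taint whatever value NFD or an administrator puts on it.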
