Commit 1f3edad

add NFD rule to QAT resource driver (#66)
* add NFD rule to QAT resource driver

Signed-off-by: Oksana Baranova <[email protected]>
1 parent 5f43b0d commit 1f3edad

4 files changed: 70 additions, 3 deletions

charts/intel-qat-resource-driver/Chart.yaml

Lines changed: 8 additions & 0 deletions
@@ -5,3 +5,11 @@ description: A Helm chart for a Dynamic Resource Allocation (DRA) Intel QAT Reso
 type: application
 version: 0.1.0
 appVersion: "v0.1.0"
+home: https://github.com/intel/helm-charts
+
+dependencies:
+  - name: node-feature-discovery
+    alias: nfd
+    version: "0.16.6"
+    condition: nfd.enabled
+    repository: https://kubernetes-sigs.github.io/node-feature-discovery/charts
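
The `condition: nfd.enabled` entry above ties the NFD subchart to a value in the parent chart's `values.yaml`. The following is a minimal Python sketch of how such a condition is commonly understood to be evaluated, assuming Helm's documented behaviour that the first path resolving to a boolean decides and that a condition with no resolvable path has no effect. The function names `resolve_path` and `dependency_enabled` are hypothetical and are not part of Helm; interaction with `tags` is deliberately ignored.

```python
from typing import Any, Mapping, Optional


def resolve_path(values: Mapping[str, Any], dotted_path: str) -> Optional[Any]:
    """Walk a dotted path such as 'nfd.enabled'; return None if any key is missing."""
    node: Any = values
    for key in dotted_path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def dependency_enabled(values: Mapping[str, Any], condition: str) -> bool:
    """Approximate a Helm dependency condition (illustrative only).

    The condition may hold several comma-separated paths. The first path that
    resolves to a boolean decides the outcome. If none resolves, the condition
    has no effect and the dependency is treated as enabled.
    """
    for path in condition.split(","):
        resolved = resolve_path(values, path.strip())
        if isinstance(resolved, bool):
            return resolved
    return True


if __name__ == "__main__":
    # Mirrors the chart default: NFD is not installed unless explicitly enabled.
    print(dependency_enabled({"nfd": {"enabled": False}}, "nfd.enabled"))
    print(dependency_enabled({"nfd": {"enabled": True}}, "nfd.enabled"))
```

With the chart's default of `nfd.enabled: false`, this sketch reports the dependency as disabled, which matches the comment in `values.yaml` that the value must be changed to `true` to install NFD.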

charts/intel-qat-resource-driver/README.md

Lines changed: 20 additions & 0 deletions
@@ -16,7 +16,9 @@ helm repo update
 You can execute `helm search repo intel` command to see pulled charts [optional].
 
 ## Install Helm Chart
+When installing, update the dependencies:
 ```
+helm dependency update
 helm install intel-qat-resource-driver intel/intel-qat-resource-driver
 ```
 ## Upgrade Chart
@@ -46,3 +48,21 @@ You may also run `helm show values` on this chart's dependencies for additional
 | image.tag | string | `"v0.1.0"` |
 
 If you change the image tag to be used in Helm chart deployment, ensure that the version of the container image is consistent with deployment YAMLs - they might change between releases.
+
+
+## Read-only file system error for QAT
+
+When the following error appears in the logs of the QAT Kubelet plugin:
+```
+kubectl logs -n intel-qat-resource-driver intel-qat-resource-driver-kubelet-plugin-ttcs6
+DRA kubelet plugin
+In-cluster config
+Setting up CDI
+failed to create kubelet plugin driver: cannot enable PF device '0000:6b:00.0': open /sysfs/bus/pci/devices/0000:6b:00.0/sriov_numvfs: read-only file system
+```
+
+Try reseting QAT by reloading its kernel driver:
+```
+rmmod qat_4xxx
+modprobe qat_4xxx
+```
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+apiVersion: nfd.k8s-sigs.io/v1alpha1
+kind: NodeFeatureRule
+metadata:
+  name: intel-qat-device-rule
+spec:
+  rules:
+    - name: "intel.qat"
+      labels:
+        feature.node.kubernetes.io/qat: "true"
+      matchFeatures:
+        - feature: pci.device
+          matchExpressions:
+            vendor: {op: In, value: ["8086"]}
+            device: {op: In, value: ["4940", "4941", "4944", "4946"]}
+            class: {op: In, value: ["0b40"]}
+        - feature: kernel.loadedmodule
+          matchExpressions:
+            intel_qat: {op: Exists}
+      matchAny:
+        - matchFeatures:
+            - feature: kernel.loadedmodule
+              matchExpressions:
+                vfio_pci: {op: Exists}
+        - matchFeatures:
+            - feature: kernel.enabledmodule
+              matchExpressions:
+                vfio-pci: {op: Exists}
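
The rule above combines several conditions, and the way they combine is easy to misread. The Python sketch below spells out one reading of it, assuming the usual NFD semantics: all `matchFeatures` terms must hold, at least one `matchAny` element must hold, and the `pci.device` expressions must all match on the same device. The function `qat_rule_matches` and its input shapes are hypothetical and only mimic the rule; they are not NFD's implementation.

```python
from typing import Iterable, Mapping, Set

# Device IDs copied from the NodeFeatureRule above.
QAT_DEVICE_IDS = {"4940", "4941", "4944", "4946"}


def qat_rule_matches(
    pci_devices: Iterable[Mapping[str, str]],
    loaded_modules: Set[str],
    enabled_modules: Set[str],
) -> bool:
    """Return True when a node would receive the qat label (illustrative only)."""
    # pci.device: vendor, device and class must all match on one and the same device.
    has_qat_device = any(
        dev.get("vendor") == "8086"
        and dev.get("device") in QAT_DEVICE_IDS
        and dev.get("class") == "0b40"
        for dev in pci_devices
    )
    # kernel.loadedmodule: the QAT driver must be loaded.
    has_qat_driver = "intel_qat" in loaded_modules
    # matchAny: vfio-pci is either loaded as a module or reported as enabled.
    has_vfio = "vfio_pci" in loaded_modules or "vfio-pci" in enabled_modules
    return has_qat_device and has_qat_driver and has_vfio


if __name__ == "__main__":
    node_devices = [{"vendor": "8086", "device": "4940", "class": "0b40"}]
    print(qat_rule_matches(node_devices, {"intel_qat", "vfio_pci"}, set()))
```

Note that the two `matchAny` branches spell the module differently (`vfio_pci` versus `vfio-pci`), so the sketch keeps the two name sets separate rather than normalizing them.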

charts/intel-qat-resource-driver/values.yaml

Lines changed: 15 additions & 3 deletions
@@ -19,14 +19,26 @@ serviceAccount:
 
 kubeletPlugin:
   podAnnotations: {}
+  nodeSelector:
+    feature.node.kubernetes.io/qat: "true"
   tolerations:
     - key: node-role.kubernetes.io/master
       operator: Exists
       effect: NoSchedule
     - key: node-role.kubernetes.io/control-plane
       operator: Exists
       effect: NoSchedule
-  nodeSelector:
-    {}
-    #node-role.kubernetes.io/control-plane: ""
+    # Refer to the official documentation for Node Feature Discovery (NFD)
+    # regarding node tainting:
+    # https://nfd.sigs.k8s.io/usage/customization-guide#node-tainting
+    - key: "node.kubernetes.io/qat"
+      operator: "Exists"
+      effect: "NoSchedule"
   affinity: {}
+
+nfd:
+  enabled: false # change to true to install NFD to the cluster
+  nameOverride: intel-qat-nfd
+  # TODO: this deprecated NFD option will be replaced in NFD v0.17 with "featureGates.NodeFeatureAPI" (added in v0.16):
+  # https://kubernetes-sigs.github.io/node-feature-discovery/v0.16/deployment/helm.html#general-parameters
+  enableNodeFeatureApi: true
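
The new `nodeSelector` and the added toleration work together: the kubelet plugin is restricted to nodes carrying the `feature.node.kubernetes.io/qat` label and is allowed onto nodes tainted with `node.kubernetes.io/qat`. The Python sketch below approximates that scheduling check under simplified Kubernetes semantics (only `NoSchedule` and `NoExecute` taints block scheduling; affinity and resources are ignored). The helpers `tolerates` and `can_schedule` are hypothetical names introduced for illustration.

```python
from typing import Mapping, Sequence

Taint = Mapping[str, str]
Toleration = Mapping[str, str]


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Check one toleration against one taint (simplified Kubernetes semantics)."""
    if toleration.get("operator", "Equal") == "Exists":
        # With Exists, a missing key matches every taint key and the value is ignored.
        key_ok = "key" not in toleration or toleration["key"] == taint["key"]
    else:
        key_ok = (
            toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", "")
        )
    # An empty or missing effect on the toleration matches every taint effect.
    effect_ok = not toleration.get("effect") or toleration["effect"] == taint["effect"]
    return key_ok and effect_ok


def can_schedule(
    node_labels: Mapping[str, str],
    node_taints: Sequence[Taint],
    node_selector: Mapping[str, str],
    tolerations: Sequence[Toleration],
) -> bool:
    """Return True if the selector matches and every blocking taint is tolerated."""
    selector_ok = all(node_labels.get(k) == v for k, v in node_selector.items())
    blocking = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    taints_ok = all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)
    return selector_ok and taints_ok


if __name__ == "__main__":
    selector = {"feature.node.kubernetes.io/qat": "true"}
    tolerations = [
        {"key": "node.kubernetes.io/qat", "operator": "Exists", "effect": "NoSchedule"}
    ]
    qat_node_labels = {"feature.node.kubernetes.io/qat": "true"}
    qat_node_taints = [
        {"key": "node.kubernetes.io/qat", "value": "true", "effect": "NoSchedule"}
    ]
    print(can_schedule(qat_node_labels, qat_node_taints, selector, tolerations))
```

Under these assumptions, a node without the NFD label is rejected by the selector alone, so with the new default the plugin only runs where the rule has labelled a node.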
