cd hack/release && go run release.go --releaseDefaults $(CURDIR)/build/release.yaml --releaseVersions $(CURDIR)/hack/release/versions.txt --templateDir ./templates/vars --outputDir ../../docs/common/
cd hack/release && go run release.go --with-sha256 --releaseDefaults $(CURDIR)/build/release.yaml --templateDir ./templates/image-sha256 --outputDir ../../docs/advanced/

The network operator provides limited upgrade capabilities, which require additional manual actions if a containerized DOCA-OFED Driver is used. Future releases of the network operator will provide an automatic upgrade flow for the containerized driver.

Since Helm does not support auto-upgrade of existing CRDs, the user must follow a two-step process to upgrade the network-operator release:

* Upgrade the CRDs to the latest version.
* Apply the Helm chart update.
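
The two steps above can be sketched as a shell function. This is a minimal sketch under stated assumptions, not an official procedure: the function name ``upgrade_network_operator``, the Helm repository alias ``nvidia``, the release name ``network-operator`` and the ``nvidia-network-operator`` namespace are assumptions to adapt to your deployment, and only the CRDs in the chart's top-level ``crds/`` directory (the standard Helm location) are applied.

.. code-block:: bash

    # Minimal sketch. Assumed (hypothetical) names: repo alias "nvidia",
    # release "network-operator", namespace "nvidia-network-operator".
    upgrade_network_operator() {
        local version="$1"
        local chart_dir
        chart_dir="$(mktemp -d)" || return 1

        # Step 1: download the chart and apply its CRDs. Helm installs CRDs from
        # a chart's crds/ directory on first install only, never on upgrade.
        # Subcharts may ship further CRDs under charts/*/crds (not applied here).
        helm pull nvidia/network-operator --version "$version" \
            --untar --untardir "$chart_dir" || return 1
        kubectl apply -f "$chart_dir/network-operator/crds" || return 1

        # Step 2: upgrade the release with the edited values file. --force
        # replaces changed resources, so direct edits to the CR are overwritten.
        helm upgrade network-operator nvidia/network-operator \
            -n nvidia-network-operator --version "$version" \
            -f "values-${version}.yaml" --force
    }

For example, run ``upgrade_network_operator <VERSION>`` from the directory that contains ``values-<VERSION>.yaml``.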

----------------------------
Downloading a New Helm Chart
----------------------------
It is possible to retrieve updated CRDs from the Helm chart or from the release branch on GitHub. The example below shows how to upgrade CRDs from the downloaded chart.

Edit the `values-<VERSION>.yaml` file as required for your cluster. The network operator can handle only some NicClusterPolicy updates automatically. If the configuration of the new release differs from the configuration of the deployed release, additional manual actions may be required.

Known limitations:

* If a component's configuration was removed from the NicClusterPolicy, the component's resources (DaemonSets, ConfigMaps, etc.) may need to be cleaned up manually.
* If the configuration for devicePlugin changed without an image upgrade, the devicePlugin may need to be restarted manually.

These limitations will be addressed in future releases.
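
For the devicePlugin limitation, restarting the device plugin DaemonSet is one way to pick up the new configuration. The sketch below is illustrative only: the function name ``restart_daemonset`` is hypothetical, the DaemonSet name must be looked up in your cluster (for example with ``kubectl get daemonsets -n nvidia-network-operator``), and it assumes that DaemonSet uses the ``RollingUpdate`` strategy.

.. code-block:: bash

    # Minimal sketch: restart a DaemonSet and wait for the rollout to finish.
    # "rollout restart" only replaces pods of DaemonSets that use the
    # RollingUpdate strategy; "rollout status" rejects OnDelete DaemonSets.
    restart_daemonset() {
        local ds="$1"
        local ns="${2:-nvidia-network-operator}"
        kubectl rollout restart "daemonset/$ds" -n "$ns" || return 1
        kubectl rollout status "daemonset/$ds" -n "$ns" --timeout=300s
    }

For example: ``restart_daemonset <DEVICE_PLUGIN_DAEMONSET_NAME>``.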

.. warning:: Changes that were made directly in the NicClusterPolicy CR (for example, with ``kubectl edit``) will be overwritten by the Helm upgrade due to the ``--force`` flag.
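
Because direct edits to the CR do not survive the upgrade, it can help to compare the values of the deployed release with the defaults of the new chart before editing ``values-<VERSION>.yaml``. A minimal sketch; the function name ``compare_values``, the release name ``network-operator``, the repository alias ``nvidia`` and the output file names are assumptions.

.. code-block:: bash

    # Minimal sketch: save the user-supplied values of the deployed release and
    # the default values of the new chart, then print a unified diff of the two.
    # Assumed (hypothetical) names: release "network-operator", repo alias "nvidia".
    compare_values() {
        local version="$1"
        local ns="${2:-nvidia-network-operator}"
        helm get values network-operator -n "$ns" -o yaml \
            > values-current.yaml || return 1
        helm show values nvidia/network-operator --version "$version" \
            > "values-${version}-defaults.yaml" || return 1
        # diff exits with 1 when the files differ; that is not an error here.
        diff -u values-current.yaml "values-${version}-defaults.yaml" || true
    }

The defaults are written to a separate file so that an existing ``values-<VERSION>.yaml`` is never overwritten.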

------------------------------
Applying the Helm Chart Update
------------------------------
Update the component versions in the NicClusterPolicy. Refer to the :ref:`NicClusterPolicy CRD Full Example <ncp-cr-example>` for more details and the latest versions of the components.
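
To see which version a component currently uses before changing it, you can read it from the deployed CR. A minimal sketch, assuming the CR is named ``nic-cluster-policy``, is cluster-scoped, and that the component section has a ``version`` field; the function name ``ncp_component_version`` is hypothetical.

.. code-block:: bash

    # Minimal sketch: print spec.<component>.version of the NicClusterPolicy.
    # The CR is assumed to be cluster-scoped, so no namespace is passed.
    # The default CR name "nic-cluster-policy" is an assumption.
    ncp_component_version() {
        local component="$1"
        local name="${2:-nic-cluster-policy}"
        kubectl get nicclusterpolicy "$name" \
            -o jsonpath="{.spec.${component}.version}"
    }

For example, ``ncp_component_version ofedDriver`` prints the DOCA-OFED Driver version currently set in the CR.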

----------------------------------
Automatic DOCA-OFED Driver Upgrade
----------------------------------
# specify if should continue even if there are pods using emptyDir
deleteEmptyDir: false

Apply NicClusterPolicy CR:

.. code-block:: bash

- Manually delete the pod by using ``kubectl delete pod -n <Network Operator Namespace> <pod name>``.

If the pod still fails after the restart, change the NVIDIA DOCA-OFED Driver version in the NicClusterPolicy to the previous version or to another working version.

-------------------------------
DOCA-OFED Driver Manual Upgrade
-------------------------------
Automatic DOCA-OFED Driver upgrade is the preferred method for upgrading the DOCA-OFED Driver. However, if you need to manually upgrade the DOCA-OFED Driver, you can follow the steps below.

.. warning:: This operation is required only if a containerized DOCA-OFED Driver is in use.

When a containerized DOCA-OFED Driver is reloaded on a node, all pods on that node that use a secondary network based on NVIDIA NICs lose the corresponding network interfaces in their containers. To prevent an outage, remove all pods that use a secondary network from the node before you reload the driver pod on it.

The Helm upgrade command only updates the DaemonSet spec of the DOCA-OFED Driver to point to the new driver version. Because the DOCA-OFED Driver DaemonSet uses the "OnDelete" updateStrategy, it does not automatically restart the driver pods on the nodes. The old DOCA-OFED Driver version keeps running on a node until you explicitly delete the driver pod or reboot the node:

.. code-block:: bash

    $ kubectl delete pod -l app=mofed-<OS_NAME> -n nvidia-network-operator
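
To check which nodes still run the old driver after the Helm upgrade, you can compare each driver pod's image with the image in the DaemonSet template. A minimal sketch; the function name ``stale_driver_pods`` is hypothetical, the DaemonSet name must be looked up in your cluster, and the driver container is assumed to be the first container of the pod.

.. code-block:: bash

    # Minimal sketch: print "<pod> <node>" for every driver pod whose image
    # differs from the image in the DaemonSet pod template.
    # Arguments: DaemonSet name, pod label (e.g. app=mofed-<OS_NAME>), namespace.
    stale_driver_pods() {
        local ds="$1" label="$2"
        local ns="${3:-nvidia-network-operator}"
        local want
        want="$(kubectl get daemonset "$ds" -n "$ns" \
            -o jsonpath='{.spec.template.spec.containers[0].image}')" || return 1
        kubectl get pods -n "$ns" -l "$label" \
            -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{" "}{.spec.containers[0].image}{"\n"}{end}' |
            awk -v want="$want" '$3 != want { print $1, $2 }'
    }

For example: ``stale_driver_pods <DRIVER_DAEMONSET_NAME> app=mofed-<OS_NAME>``.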

You can remove all pods with secondary networks from all cluster nodes and then restart the DOCA-OFED Driver pods on all nodes at once.

Alternatively, perform the upgrade in a rolling manner to reduce the impact of the driver upgrade on the cluster: restart the driver pod on one node at a time. In this case, remove pods with secondary networks only from the node being upgraded; there is no need to stop pods on all nodes.

For each node, follow these steps to reload the driver on the node:

1. Remove pods with a secondary network from the node.
2. Restart the DOCA-OFED Driver pod.
3. Return the pods with a secondary network to the node.

When the DOCA-OFED Driver pod on the node is ready, repeat the same steps for the next node.
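
The per-node procedure can be sketched as a shell function. This is a minimal sketch that adds several assumptions not stated in this guide: pods "with a secondary network" are identified by the Multus ``k8s.v1.cni.cncf.io/networks`` annotation; those pods are managed by controllers (Deployments, StatefulSets, etc.), so cordoning the node and deleting them reschedules them elsewhere; and the driver pods carry the ``app=mofed-<OS_NAME>`` label. The function names are hypothetical.

.. code-block:: bash

    # Minimal sketch of the per-node reload. Assumptions (hypothetical, adapt them):
    # secondary-network pods carry the Multus annotation k8s.v1.cni.cncf.io/networks
    # and are controller-managed, so deleting them reschedules them on other nodes.
    NS="${NS:-nvidia-network-operator}"

    # Print "<namespace> <pod>" for pods on node $1 that have the annotation.
    secondary_net_pods_on_node() {
        kubectl get pods -A --field-selector "spec.nodeName=$1" \
            -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/networks}{"\n"}{end}' |
            awk 'NF >= 3 { print $1, $2 }'
    }

    # Usage: reload_driver_on_node <node> <driver pod label>
    reload_driver_on_node() {
        local node="$1" label="$2"

        # 1. Stop new pods from landing on the node, then remove the pods that
        #    use a secondary network. DaemonSet pods tolerate the cordon.
        kubectl cordon "$node" || return 1
        secondary_net_pods_on_node "$node" | while read -r pod_ns pod; do
            kubectl delete pod -n "$pod_ns" "$pod"
        done

        # 2. Restart the driver pod; the OnDelete DaemonSet recreates it with the
        #    new version. Wait until the replacement exists and is Ready.
        kubectl delete pod -n "$NS" -l "$label" \
            --field-selector "spec.nodeName=$node" || return 1
        until [ -n "$(kubectl get pod -n "$NS" -l "$label" \
            --field-selector "spec.nodeName=$node" -o name)" ]; do
            sleep 5
        done
        kubectl wait pod -n "$NS" -l "$label" --field-selector "spec.nodeName=$node" \
            --for=condition=Ready --timeout=1200s || return 1

        # 3. Make the node schedulable again so pods with a secondary network can
        #    return. Pods already rescheduled elsewhere do not move back on their own.
        kubectl uncordon "$node"
    }

For example, run ``reload_driver_on_node <NODE_NAME> app=mofed-<OS_NAME>``, then repeat for the next node. Cordoning is used here as one way to implement step 3; workloads that must run on a specific node need to be handled separately.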