docs/life-cycle-management.rst
20 additions & 14 deletions
@@ -347,8 +347,8 @@ The status upgrade of each node is reflected in its nvidia.com/ofed-driver-upgra
      - Set when DOCA Driver POD is up-to-date and running on the node, the node is schedulable.
    * - ``upgrade-required``
      - Set when DOCA Driver POD on the node is not up-to-date and requires upgrade. No actions are performed at this stage.
-   * - ``node-maintenance-required``
-     - Set when requestor mode upgrade is used (e.g. ``MAINTENANCE_OPERATOR_ENABLED=true``) post ``upgrade-required`` state. Essentially it will create a matching nodeMaintenance object for maintenance operator to perform its node operations.
+   * - ``node-maintenance-required``
+     - Set when requestor mode upgrade is used, e.g. ``MAINTENANCE_OPERATOR_ENABLED=true``, post ``upgrade-required`` state. Essentially it will create a matching nodeMaintenance object for maintenance operator to perform its node operations.
    * - ``cordon-required``
      - Set when the node needs to be made unschedulable in preparation for driver upgrade.
    * - ``wait-for-jobs-required``
@@ -396,27 +396,33 @@ DOCA Driver upgrade supports the following modes:
 .. list-table::
    :header-rows: 1
+
    * - Mode
      - Description
    * - In-place
      - In-place (legacy) mode is incorporating full driver upgrade lifecycle, including nodes operations e.g. cordon, pod eviction, drain, uncordon. It also maintains an internal scheduler for performing above node operations, according to provided ``maxParallelUpgrades`` under ``UpgradePolicy``.
    * - Requestor
      - New ``requestor`` upgrade mode uses NVIDIA maintenance operator (please refer to `maintenance-operator repo`_) nodeMaintenance k8s API objects, to initiate the DOCA driver upgrade process. Essentially, it will retire current upgrade controller (in-place mode) from performing the following node operations: cordon, wait for pods completion, drain, uncordon. To enable requestor mode, the following environment variable should be enabled ``MAINTENANCE_OPERATOR_ENABLED=true``.

-.. note:: Enabling requestor mode will require deployment of NVIDIA maintenance operator on the cluster. Also this can be done through Network Operator helm ``values.yaml``:
+.. note:: Enabling requestor mode will require deployment of NVIDIA maintenance operator on the cluster.
+   By default, upgrade controller will use in-place mode.
+   ``nodeMaintenanceNamePrefix`` is used to distinguish between different (operators) requestors, requesting node maintenance operations on the same node(s).
+   Deploying maintenance operator, as well as enabling requestor mode, can be done through Network Operator helm ``values.yaml``:
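
As a reading aid for the two upgrade modes described in the hunk above, here is a minimal Python sketch of the mode selection. It is not the operator's implementation and it is not the helm ``values.yaml`` fragment. It only encodes what the text states: in-place is the default, ``MAINTENANCE_OPERATOR_ENABLED=true`` enables requestor mode, and requestor mode hands cordon, wait for pods completion, drain and uncordon to the maintenance operator. The helper names ``select_upgrade_mode`` and ``operations_handled_in_place`` are hypothetical, as is the assumption that the value is compared case-insensitively.

```python
import os

# Environment variable named in the documentation above.
ENV_VAR = "MAINTENANCE_OPERATOR_ENABLED"

# Node operations that, per the mode table, the in-place upgrade
# controller stops performing itself once requestor mode is enabled.
DELEGATED_OPERATIONS = ("cordon", "wait for pods completion", "drain", "uncordon")


def select_upgrade_mode(env=None):
    """Return "requestor" when the variable is "true", else "in-place".

    Hypothetical helper: the parsing rules used here (whitespace stripped,
    case-insensitive match) are an assumption, not documented behaviour.
    """
    env = os.environ if env is None else env
    enabled = env.get(ENV_VAR, "").strip().lower() == "true"
    return "requestor" if enabled else "in-place"


def operations_handled_in_place(mode):
    """List the node operations the upgrade controller runs itself."""
    return () if mode == "requestor" else DELEGATED_OPERATIONS


if __name__ == "__main__":
    mode = select_upgrade_mode()
    print(f"mode={mode}, controller-run operations={operations_handled_in_place(mode)}")
```

Passing a plain dictionary instead of the process environment keeps the sketch easy to try out, for example ``select_upgrade_mode({"MAINTENANCE_OPERATOR_ENABLED": "true"})``.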