
Commit 6355ce8

chore: adding DOCA driver upgrade modes: inplace/requestor
Signed-off-by: Ido Heyvi <[email protected]>
1 parent 842c76c commit 6355ce8


1 file changed, +34 -0 lines changed

docs/life-cycle-management.rst

Lines changed: 34 additions & 0 deletions
@@ -347,6 +347,8 @@ The status upgrade of each node is reflected in its nvidia.com/ofed-driver-upgra
      - Set when DOCA Driver POD is up-to-date and running on the node, the node is schedulable.
    * - ``upgrade-required``
      - Set when DOCA Driver POD on the node is not up-to-date and requires upgrade. No actions are performed at this stage.
+   * - ``node-maintenance-required``
+     - Set after the ``upgrade-required`` state when requestor mode upgrade is used (enabled with ``MAINTENANCE_OPERATOR_ENABLED=true``). The upgrade controller creates a matching ``NodeMaintenance`` object, and the maintenance operator then performs the node operations.
    * - ``cordon-required``
      - Set when the node needs to be made unschedulable in preparation for driver upgrade.
    * - ``wait-for-jobs-required``
@@ -384,6 +386,38 @@ The status upgrade of each node is reflected in its nvidia.com/ofed-driver-upgra
 deleteEmptyDir: true
 podSelector: ""
 
+#############
+Upgrade modes
+#############
+
+.. _maintenance-operator repo: https://github.com/Mellanox/maintenance-operator
+
+DOCA Driver upgrade supports the following modes:
+
+.. list-table::
+   :header-rows: 1
+   * - Mode
+     - Description
+   * - In-place
+     - In-place (legacy) mode incorporates the full driver upgrade lifecycle, including node operations such as cordon, pod eviction, drain, and uncordon. It also maintains an internal scheduler that performs these node operations according to the ``maxParallelUpgrades`` value provided under ``UpgradePolicy``.
+   * - Requestor
+     - The new ``requestor`` upgrade mode uses NVIDIA maintenance operator ``NodeMaintenance`` Kubernetes API objects (see the `maintenance-operator repo`_) to initiate the DOCA driver upgrade process. In this mode, the upgrade controller (in-place mode) no longer performs the following node operations: cordon, wait for pod completion, drain, and uncordon. To enable requestor mode, set the environment variable ``MAINTENANCE_OPERATOR_ENABLED=true``.
+
+.. note:: Enabling requestor mode requires NVIDIA maintenance operator to be deployed on the cluster. Both the maintenance operator deployment and requestor mode can be configured through the Network Operator Helm ``values.yaml``:
+   .. code-block:: yaml
+      maintenanceOperator:
+        enabled: true
+      maintenance-operator-chart:
+        operatorConfig:
+          maxParallelOperations: 4
+          maxUnavailable: 2
+      operator:
+        maintenanceOperator:
+          useRequestor: true
+          requestorID: "nvidia.network.operator"
+          nodeMaintenanceNamePrefix: "network-operator"
+          nodeMaintenanceNamespace: default
+
 ###################
 Safe Driver Loading
 ###################
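
In-place mode reads its parallelism limit from ``maxParallelUpgrades`` in the driver upgrade policy. The fragment below is a minimal sketch of where that field sits. It assumes the Network Operator ``NicClusterPolicy`` resource (``apiVersion: mellanox.com/v1alpha1``) with the policy under ``spec.ofedDriver.upgradePolicy``; that layout is not part of this commit, the values are illustrative, and the ``drain`` fields mirror the context lines of the second hunk.

.. code-block:: yaml

   # Sketch only: assumed NicClusterPolicy layout for in-place upgrades
   apiVersion: mellanox.com/v1alpha1
   kind: NicClusterPolicy
   metadata:
     name: nic-cluster-policy
   spec:
     ofedDriver:
       # driver image, repository and version omitted
       upgradePolicy:
         autoUpgrade: true
         # in-place mode: the internal scheduler upgrades at most this many nodes at a time
         maxParallelUpgrades: 1
         drain:
           enable: true
           force: false
           deleteEmptyDir: true
           podSelector: ""

In requestor mode the node operations are performed by the maintenance operator instead, which the ``values.yaml`` fragment in the diff configures through ``maxParallelOperations`` and ``maxUnavailable``.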

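In requestor mode, a node in the ``node-maintenance-required`` state gets a matching ``NodeMaintenance`` object that the maintenance operator acts on. The following is a hedged sketch of such an object. It assumes the ``maintenance.nvidia.com/v1alpha1`` API group and the ``spec`` field names ``requestorID``, ``nodeName``, ``cordon`` and ``drainSpec`` as understood from the maintenance-operator repository; verify them against the CRD there. The requestor ID and namespace come from the Helm values in this commit, while the node name ``worker-01`` and the ``<prefix>-<node name>`` object name are hypothetical.

.. code-block:: yaml

   # Sketch only: assumed shape of a NodeMaintenance object in requestor mode
   apiVersion: maintenance.nvidia.com/v1alpha1
   kind: NodeMaintenance
   metadata:
     # hypothetical name built from nodeMaintenanceNamePrefix and the node name
     name: network-operator-worker-01
     # nodeMaintenanceNamespace from values.yaml
     namespace: default
   spec:
     # requestorID from values.yaml
     requestorID: nvidia.network.operator
     # hypothetical node selected for the DOCA driver upgrade
     nodeName: worker-01
     # node operations handled by the maintenance operator instead of the upgrade controller
     cordon: true
     drainSpec:
       force: false
       deleteEmptyDir: true
       podSelector: ""

Because cordon and drain are described in the object's ``spec``, the DOCA driver upgrade controller does not run those steps itself in this mode.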