Commit c89b7d9

chore: adding descriptions regarding DOCA driver upgrade controller modes: inplace/requestor
Signed-off-by: Ido Heyvi <[email protected]>
1 parent 6355ce8 commit c89b7d9

1 file changed (+20, -14 lines)

docs/life-cycle-management.rst

Lines changed: 20 additions & 14 deletions
@@ -347,8 +347,8 @@ The status upgrade of each node is reflected in its nvidia.com/ofed-driver-upgra
      - Set when DOCA Driver POD is up-to-date and running on the node, the node is schedulable.
    * - ``upgrade-required``
      - Set when DOCA Driver POD on the node is not up-to-date and requires upgrade. No actions are performed at this stage.
-   * - ``node-maintenance-required``
-     - Set when requestor mode upgrade is used (e.g. ``MAINTENANCE_OPERATOR_ENABLED=true``) post ``upgrade-required`` state. Essentially it will create a matching nodeMaintenance object for maintenance operator to perform its node operations.
+   * - ``node-maintenance-required``
+     - Set when requestor mode upgrade is used, e.g. `MAINTENANCE_OPERATOR_ENABLED=true`, post `upgrade-required` state. Essentially it will create a matching nodeMaintenance object for maintenance operator to perform its node operations.
    * - ``cordon-required``
      - Set when the node needs to be made unschedulable in preparation for driver upgrade.
    * - ``wait-for-jobs-required``
@@ -396,27 +396,33 @@ DOCA Driver upgrade supports the following modes:
 
 .. list-table::
    :header-rows: 1
+
    * - Mode
      - Description
    * - In-place
      - In-place (legacy) mode is incorporating full driver upgrade lifecycle, including nodes operations e.g. cordon, pod eviction, drain, uncordon. It also maintains an internal scheduler for performing above node operations, according to provided ``maxParallelUpgrades`` under ``UpgradePolicy``.
    * - Requestor
      - New ``requestor`` upgrade mode uses NVIDIA maintenance operator (please refer to `maintenance-operator repo`_) nodeMaintenance k8s API objects, to initiate the DOCA driver upgrade process. Essentially, it will retire current upgrade controller (in-place mode) from performing the following node operations: cordon, wait for pods completion, drain, uncordon. To enable requestor mode, the following environment variable should be enabled ``MAINTENANCE_OPERATOR_ENABLED=true``.
 
-.. note:: Enabling requestor mode will require deployment of NVIDIA maintenance operator on the cluster. Also this can be done through Network Operator helm ``values.yaml``:
+.. note:: Enabling requestor mode will require deployment of NVIDIA maintenance operator on the cluster.
+   By default, upgrade controller will use in-place mode.
+   ``nodeMaintenanceNamePrefix`` is used to distinguish between different (operators) requestors, requesting node maintenance operations on the same node(s).
+   Deploying maintenance operator, as well as enabling requestor mode, can be done through Network Operator helm ``values.yaml``:
+
 .. code-block:: yaml
+
+   maintenanceOperator:
+     enabled: true
+   maintenance-operator-chart:
+     operatorConfig:
+       maxParallelOperations: 2
+       maxUnavailable: 2
+   operator:
      maintenanceOperator:
-       enabled: true
-     maintenance-operator-chart:
-       operatorConfig:
-         maxParallelOperations: 4
-         maxUnavailable: 2
-     operator:
-       maintenanceOperator:
-         useRequestor: true
-         requestorID: "nvidia.network.operator"
-         nodeMaintenanceNamePrefix: "network-operator"
-         nodeMaintenanceNamespace: default
+       useRequestor: true
+       requestorID: "nvidia.network.operator"
+       nodeMaintenanceNamePrefix: "network-operator"
+       nodeMaintenanceNamespace: default
 
 ###################
 Safe Driver Loading
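
As a reading aid for the mode description touched by this change, here is a minimal Python sketch. It is not part of the commit and not the operator's implementation. It illustrates two points from the added text: in-place mode is the default unless ``MAINTENANCE_OPERATOR_ENABLED=true`` is set, and ``nodeMaintenanceNamePrefix`` keeps objects created by different requestors for the same node distinct. The function names, the exact parsing of the environment variable, the ``<prefix>-<node>`` naming scheme, and the sample values ``worker-01`` and ``other-operator`` are assumptions made for illustration only.

```python
import os


def upgrade_mode(env=None):
    """Pick the upgrade mode from an environment mapping.

    Assumption: only the literal value "true" (case-insensitive) enables
    requestor mode; the real controller may parse the variable differently.
    """
    env = os.environ if env is None else env
    value = env.get("MAINTENANCE_OPERATOR_ENABLED", "")
    return "requestor" if value.strip().lower() == "true" else "in-place"


def node_maintenance_name(prefix, node_name):
    """Build a per-requestor object name for a node.

    Hypothetical naming scheme (<prefix>-<node>), used here only to show why
    a prefix keeps different requestors from colliding on the same node.
    """
    return f"{prefix}-{node_name}"


if __name__ == "__main__":
    # With no variable set, the sketch falls back to in-place mode.
    print(upgrade_mode({}))
    print(upgrade_mode({"MAINTENANCE_OPERATOR_ENABLED": "true"}))
    # Two requestors asking for maintenance on the same (hypothetical) node.
    print(node_maintenance_name("network-operator", "worker-01"))
    print(node_maintenance_name("other-operator", "worker-01"))
```

Under these assumptions, two operators that request maintenance on the same node end up with differently named objects, which is the role the commit text assigns to ``nodeMaintenanceNamePrefix``.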
