Commit 9da3a63

chore: adding descriptions regarding DOCA driver upgrade controller modes: inplace/requestor
Signed-off-by: Ido Heyvi <[email protected]>
1 parent 6355ce8 commit 9da3a63

1 file changed: +22 -15 lines changed

docs/life-cycle-management.rst

Lines changed: 22 additions & 15 deletions
@@ -347,8 +347,8 @@ The status upgrade of each node is reflected in its nvidia.com/ofed-driver-upgra
      - Set when DOCA Driver POD is up-to-date and running on the node, the node is schedulable.
    * - ``upgrade-required``
      - Set when DOCA Driver POD on the node is not up-to-date and requires upgrade. No actions are performed at this stage.
-   * - ``node-maintenance-required``
-     - Set when requestor mode upgrade is used (e.g. ``MAINTENANCE_OPERATOR_ENABLED=true``) post ``upgrade-required`` state. Essentially it will create a matching nodeMaintenance object for maintenance operator to perform its node operations.
+   * - ``node-maintenance-required``
+     - Set when requestor mode upgrade is used, e.g. `MAINTENANCE_OPERATOR_ENABLED=true`, post `upgrade-required` state. Essentially it will create a matching nodeMaintenance object for dedicated node(s), utilizing maintenance operator to perform its node operations.
    * - ``cordon-required``
      - Set when the node needs to be made unschedulable in preparation for driver upgrade.
    * - ``wait-for-jobs-required``
@@ -396,27 +396,34 @@ DOCA Driver upgrade supports the following modes:
 
 .. list-table::
    :header-rows: 1
+
    * - Mode
      - Description
    * - In-place
-     - In-place (legacy) mode is incorporating full driver upgrade lifecycle, including nodes operations e.g. cordon, pod eviction, drain, uncordon. It also maintains an internal scheduler for performing above node operations, according to provided ``maxParallelUpgrades`` under ``UpgradePolicy``.
+     - In-place (legacy) mode is incorporates full driver upgrade lifecycle, including nodes operations e.g. cordon, pod eviction, drain, uncordon. It also maintains an internal scheduler for performing above node operations, according to provided ``maxParallelUpgrades`` under ``UpgradePolicy``.
    * - Requestor
      - New ``requestor`` upgrade mode uses NVIDIA maintenance operator (please refer to `maintenance-operator repo`_) nodeMaintenance k8s API objects, to initiate the DOCA driver upgrade process. Essentially, it will retire current upgrade controller (in-place mode) from performing the following node operations: cordon, wait for pods completion, drain, uncordon. To enable requestor mode, the following environment variable should be enabled ``MAINTENANCE_OPERATOR_ENABLED=true``.
 
-.. note:: Enabling requestor mode will require deployment of NVIDIA maintenance operator on the cluster. Also this can be done through Network Operator helm ``values.yaml``:
+.. note:: Enabling requestor mode will require deployment of NVIDIA maintenance operator on the cluster.
+By default, upgrade controller will use in-place mode.
+``nodeMaintenanceNamePrefix`` is used to distinguish between different (operators) requestors, requesting node maintenance operations on the same node(s).
+Deploying maintenance operator, as well as enabling requestor mode, setting requestors env variables ``MAINTENANCE_OPERATOR_REQUESTOR_ID``, ``MAINTENANCE_OPERATOR_REQUESTOR_NAMESPACE``, ``MAINTENANCE_OPERATOR_NODE_MAINTENANCE_PREFIX``,
+can be done through Network Operator helm ``values.yaml``:
+
 .. code-block:: yaml
+
+   maintenanceOperator:
+     enabled: true
+   maintenance-operator-chart:
+     operatorConfig:
+       maxParallelOperations: 2
+       maxUnavailable: 2
+   operator:
      maintenanceOperator:
-       enabled: true
-     maintenance-operator-chart:
-       operatorConfig:
-         maxParallelOperations: 4
-         maxUnavailable: 2
-     operator:
-       maintenanceOperator:
-         useRequestor: true
-         requestorID: "nvidia.network.operator"
-         nodeMaintenanceNamePrefix: "network-operator"
-         nodeMaintenanceNamespace: default
+       useRequestor: true
+       requestorID: "nvidia.network.operator"
+       nodeMaintenanceNamePrefix: "network-operator"
+       nodeMaintenanceNamespace: default
 
 ###################
 Safe Driver Loading
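
For orientation, the requestor settings named in the diff might surface on the operator container roughly as the environment variables below. This is only a sketch: the variable names come from the added documentation text and the values from its ``values.yaml`` example, but the mapping between helm keys and variables is an assumption here and is not confirmed by the commit.

```yaml
# Hypothetical excerpt of the operator container spec (illustration only).
# The helm key noted next to each variable is an assumed mapping.
env:
  - name: MAINTENANCE_OPERATOR_ENABLED
    value: "true"                      # operator.maintenanceOperator.useRequestor
  - name: MAINTENANCE_OPERATOR_REQUESTOR_ID
    value: "nvidia.network.operator"   # operator.maintenanceOperator.requestorID
  - name: MAINTENANCE_OPERATOR_REQUESTOR_NAMESPACE
    value: "default"                   # operator.maintenanceOperator.nodeMaintenanceNamespace
  - name: MAINTENANCE_OPERATOR_NODE_MAINTENANCE_PREFIX
    value: "network-operator"          # operator.maintenanceOperator.nodeMaintenanceNamePrefix
```

If ``MAINTENANCE_OPERATOR_ENABLED`` is unset, the documentation text in the diff states that the upgrade controller stays in in-place mode.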
