Commit ebbdd43

Merge pull request #247 from almaslennikov/fw-upgrade-fixes
rearrange NIC Configuration Operator docs for better UX
2 parents dffcd22 + 37c601b commit ebbdd43

2 files changed

+98
-36
lines changed

docs/nic-conf-operator/nic-configuration-operator.rst

Lines changed: 3 additions & 0 deletions
@@ -21,6 +21,9 @@ NIC Configuration Operator
2121
**************************
2222

2323
.. toctree::
24+
:maxdepth: 1
25+
:titlesonly:
26+
2427
NIC Firmware Configuration <nic-fw-configuration.rst>
2528
Configuration Details <configuration-details.rst>
2629
CRD API Reference <crds.rst>

docs/nic-conf-operator/nic-fw-configuration.rst

Lines changed: 95 additions & 36 deletions
@@ -27,14 +27,15 @@ NIC Firmware Configuration
2727
:local:
2828
:backlinks: none
2929

30-
31-
===========================================================================
32-
Configure NIC Firmware using the NIC Configuration Operator
33-
===========================================================================
3430
`NVIDIA NIC Configuration Operator <https://github.com/Mellanox/nic-configuration-operator>`_ provides a Kubernetes API (Custom Resource Definitions) for updating and configuring firmware on NVIDIA NICs in a coordinated manner. It deploys a configuration daemon on each of the desired nodes to configure the NVIDIA NICs there. NVIDIA NIC Configuration Operator uses the `Maintenance Operator <https://github.com/Mellanox/maintenance-operator>`_ to prepare a node for maintenance before the actual configuration.
35-
3631
.. warning:: NVIDIA NIC Configuration Operator does not support the firmware reset flow in DPU mode. See the :doc:`limitations <../release-notes>`.
3732

33+
For more information about the CRD API, refer to :doc:`CRD API Reference <crds>`.
34+
35+
=============================================================================
36+
Install the NIC Configuration Operator and observe NIC devices in the cluster
37+
=============================================================================
38+
3839
.. note::
3940
To perform firmware validation and update on NIC devices, NIC Configuration Operator requires persistent storage to be set up in the cluster.
4041
To set up persistent NFS storage in the cluster, the `example from the CSI NFS Driver repository <https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/deploy/example/nfs-provisioner/README.md>`_ can be used.
@@ -136,7 +137,14 @@ Discover more information about a specific device:
136137
serialNumber: mt1952x03327
137138
type: 101d
138139
139-
Configure and apply the NICFirmwareSource CR:
140+
========================================================
141+
Update NIC Firmware using the NIC Configuration Operator
142+
========================================================
143+
--------------------------------------------
144+
Configure and apply the NicFirmwareSource CR
145+
--------------------------------------------
146+
147+
Deploy the NicFirmwareSource CR:
140148

141149
.. code-block:: yaml
142150
@@ -148,9 +156,20 @@ Configure and apply the NICFirmwareSource CR:
148156
finalizers:
149157
- configuration.net.nvidia.com/nic-configuration-operator
150158
spec:
151-
# a list of firmware binaries zip archives from the Mellanox website, can point to any url accessible from the cluster
159+
# a list of firmware binary zip archives from the Mellanox website; each entry can point to any URL accessible from the cluster
152160
binUrlSources:
153161
- https://www.mellanox.com/downloads/firmware/fw-ConnectX6Dx-rel-22_44_1036-MCX623106AC-CDA_Ax-UEFI-14.37.14-FlexBoot-3.7.500.signed.bin.zip
162+
# a URL to the BlueField Bundle (BFB) file, can point to any URL accessible from the cluster
163+
bfbUrlSource:
164+
- https://example.com/bf-fwbundle-3.1.0-77_25.07-prod.bfb
165+
166+
.. note::
167+
The ConnectX firmware binaries can be downloaded from the `NVIDIA Networking Firmware Downloads page <https://network.nvidia.com/support/firmware/firmware-downloads/>`_.
168+
The URLs of the firmware binaries from the website can be directly provided in the binUrlSources field of the NicFirmwareSource CR.
169+
170+
.. note::
171+
BlueField Bundle (BFB) can be downloaded from the `NVIDIA DOCA Downloads page <https://developer.nvidia.com/doca-downloads?deployment_platform=BlueField&deployment_package=BF-FW-Bundle&installer_type=BFB>`_.
172+
The file should first be made available in the cluster and then its URL should be provided in the bfbUrlSource field of the NicFirmwareSource CR.
154173

155174
Observe the NicFirmwareSource status:
156175

@@ -165,6 +184,10 @@ Observe the NICFirmwareSource status:
165184
22.44.1036:
166185
- mt_0000000436
167186
187+
----------------------------------------------
188+
Configure and apply the NicFirmwareTemplate CR
189+
----------------------------------------------
190+
168191
Configure and apply the NicFirmwareTemplate CR:
169192

170193
.. code-block:: yaml
@@ -183,7 +206,66 @@ Configure and apply the NicFirmwareTemplate CR:
183206
nicFirmwareSourceRef: connectx6dx-firmware-22-44-1036
184207
updatePolicy: Update
185208
186-
Configure and apply the NicConfigurationTemplate CR:
209+
The spec of the NicDevice CR is updated according to the NicFirmwareTemplate and NicConfigurationTemplate CRs that match the device:
210+
211+
.. code-block:: bash
212+
213+
> kubectl get nicdevice -n nvidia-network-operator node1-101d-mt1952x03327 -o jsonpath='{.spec}' | yq -P
214+
215+
template:
216+
firmware:
217+
nicFirmwareSourceRef: connectx6dx-firmware-22-44-1036
218+
updatePolicy: Update
219+
220+
The status conditions of the NicDevice CR reflect the status of the firmware update and indicate any errors that occur during the process:
221+
222+
.. code-block:: bash
223+
224+
> kubectl get nicdevice -n nvidia-network-operator node1-101d-mt1952x03327 -o jsonpath='{.status.conditions}' | yq -P
225+
226+
- type: FirmwareUpdateInProgress
227+
status: "False"
228+
reason: DeviceFirmwareConfigMatch
229+
message: Firmware matches the requested version
230+
observedGeneration: 4
231+
lastTransitionTime: "2024-09-21T08:42:23Z"
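
When scripting an update, it can be convenient to block until the condition settles instead of polling by hand. The following is a minimal sketch, not part of the operator: ``wait_for_fw_update`` is a hypothetical helper name, the 30-minute default timeout is an arbitrary assumption, and the sketch assumes that ``FirmwareUpdateInProgress`` turning ``False`` marks the end of the update, as the output above suggests. Inspect the ``reason`` and ``message`` fields afterwards, since the condition status alone may not distinguish success from failure.

.. code-block:: bash

   # Hypothetical helper: block until the FirmwareUpdateInProgress condition
   # of a NicDevice CR becomes "False", or until the timeout expires.
   # Usage: wait_for_fw_update <namespace> <device-name> [timeout]
   wait_for_fw_update() {
     ns="$1"
     device="$2"
     timeout="${3:-30m}"   # assumed default, adjust to your environment
     kubectl wait "nicdevice/${device}" -n "${ns}" \
       --for=condition=FirmwareUpdateInProgress=False \
       --timeout="${timeout}"
   }

With the device from the example above, the call would be ``wait_for_fw_update nvidia-network-operator node1-101d-mt1952x03327``.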
232+
233+
----------------------------------
234+
NIC Firmware Mismatch Notification
235+
----------------------------------
236+
237+
NIC Configuration Operator sets the `FirmwareConfigMatch` status condition of the NicDevice CR based on the current NIC firmware:
238+
239+
.. code-block:: bash
240+
241+
> kubectl get nicdevice -n nvidia-network-operator node1-101d-mt1952x03327 -o jsonpath='{.status.conditions}' | yq -P
242+
243+
- type: FirmwareConfigMatch
244+
status: "True"
245+
reason: DeviceFirmwareConfigMatch
246+
message: Device firmware '20.42.1000' matches to recommended version '20.42.1000'
247+
lastTransitionTime: "2024-09-21T08:43:10Z"
248+
249+
The `FirmwareConfigMatch` condition status is set to `Unknown` if the DOCA-OFED Driver is not installed. Otherwise, it indicates whether the current NIC firmware is the version recommended by the DOCA-OFED Driver. For example:
250+
251+
.. code-block:: bash
252+
253+
> kubectl get nicdevice -n nvidia-network-operator node1-101d-mt1952x03327 -o jsonpath='{.status.conditions}' | yq -P
254+
255+
- type: FirmwareConfigMatch
256+
status: "True"
257+
reason: DeviceFirmwareConfigMatch
258+
message: Device firmware '20.42.1000' matches to recommended version '20.42.1000'
259+
lastTransitionTime: "2024-11-08T09:19:41Z"
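
To read a single condition rather than the whole list, the JSONPath expression can filter by condition type. The helper below is a sketch under stated assumptions: ``nicdevice_condition_status`` is a hypothetical name introduced here for illustration, and the only operator-specific detail it relies on is the ``status.conditions`` layout shown above.

.. code-block:: bash

   # Hypothetical helper: print the status ("True", "False" or "Unknown")
   # of one condition of a NicDevice CR, selected by its type.
   # Usage: nicdevice_condition_status <namespace> <device-name> <condition-type>
   nicdevice_condition_status() {
     ns="$1"
     device="$2"
     cond="$3"
     kubectl get nicdevice -n "${ns}" "${device}" \
       -o jsonpath="{.status.conditions[?(@.type==\"${cond}\")].status}"
   }

For example, ``nicdevice_condition_status nvidia-network-operator node1-101d-mt1952x03327 FirmwareConfigMatch`` prints only the status of the ``FirmwareConfigMatch`` condition, which is easier to consume in a script than the full YAML output.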
260+
261+
262+
===========================================================
263+
Configure NIC Firmware using the NIC Configuration Operator
264+
===========================================================
265+
266+
---------------------------------------------------
267+
Configure and apply the NicConfigurationTemplate CR
268+
---------------------------------------------------
187269

188270
.. code-block:: yaml
189271
@@ -224,7 +306,7 @@ Configure and apply the NicConfigurationTemplate CR:
224306

225307
.. note:: To use the NIC Configuration Operator together with the SR-IOV Network Operator, the "mellanox" `plugin should be disabled <https://github.com/k8snetworkplumbingwg/sriov-network-operator/tree/master?tab=readme-ov-file#disabling-sr-iov-config-daemon-plugins>`_ in the SR-IOV Network Operator.
226308

227-
For more information about the CRD API, refer to :doc:`CRD API Reference <crds>`.
309+
228310
For detailed information about firmware parameters and configuration settings, refer to :doc:`Configuration Details <configuration-details>`.
229311

230312
The spec of the NicDevice CR is updated according to the NicFirmwareTemplate and NicConfigurationTemplate CRs that match the device:
@@ -251,6 +333,9 @@ Spec of the NicDevice CR is updated in accordance with the NICFirmwareTemplate a
251333
enabled: true
252334
env: Baremetal
253335
336+
----------------------------------------------
337+
Observe the status of the configuration update
338+
----------------------------------------------
254339

255340
The status conditions of the NicDevice CR reflect the status of the configuration update and indicate any errors that occur during the process:
256341

@@ -270,30 +355,4 @@ Status conditions of the NicDevice CR reflect the status of the configuration up
270355
message: ""
271356
lastTransitionTime: "2024-09-21T08:43:08Z"
272357
273-
----------------------------------
274-
NIC Firmware Mismatch Notification
275-
----------------------------------
276-
277-
NIC Configuration Operator updates status conditions of the NicDevice CR to set `FirmwareConfigMatch` condition based on a current NIC firmware:
278-
279-
.. code-block:: bash
280-
281-
> kubectl get nicdevice -n nvidia-network-operator node1-101d-mt1952x03327 -o jsonpath='{.status.conditions}' | yq -P
282-
283-
- type: FirmwareConfigMatch
284-
status: "True"
285-
reason: DeviceFirmwareConfigMatch
286-
message: Device firmware '20.42.1000' matches to recommended version '20.42.1000'
287-
lastTransitionTime: "2024-09-21T08:43:10Z"
288-
289-
`FirmwareConfigMatch` condition status is set to `Unknown` if DOCA-OFED Driver is not installed otherwise it notifies if current NIC firmware is recommended or not recommended by DOCA-OFED Driver. E.g.:
290-
291-
.. code-block:: bash
292-
293-
> kubectl get nicdevice -n nvidia-network-operator node1-101d-mt1952x03327 -o jsonpath='{.status.conditions}' | yq -P
294-
295-
- type: FirmwareConfigMatch
296-
status: "True"
297-
reason: DeviceFirmwareConfigMatch
298-
message: Device firmware '20.42.1000' matches to recommended version '20.42.1000'
299-
lastTransitionTime: "2024-11-08T09:19:41Z"
358+
.. note:: If both a firmware update and a configuration update are applied to a single device, the firmware update is performed first. The configuration update is applied after the firmware update completes.
