Commit bfe3273

update docs for 25.3.1 release (#179)
* update docs for 25.3.1 release
  Signed-off-by: Abigail McCarthy <[email protected]>
* fix platform support page
  Signed-off-by: Abigail McCarthy <[email protected]>

---------

Signed-off-by: Abigail McCarthy <[email protected]>
1 parent d915e3c commit bfe3273

6 files changed: 103 additions, 39 deletions

gpu-operator/getting-started.rst

Lines changed: 5 additions & 0 deletions
@@ -160,6 +160,11 @@ To view all the options, run ``helm show values nvidia/gpu-operator``.
160160
* - ``daemonsets.labels``
161161
- Map of custom labels to add to all GPU Operator managed pods.
162162
- ``{}``
163+
164+
* - ``dcgmExporter.service.internalTrafficPolicy``
165+
- Specifies the `internalTrafficPolicy <https://kubernetes.io/docs/concepts/services-networking/service/#internal-traffic-policy>`_ for the DCGM Exporter service.
166+
Available values are ``Cluster`` (default) or ``Local``.
167+
- ``Cluster``
163168

164169
* - ``devicePlugin.config``
165170
- Specifies the configuration for the NVIDIA Device Plugin as a config map.
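
The ``dcgmExporter.service.internalTrafficPolicy`` option added in this hunk could be set through a Helm values override. The following is a minimal sketch, not taken from the commit: the file name ``values-override.yaml`` is hypothetical, and only the key path and the two allowed values come from the documentation text above.

```yaml
# values-override.yaml (hypothetical file name, for illustration only)
dcgmExporter:
  service:
    # Cluster (the default) lets in-cluster clients reach any ready endpoint.
    # Local restricts in-cluster traffic to endpoints on the client's own node.
    internalTrafficPolicy: Local
```

Such a file would typically be passed with ``-f values-override.yaml`` to ``helm install`` or ``helm upgrade``; the same value can also be supplied inline with ``--set dcgmExporter.service.internalTrafficPolicy=Local``.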

gpu-operator/life-cycle-policy.rst

Lines changed: 8 additions & 11 deletions
@@ -89,37 +89,34 @@ Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.
8989
- ${version}
9090

9191
* - NVIDIA GPU Driver
92-
- | `570.148.08 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-148-08/index.html>`_ (recommended)
93-
| `570.133.20 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-133-20/index.html>`_
94-
| `570.124.06 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-124-06/index.html>`_ (default)
95-
| `570.86.15 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-86-15/index.html>`_
92+
- | `570.148.08 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-148-08/index.html>`_ (default, recommended)
9693
| `550.163.01 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-550-163-01/index.html>`_
9794
| `535.247.01 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-247-01/index.html>`_
9895
9996
* - NVIDIA Driver Manager for Kubernetes
10097
- `v0.8.0 <https://ngc.nvidia.com/catalog/containers/nvidia:cloud-native:k8s-driver-manager>`__
10198

10299
* - NVIDIA Container Toolkit
103-
- `1.17.5 <https://github.com/NVIDIA/nvidia-container-toolkit/releases>`__
100+
- `1.17.8 <https://github.com/NVIDIA/nvidia-container-toolkit/releases>`__
104101

105102
* - NVIDIA Kubernetes Device Plugin
106-
- `0.17.1 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
103+
- `0.17.2 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
107104

108105
* - DCGM Exporter
109-
- `4.1.1-4.0.4 <https://github.com/NVIDIA/dcgm-exporter/releases>`__
106+
- `4.2.3-4.1.3 <https://github.com/NVIDIA/dcgm-exporter/releases>`__
110107

111108
* - Node Feature Discovery
112-
- `v0.17.2 <https://github.com/kubernetes-sigs/node-feature-discovery/releases/>`__
109+
- `v0.17.3 <https://github.com/kubernetes-sigs/node-feature-discovery/releases/>`__
113110

114111
* - | NVIDIA GPU Feature Discovery
115112
| for Kubernetes
116-
- `0.17.1 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
113+
- `0.17.2 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
117114

118115
* - NVIDIA MIG Manager for Kubernetes
119116
- `0.12.1 <https://github.com/NVIDIA/mig-parted/tree/main/deployments/gpu-operator>`__
120117

121118
* - DCGM
122-
- `4.1.1-2 <https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html>`__
119+
- `4.2.3 <https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html>`__
123120

124121
* - Validator for NVIDIA GPU Operator
125122
- ${version}
@@ -141,7 +138,7 @@ Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.
141138
- v0.1.1
142139

143140
* - NVIDIA GDRCopy Driver
144-
- `v2.4.4 <https://github.com/NVIDIA/gdrcopy/releases>`__
141+
- `v2.5.0 <https://github.com/NVIDIA/gdrcopy/releases>`__
145142

146143
.. _gds-open-kernel:
147144

gpu-operator/platform-support.rst

Lines changed: 38 additions & 22 deletions
@@ -72,15 +72,18 @@ The following NVIDIA data center GPUs are supported on x86 based platforms:
7272
| | NVIDIA H200, | NVIDIA Hopper |
7373
| | NVIDIA H200 NVL | |
7474
+-------------------------+---------------------------+
75-
| NVIDIA HGX H200 | NVIDIA Hopper and |
75+
| NVIDIA DGX H100 | NVIDIA Hopper and |
7676
| | NVSwitch |
7777
+-------------------------+---------------------------+
78-
| NVIDIA DGX H100 | NVIDIA Hopper and |
78+
| NVIDIA DGX H200 | NVIDIA Hopper and |
7979
| | NVSwitch |
8080
+-------------------------+---------------------------+
8181
| NVIDIA HGX H100 | NVIDIA Hopper and |
8282
| | NVSwitch |
8383
+-------------------------+---------------------------+
84+
| NVIDIA HGX H200 | NVIDIA Hopper and |
85+
| | NVSwitch |
86+
+-------------------------+---------------------------+
8487
| | NVIDIA H100, | NVIDIA Hopper |
8588
| | NVIDIA H100 NVL | |
8689
+-------------------------+---------------------------+
@@ -170,15 +173,18 @@ The following NVIDIA data center GPUs are supported on x86 based platforms:
170173
+-------------------------+------------------------+
171174
| Product | Architecture |
172175
+=========================+========================+
176+
| NVIDIA DGX B200 | NVIDIA Blackwell |
177+
+-------------------------+------------------------+
173178
| NVIDIA HGX B200 | NVIDIA Blackwell |
174179
+-------------------------+------------------------+
175-
| NVIDIA HGX GB200 NVL | NVIDIA Blackwell |
180+
| NVIDIA HGX GB200 NVL72 | NVIDIA Blackwell |
176181
+-------------------------+------------------------+
177182

178183
.. note::
179184

180185
* HGX B200 requires a driver container version of 570.133.20 or later.
181186

187+
182188
.. _gpu-operator-arm-platforms:
183189

184190
Supported ARM Based Platforms
@@ -242,6 +248,8 @@ Supported Operating Systems and Kubernetes Platforms
242248
.. |fn1| replace:: :sup:`1`
243249
.. _fn2: #ubuntu-kernel
244250
.. |fn2| replace:: :sup:`2`
251+
.. _fn3: #rhel-9
252+
.. |fn3| replace:: :sup:`3`
245253

246254
The GPU Operator has been validated in the following scenarios:
247255

@@ -271,25 +279,25 @@ The GPU Operator has been validated in the following scenarios:
271279
| NKP
272280
273281
* - Ubuntu 20.04 LTS |fn2|_
274-
- 1.29---1.32
282+
- 1.29---1.33
275283
-
276284
- 7.0 U3c, 8.0 U2, 8.0 U3
277-
- 1.29---1.32
285+
- 1.29---1.33
278286
-
279287
-
280288
- 2.12, 2.13
281289

282290
* - Ubuntu 22.04 LTS |fn2|_
283-
- 1.29---1.32
291+
- 1.29---1.33
284292
-
285293
- 8.0 U2, 8.0 U3
286-
- 1.29---1.32
294+
- 1.29---1.33
287295
-
288296
- 1.26
289297
- 2.12, 2.13
290298

291299
* - Ubuntu 24.04 LTS
292-
- 1.29---1.32
300+
- 1.29---1.33
293301
-
294302
-
295303
-
@@ -308,27 +316,27 @@ The GPU Operator has been validated in the following scenarios:
308316

309317
* - | Red Hat
310318
| Enterprise
311-
| Linux 8.8,
312-
| 8.10
313-
- 1.29---1.32
319+
| Linux 9.2, 9.4, 9.5, 9.6 |fn3|_
320+
- 1.29---1.33
314321
-
315322
-
316-
- 1.29---1.32
323+
- 1.29---1.33
317324
-
318325
-
319326
-
320327

321328
* - | Red Hat
322329
| Enterprise
323-
| Linux 8.4, 8.5
324-
-
330+
| Linux 8.8,
331+
| 8.10
332+
- 1.29---1.33
325333
-
326334
-
335+
- 1.29---1.33
327336
-
328-
- 5.5
329337
-
330338
-
331-
339+
332340
.. _kubernetes-version:
333341

334342
:sup:`1`
@@ -345,7 +353,12 @@ The GPU Operator has been validated in the following scenarios:
345353
`Ubuntu kernel lifecycle and enablement stack <https://ubuntu.com/kernel/lifecycle>`_ page for more information.
346354
NVIDIA recommends disabling automatic updates for the Linux kernel that are performed
347355
by the ``unattended-upgrades`` package to prevent an upgrade to an unsupported kernel version.
348-
356+
357+
.. _rhel-9:
358+
359+
:sup:`3`
360+
Non-precompiled driver containers for Red Hat Enterprise Linux 9.2, 9.4, 9.5, and 9.6 versions are available for x86 based platforms only.
361+
They are not available for ARM based systems.
349362

350363
.. note::
351364

@@ -395,21 +408,21 @@ The GPU Operator has been validated in the following scenarios:
395408
| NKP
396409
397410
* - Ubuntu 20.04 LTS
398-
- 1.29--1.32
411+
- 1.29--1.33
399412
-
400413
- 7.0 U3c, 8.0 U2, 8.0 U3
401414
- 1.23---1.25
402415
- 2.12, 2.13
403416

404417
* - Ubuntu 22.04 LTS
405-
- 1.29--1.32
418+
- 1.29--1.33
406419
-
407420
- 8.0 U2, 8.0 U3
408421
-
409422
- 2.12, 2.13
410423

411424
* - Ubuntu 24.04 LTS
412-
- 1.29--1.32
425+
- 1.29--1.33
413426
-
414427
-
415428
-
@@ -426,10 +439,10 @@ The GPU Operator has been validated in the following scenarios:
426439
| Enterprise
427440
| Linux 8.4,
428441
| 8.6---8.10
429-
- 1.29---1.32
442+
- 1.29---1.33
430443
-
431444
-
432-
- 1.29---1.32
445+
- 1.29---1.33
433446
-
434447

435448

@@ -469,6 +482,8 @@ The GPU Operator has been validated in the following scenarios:
469482
+----------------------------+------------------------+----------------+
470483
| Red Hat Enterprise Linux 8 | Yes | Yes |
471484
+----------------------------+------------------------+----------------+
485+
| Red Hat Enterprise Linux 9 | Yes | Yes |
486+
+----------------------------+------------------------+----------------+
472487

473488

474489
Support for KubeVirt and OpenShift Virtualization
@@ -521,6 +536,7 @@ Supported operating systems and NVIDIA GPU Drivers with GPUDirect RDMA.
521536

522537
- Ubuntu 24.04 LTS with Network Operator 25.1.0.
523538
- Ubuntu 20.04 and 22.04 LTS with Network Operator 24.10.0.
539+
- Red Hat Enterprise Linux 9.2, 9.4, 9.5, and 9.6 with Network Operator 25.1.0.
524540
- Red Hat OpenShift 4.12 and higher with Network Operator 23.10.0
525541

526542
For information about configuring GPUDirect RDMA, refer to :doc:`gpu-operator-rdma`.

gpu-operator/release-notes.rst

Lines changed: 46 additions & 0 deletions
@@ -33,6 +33,51 @@ See the :ref:`GPU Operator Component Matrix` for a list of software components a
3333

3434
----
3535

36+
.. _v25.3.1:
37+
38+
25.3.1
39+
======
40+
41+
.. _v25.3.1-new-features:
42+
43+
New Features
44+
------------
45+
46+
* Added support for the following software component versions:
47+
48+
- NVIDIA Container Toolkit version v1.17.8
49+
- NVIDIA DCGM v4.2.3
50+
- NVIDIA DCGM Exporter v4.2.3-4.1.3
51+
- NVIDIA Kubernetes Device Plugin v0.17.2
52+
- Node Feature Discovery v0.17.3
53+
- NVIDIA GDRCopy Driver v2.5.0
54+
55+
* Added support for the following NVIDIA Data Center GPU Driver versions:
56+
57+
- 570.148.08 (default, recommended)
58+
- 570.133.20
59+
- 550.163.01
60+
- 535.247.01
61+
62+
* Added support for Red Hat Enterprise Linux 9.
63+
Non-precompiled driver containers for Red Hat Enterprise Linux 9.2, 9.4, 9.5, and 9.6 versions are available for x86 based platforms only.
64+
They are not available for ARM based systems.
65+
66+
* Added support for Kubernetes v1.33.
67+
68+
* Added support for setting the internalTrafficPolicy for the DCGM Exporter service.
69+
You can configure this in the Helm chart by setting ``dcgmExporter.service.internalTrafficPolicy`` to ``Local`` or ``Cluster`` (default).
70+
Choose ``Local`` to route internal traffic only to endpoints on the same node.
71+
72+
.. _v25.3.1-fixed-issues:
73+
74+
Fixed Issues
75+
------------
76+
77+
* Fixed an issue where the NVIDIADriver controller could enter an endless loop of creating and deleting a DaemonSet.
78+
This could occur when the NVIDIADriver DaemonSet does not tolerate a taint present on all nodes matching its configured nodeSelector, or when none of the DaemonSet pods have been scheduled yet.
79+
Refer to GitHub `pull request #1416 <https://github.com/NVIDIA/gpu-operator/pull/1416>`__ for more details.
80+
3681
.. _v25.3.0:
3782

3883
25.3.0
@@ -2263,3 +2308,4 @@ Known Limitations
22632308

22642309
* After uninstalling the GPU Operator, NVIDIA driver modules might still be loaded. Either reboot the node or forcibly remove them using the
22652310
``sudo rmmod nvidia nvidia_modeset nvidia_uvm`` command before re-installing GPU Operator again.
2311+

gpu-operator/versions1.json

Lines changed: 4 additions & 4 deletions
@@ -1,6 +1,10 @@
11
[
22
{
33
"preferred": "true",
4+
"url": "../25.3.1",
5+
"version": "25.3.1"
6+
},
7+
{
48
"url": "../25.3.0",
59
"version": "25.3.0"
610
},
@@ -19,9 +23,5 @@
1923
{
2024
"url": "../24.6.2",
2125
"version": "24.6.2"
22-
},
23-
{
24-
"url": "../24.6.1",
25-
"version": "24.6.1"
2626
}
2727
]

repo.toml

Lines changed: 2 additions & 2 deletions
@@ -166,8 +166,8 @@ output_format = "linkcheck"
166166
docs_root = "${root}/gpu-operator"
167167
project = "gpu-operator"
168168
name = "NVIDIA GPU Operator"
169-
version = "25.3.0"
170-
source_substitutions = { version = "v25.3.0", recommended = "570.124.06" }
169+
version = "25.3.1"
170+
source_substitutions = { version = "v25.3.1", recommended = "570.148.08" }
171171
copyright_start = 2020
172172
sphinx_exclude_patterns = [
173173
"life-cycle-policy.rst",
