Commit 7afaf4e

Openshift updates and other minor fixes (#169)
* update openshift docs, update mirantis, add selinux notes, update release notes
* remove cuda_version from vgpu manager image build steps

Signed-off-by: Abigail McCarthy <[email protected]>
1 parent: 04c2f6e

File tree

4 files changed: +41, -39 lines

gpu-operator/getting-started.rst

Lines changed: 2 additions & 1 deletion
@@ -347,7 +347,8 @@ with the NVIDIA GPU Operator.
 Refer to the :ref:`GPU Operator Component Matrix` on the platform support page.
 
 When using RHEL8 with Kubernetes, SELinux must be enabled either in permissive or enforcing mode for use with the GPU Operator.
-Additionally, network restricted environments are not supported.
+Additionally, when using RHEL8 with containerd as the runtime and SELinux is enabled (either in permissive or enforcing mode) at the host level, containerd must also be configured for SELinux by setting the ``enable_selinux=true`` configuration option.
+Note that network restricted environments are not supported.
 
 
 Pre-Installed NVIDIA GPU Drivers
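
A minimal, hypothetical sketch of the containerd change described in the added line above (not shown in the commit): it assumes a stock containerd installation whose generated ``/etc/containerd/config.toml`` contains ``enable_selinux = false`` under the CRI plugin section.

.. code-block:: console

   # Sketch only: flip the CRI plugin option and restart containerd.
   # The config path and the default "enable_selinux = false" line are assumptions
   # about a default containerd install; adjust to your environment.
   $ sudo sed -i 's/enable_selinux = false/enable_selinux = true/' /etc/containerd/config.toml
   $ sudo systemctl restart containerd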

gpu-operator/release-notes.rst

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ New Features
 
 * Added support for the NVIDIA Data Center GPU Driver version 570.124.06.
 
-* Added support for KubeVirt and OpenShift Virtualization with vGPU v18 for A30, A100, and H100 GPUs.
+* Added support for KubeVirt and OpenShift Virtualization with vGPU v18 on H200NVL.
 
 * Added support for NVIDIA Network Operator v25.1.0.
   Refer to :ref:`Support for GPUDirect RDMA` and :ref:`Support for GPUDirect Storage`.

openshift/openshift-virtualization.rst

Lines changed: 8 additions & 8 deletions
@@ -129,6 +129,8 @@ Procedure
         version: 3.2.0
     kernelArguments:
       - intel_iommu=on
+      # If you are using an AMD CPU, include the following argument:
+      # - amd_iommu=on
 
 #. Create the new ``MachineConfig`` object:
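
A hypothetical follow-up check, not part of the commit: once the ``MachineConfig`` is applied and the node reboots, the kernel argument can be confirmed on the booted kernel command line; ``worker-0`` is a placeholder node name.

.. code-block:: console

   # Sketch only: look for the IOMMU flag in /proc/cmdline on the node.
   $ oc debug node/worker-0 -- chroot /host cat /proc/cmdline | grep -E -o 'intel_iommu=on|amd_iommu=on'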

@@ -196,7 +198,7 @@ Use the following steps to build the vGPU Manager container and push it to a pri
 
 .. code-block:: console
 
-   $ cd vgpu-manager/rhel
+   $ cd vgpu-manager/rhel8
 
 #. Copy the NVIDIA vGPU Manager from your extracted zip file:
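
As an illustration of the copy step just referenced (a hypothetical sketch, not from the commit): the download directory, the ``Host_Drivers`` folder name, and the exact ``.run`` file name are placeholders that depend on the vGPU Manager package you extracted.

.. code-block:: console

   # Sketch only: copy the extracted vGPU Manager .run file into the build context.
   $ cp ~/vgpu-zip-extract/Host_Drivers/NVIDIA-Linux-x86_64-<version>-vgpu-kvm.run .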

@@ -210,24 +212,22 @@ Use the following steps to build the vGPU Manager container and push it to a pri
 * ``VERSION`` - The NVIDIA vGPU Manager version downloaded from the NVIDIA Software Portal.
 * ``OS_TAG`` - This must match the Guest OS version.
   For RedHat OpenShift, specify ``rhcos4.x`` where _x_ is the supported minor OCP version.
-* ``CUDA_VERSION`` - CUDA base image version to build the driver image with.
 
 .. code-block:: console
 
-   $ export PRIVATE_REGISTRY=my/private/registry VERSION=510.73.06 OS_TAG=rhcos4.11 CUDA_VERSION=11.7.1
+   $ export PRIVATE_REGISTRY=my/private/registry VERSION=510.73.06 OS_TAG=rhcos4.11
 
-.. note::
+.. note::
 
-   The recommended registry to use is the Integrated OpenShift Container Platform registry.
-   For more information about the registry, see `Accessing the registry <https://docs.openshift.com/container-platform/latest/registry/accessing-the-registry.html>`_.
+   The recommended registry to use is the Integrated OpenShift Container Platform registry.
+   For more information about the registry, see `Accessing the registry <https://docs.openshift.com/container-platform/latest/registry/accessing-the-registry.html>`_.
 
 #. Build the NVIDIA vGPU Manager image:
 
 .. code-block:: console
 
    $ docker build \
        --build-arg DRIVER_VERSION=${VERSION} \
-       --build-arg CUDA_VERSION=${CUDA_VERSION} \
        -t ${PRIVATE_REGISTRY}/vgpu-manager:${VERSION}-${OS_TAG} .
 
 #. Push the NVIDIA vGPU Manager image to your private registry:
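
The push command itself falls outside this hunk; a minimal sketch using the variables exported above (assuming you are already logged in to the private registry) would be:

.. code-block:: console

   # Sketch only: push the vGPU Manager image built in the previous step.
   $ docker push ${PRIVATE_REGISTRY}/vgpu-manager:${VERSION}-${OS_TAG}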
@@ -242,7 +242,7 @@ Installing the NVIDIA GPU Operator using the CLI
 
 Install the NVIDIA GPU Operator using the guidance at :ref:`Installing the NVIDIA GPU Operator <install-nvidiagpu>`.
 
-.. note:: When prompted to create a cluster policy follow the guidance :ref:`Creating a ClusterPolicy for the GPU Operator<install-cluster-policy-vGPU>`.
+.. note:: When prompted to create a cluster policy follow the guidance :ref:`Creating a ClusterPolicy for the GPU Operator<install-cluster-policy-vGPU>`.
 
 Create the secret
 =================

partner-validated/mirantis-mke.rst

Lines changed: 30 additions & 29 deletions
@@ -44,6 +44,36 @@ Validated Configuration Matrix
       - NVIDIA GPU
       - Hardware Model
 
+    * - k0s v1.31.5+k0s / k0rdent 0.1.0
+      - v24.9.2
+      - | Ubuntu 22.04
+      - containerd v1.7.24 with the NVIDIA Container Toolkit v1.17.4
+      - 1.31.5
+      - Helm v3
+      - | 2x NVIDIA RTX 4000 SFF Ada 20GB GDDR6 (ECC)
+      - | Supermicro SuperServer 6028U-E1CNR4T+
+
+        | 1000W Supermicro PWS-1K02A-1R
+
+        | 2x Intel Xeon E5-2630v4, 10C/20T 2.2/3.1 GHz LGA 2011-3 25MB 85W
+
+        | 32GB DDR4-2666 RDIMM, M393A4K40BB2-CTD6Q
+
+        | NVMe 960GB PM983 NVMe M.2, MZ1LB960HAJQ-00007
+
+        | 2 x NVIDIA RTX 4000 SFF Ada 20GB GDDR6 (ECC), 70W, PCIe 4.0x16, 4x
+
+        | 4x Mini DisplayPort 1.4a
+
+    * - MKE 3.8
+      - v24.9.2
+      - | Ubuntu 22.04
+      - Mirantis Container Runtime (MCR) 25.0.1
+      - 1.31.5
+      - Helm v3
+      - | NVIDIA T4 Tensor Core
+      - | AWS EC2 g4dn.2xlarge (8vcpus/32GB)
+
     * - MKE 3.6.2+ and 3.5.7+
       - v23.3.1
       - | RHEL 8.7
@@ -71,35 +101,6 @@ Validated Configuration Matrix
         | 1x RAID Controller PERC H710
 
         | 1x Network card FM487
-    * - MKE 3.8
-      - v24.9.2
-      - | Ubuntu 22.04
-      - Mirantis Container Runtime (MCR) 25.0.1
-      - 1.31.5
-      - Helm v3
-      - | NVIDIA T4 Tensor Core
-      - | AWS EC2 g4dn.2xlarge (8vcpus/32GB)
-    * - k0s v1.31.5+k0s / k0rdent 0.1.0
-      - v24.9.2
-      - | Ubuntu 22.04
-      - containerd v1.7.24 with the NVIDIA Container Toolkit v1.17.4
-      - 1.31.5
-      - Helm v3
-      - | 2x NVIDIA RTX 4000 SFF Ada 20GB GDDR6 (ECC)
-      - | Supermicro SuperServer 6028U-E1CNR4T+
-
-        | 1000W Supermicro PWS-1K02A-1R
-
-        | 2x Intel Xeon E5-2630v4, 10C/20T 2.2/3.1 GHz LGA 2011-3 25MB 85W
-
-        | 32GB DDR4-2666 RDIMM, M393A4K40BB2-CTD6Q
-
-        | NVMe 960GB PM983 NVMe M.2, MZ1LB960HAJQ-00007
-
-        | 2 x NVIDIA RTX 4000 SFF Ada 20GB GDDR6 (ECC), 70W, PCIe 4.0x16, 4x
-
-        | 4x Mini DisplayPort 1.4a
-
 
 *************
 Prerequisites
