Commit bf8493a

Merge pull request #279 from rollandf/ocp-discon-ctnd
fix: OCP disconnected DOCA OFED container
2 parents d5fbf96 + 91902e9 commit bf8493a

1 file changed

+167
-57
lines changed

docs/openshift/disconnected-openshift.rst

Lines changed: 167 additions & 57 deletions
Original file line number | Diff line number | Diff line change
@@ -384,11 +384,59 @@ Once the instance creation will completed you can find it in "Operators > Instal
384384
NVIDIA DOCA OFED driver container in disconnected environment
385385
-------------------------------------------------------------
386386

387-
In case you want to use the NVIDIA DOCA OFED driver container in the disconnected environment, the following steps are required:
387+
In case you want to use the NVIDIA DOCA OFED driver container in the disconnected environment, there are two options:
388388

389-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
389+
- Option 1: Use the NVIDIA DOCA OFED driver container from NGC
390+
391+
- This container builds the DOCA OFED driver from source code dynamically and therefore requires mirroring the needed dependencies.
392+
393+
- Option 2: Create a precompiled container for the DOCA OFED driver
394+
395+
- With this option it is not required to mirror dependencies, but it will support only a specific kernel version.
396+
397+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
398+
Option 1: Use the NVIDIA DOCA OFED driver container from NGC
399+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
400+
401+
""""""""""""""""""""""""""""""""""""
402+
Mirroring DOCA OFED Driver Container
403+
""""""""""""""""""""""""""""""""""""
404+
405+
Navigate to the NVIDIA catalog and look for the tag with the right <os-version>-<architecture> suffix, such as `doca3.1.0-25.07-0.9.7.0-0-rhel9.6-amd64`.
406+
407+
The mirrored image must be tagged `<driver-version>-<os-version>-<architecture>`, for example `doca3.1.0-25.07-0.9.7.0-0-rhel9.6-amd64`.
408+
409+
Note that since OCP 4.19, the os version is now `rhel9.6` instead of `rhcos4.x`.
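
The tag convention above can be sketched as a small shell helper. This is only an illustration: the upstream path `nvcr.io/nvidia/mellanox/doca-driver` is an assumption to be checked against the NVIDIA catalog, the destination reuses the mirror registry name from the examples in this document, and the function names are hypothetical. The script prints a `skopeo copy` command instead of running it.

```shell
#!/bin/sh
# Compose the mirrored image tag: <driver-version>-<os-version>-<architecture>.
doca_image_tag() {
    printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Assumed registry paths; override them through the environment if they differ.
SRC_REPO="${SRC_REPO:-nvcr.io/nvidia/mellanox/doca-driver}"
DST_REPO="${DST_REPO:-mirror.air-gapped.local:5000/mellanox/doca-driver}"

# Print (do not execute) the skopeo command that would mirror the image.
mirror_command() {
    tag="$(doca_image_tag "$1" "$2" "$3")"
    printf 'skopeo copy docker://%s:%s docker://%s:%s\n' \
        "$SRC_REPO" "$tag" "$DST_REPO" "$tag"
}

mirror_command doca3.1.0-25.07-0.9.7.0-0 rhel9.6 amd64
```

Printing the command first makes it easy to review the source and destination references before anything is copied into the disconnected registry.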
410+
411+
412+
"""""""""""""""""""""""""""""""""
390413
Create Local Package Repositories
391-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
414+
"""""""""""""""""""""""""""""""""
415+
416+
The DOCA-OFED Driver container requires certain packages to be available for the driver installation.
417+
The following packages are required:
418+
419+
.. code-block::
420+
421+
kernel-headers-${KERNEL_VERSION}
422+
kernel-devel-${KERNEL_VERSION}
423+
kernel-core-${KERNEL_VERSION}
424+
createrepo
425+
elfutils-libelf-devel
426+
kernel-rpm-macros
427+
numactl-libs
428+
lsof
429+
rpm-build
430+
patch
431+
hostname
432+
433+
For RT kernels, the following packages should also be available:
434+
435+
.. code-block::
436+
437+
kernel-rt-devel-${KERNEL_VERSION}
438+
kernel-rt-modules-${KERNEL_VERSION}
439+
392440
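
As a rough aid for populating the local repository, the package lists above can be expanded for a concrete kernel version. The sketch below is an assumption-laden helper, not part of the operator: the function names are hypothetical, and it falls back to the kernel of the machine it runs on when `KERNEL_VERSION` is not set, which is only meaningful if that matches the cluster nodes.

```shell
#!/bin/sh
# Print the package names required by the driver container for one kernel version.
required_packages() {
    kver="$1"
    # Kernel packages are version-specific.
    for p in kernel-headers kernel-devel kernel-core; do
        printf '%s-%s\n' "$p" "$kver"
    done
    # Build tooling and libraries are not tied to the kernel version.
    printf '%s\n' createrepo elfutils-libelf-devel kernel-rpm-macros \
        numactl-libs lsof rpm-build patch hostname
}

# Print the additional package names needed for RT kernels.
rt_packages() {
    printf 'kernel-rt-devel-%s\nkernel-rt-modules-%s\n' "$1" "$1"
}

# Example: use KERNEL_VERSION if set, otherwise the local kernel (assumption).
required_packages "${KERNEL_VERSION:-$(uname -r)}"
```

On a host where the local repositories are already configured, each printed name could then be checked with a query tool such as `dnf repoquery` to confirm that the mirror actually serves it.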
393441
Create the Local Package Repository required:
394442

@@ -406,13 +454,15 @@ redhat.repo:
406454
[baseos]
407455
name=rhel-9-for-x86_64-baseos-rpms
408456
baseurl=http://srv01.air-gapped.local/redhat/9.4/el-9-for-x86_64-baseos-rpms
409-
gpgcheck=0
457+
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
458+
gpgcheck = 1
410459
enabled=1
411460
412461
[apstream]
413462
name=rhel-9-for-x86_64-appstream-rpms
414463
baseurl=http://srv01.air-gapped.local/redhat/rhel-9-for-x86_64-appstream-rpms
415-
gpgcheck=0
464+
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
465+
gpgcheck = 1
416466
enabled=1
417467
418468
@@ -424,12 +474,14 @@ ubi.repo:
424474
name = Red Hat Universal Base Image 9 (RPMs) - BaseOS
425475
baseurl = http://srv01.air-gapped.local/redhat/ubi-9-baseos-rpms
426476
enabled = 1
427-
gpgcheck = 0
477+
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
478+
gpgcheck = 1
428479
[ubi-9-appstream]
429480
name = Red Hat Universal Base Image 9 (RPMs) - AppStream
430481
baseurl = http://srv01.air-gapped.local/redhat/ubi-9-appstream-rpms
431482
enabled = 1
432-
gpgcheck = 0
483+
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
484+
gpgcheck = 1
433485
434486
cuda.repo:
435487

@@ -439,7 +491,8 @@ cuda.repo:
439491
name=cuda
440492
baseurl=http://srv01.air-gapped.local/nvidia/cuda
441493
priority=0
442-
gpgcheck=0
494+
gpgcheck=1
495+
gpgkey=http://srv01.air-gapped.local/nvidia/cuda/D42D0685.pub
443496
enabled=1
444497
445498
@@ -456,11 +509,69 @@ If self-signed certificates are used for an HTTPS based local repository, a Conf
456509
457510
oc create configmap cert-config -n nvidia-network-operator --from-file=<path-to-pem-file>
458511
459-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
460-
Create a precompiled container for the DOCA OFED driver
461-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
462512
463-
We will use the mirror server to build the precompiled container for the DOCA OFED driver.
513+
""""""""""""""""""""""""""""""""""""""
514+
Create the NIC Cluster Policy instance
515+
""""""""""""""""""""""""""""""""""""""
516+
517+
In the web console, click "Operators > Installed Operators", and then "NVIDIA Network Operator > NicClusterPolicy > Create NicClusterPolicy"
518+
519+
You need to provide the required parameters depending on your setup. After editing, the nic-cluster-policy YAML looks like this:
520+
521+
.. code-block:: yaml
522+
523+
apiVersion: mellanox.com/v1alpha1
524+
kind: NicClusterPolicy
525+
metadata:
526+
name: nic-cluster-policy
527+
spec:
528+
ofedDriver:
529+
certConfig:
530+
name: cert-config
531+
env:
532+
- name: RESTORE_DRIVER_ON_POD_TERMINATION
533+
value: "true"
534+
- name: UNLOAD_STORAGE_MODULES
535+
value: "true"
536+
- name: CREATE_IFNAMES_UDEV
537+
value: "true"
538+
forcePrecompiled: false
539+
image: doca-driver
540+
imagePullSecrets:
541+
- mirror-registry-ps
542+
livenessProbe:
543+
initialDelaySeconds: 30
544+
periodSeconds: 30
545+
readinessProbe:
546+
initialDelaySeconds: 10
547+
periodSeconds: 30
548+
repoConfig:
549+
name: repo-config
550+
repository: mirror.air-gapped.local:5000/mellanox
551+
startupProbe:
552+
initialDelaySeconds: 10
553+
periodSeconds: 20
554+
terminationGracePeriodSeconds: 300
555+
upgradePolicy:
556+
autoUpgrade: true
557+
drain:
558+
deleteEmptyDir: true
559+
enable: true
560+
force: true
561+
podSelector: ""
562+
timeoutSeconds: 300
563+
maxParallelUpgrades: 1
564+
safeLoad: false
565+
waitForCompletion:
566+
timeoutSeconds: 0
567+
version: doca3.1.0-25.07-0.9.7.0-0
568+
569+
Note: Please be sure to provide configured ConfigMaps: `repo-config` and `cert-config`.
570+
571+
572+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
573+
Option 2: Create a precompiled container for the DOCA OFED driver
574+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
464575

465576
Please verify that you have the following:
466577

@@ -479,9 +590,9 @@ In order to get the sha256 of the image, you can use the following command:
479590
480591
skopeo inspect docker://mirror.air-gapped.local:5000/mellanox/doca-driver:your-tag | jq -r '.Digest'
481592
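
For the precompiled option, the NicClusterPolicy in this document sets `version` to the image digest rather than to a tag. A minimal sketch of a sanity check on that value follows; the function names are hypothetical, and the check only validates the textual form of the digest, not that the image exists in the registry.

```shell
#!/bin/sh
# Succeed only for a well-formed digest: "sha256:" followed by 64 lowercase hex digits.
is_sha256_digest() {
    printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# Print the line for the ofedDriver "version" field, or fail with a message on stderr.
version_field() {
    if is_sha256_digest "$1"; then
        printf 'version: %s\n' "$1"
    else
        printf 'unexpected digest: %s\n' "$1" >&2
        return 1
    fi
}

# Example using the digest that appears in this document's policy.
version_field "sha256:9a831bfdf85f313b1f5749b7c9b2673bb8fff18b4ff768c9242dabaa4468e449"
```

In practice the argument would come from the `skopeo inspect ... | jq -r '.Digest'` command shown above, so that a failed lookup or an empty result is caught before it is pasted into the policy.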
482-
--------------------------------------
593+
""""""""""""""""""""""""""""""""""""""
483594
Create the NIC Cluster Policy instance
484-
--------------------------------------
595+
""""""""""""""""""""""""""""""""""""""
485596

486597
In the web console, click "Operators > Installed Operators", and then "NVIDIA Network Operator > NicClusterPolicy > Create NicClusterPolicy"
487598

@@ -492,50 +603,49 @@ You need to provide required parameters depending on your setup. After editing a
492603
apiVersion: mellanox.com/v1alpha1
493604
kind: NicClusterPolicy
494605
metadata:
495-
name: nic-cluster-policy
606+
name: nic-cluster-policy
496607
spec:
497-
ofedDriver:
498-
certConfig:
499-
name: cert-config
500-
env:
501-
- name: RESTORE_DRIVER_ON_POD_TERMINATION
502-
value: "true"
503-
- name: UNLOAD_STORAGE_MODULES
504-
value: "true"
505-
- name: CREATE_IFNAMES_UDEV
506-
value: "true"
507-
forcePrecompiled: true
508-
image: doca-driver
509-
imagePullSecrets:
510-
- mirror-registry-ps
511-
livenessProbe:
512-
initialDelaySeconds: 30
513-
periodSeconds: 30
514-
readinessProbe:
515-
initialDelaySeconds: 10
516-
periodSeconds: 30
517-
repoConfig:
518-
name: repo-config
519-
repository: mirror.air-gapped.local:5000/mellanox
520-
startupProbe:
521-
initialDelaySeconds: 10
522-
periodSeconds: 20
523-
terminationGracePeriodSeconds: 300
524-
upgradePolicy:
525-
autoUpgrade: true
526-
drain:
527-
deleteEmptyDir: true
528-
enable: true
529-
force: true
530-
podSelector: ""
531-
timeoutSeconds: 300
532-
maxParallelUpgrades: 1
533-
safeLoad: false
534-
waitForCompletion:
535-
timeoutSeconds: 0
536-
version: sha256:9a831bfdf85f313b1f5749b7c9b2673bb8fff18b4ff768c9242dabaa4468e449
537-
538-
Note: Please be sure to provide configured ConfigMaps: `repo-config` and `cert-config`.
608+
ofedDriver:
609+
env:
610+
- name: RESTORE_DRIVER_ON_POD_TERMINATION
611+
value: "true"
612+
- name: UNLOAD_STORAGE_MODULES
613+
value: "true"
614+
- name: CREATE_IFNAMES_UDEV
615+
value: "true"
616+
forcePrecompiled: true
617+
image: doca-driver
618+
imagePullSecrets:
619+
- mirror-registry-ps
620+
livenessProbe:
621+
initialDelaySeconds: 30
622+
periodSeconds: 30
623+
readinessProbe:
624+
initialDelaySeconds: 10
625+
periodSeconds: 30
626+
repository: mirror.air-gapped.local:5000/mellanox
627+
startupProbe:
628+
initialDelaySeconds: 10
629+
periodSeconds: 20
630+
terminationGracePeriodSeconds: 300
631+
upgradePolicy:
632+
autoUpgrade: true
633+
drain:
634+
deleteEmptyDir: true
635+
enable: true
636+
force: true
637+
podSelector: ""
638+
timeoutSeconds: 300
639+
maxParallelUpgrades: 1
640+
safeLoad: false
641+
waitForCompletion:
642+
timeoutSeconds: 0
643+
version: sha256:9a831bfdf85f313b1f5749b7c9b2673bb8fff18b4ff768c9242dabaa4468e449
644+
645+
646+
------------------
647+
Image Pull Secrets
648+
------------------
539649

540650
If your local repository requires username and password for access you need to create imagePullSecrets and provide this parameter in nic-cluster-policy.yaml:
541651
