@@ -384,11 +384,59 @@ Once the instance creation will completed you can find it in "Operators > Instal
384384 NVIDIA DOCA OFED driver container in disconnected environment
385385 -------------------------------------------------------------
386386
387- In case you want to use the NVIDIA DOCA OFED driver container in the disconnected environment, the following steps are required :
387+ To use the NVIDIA DOCA OFED driver container in a disconnected environment, choose one of two options:
388388
389- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
389+ - Option 1: Use the NVIDIA DOCA OFED driver container from NGC
390+
391+ - This container builds the DOCA OFED driver from source at runtime, so the required dependencies must be mirrored.
392+
393+ - Option 2: Create a precompiled container for the DOCA OFED driver
394+
395+ - This option does not require mirroring dependencies, but the container supports only one specific kernel version.
396+
397+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
398+ Option 1: Use the NVIDIA DOCA OFED driver container from NGC
399+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
400+
401+ """"""""""""""""""""""""""""""""""""
402+ Mirroring DOCA OFED Driver Container
403+ """"""""""""""""""""""""""""""""""""
404+
405+ Navigate to the NVIDIA NGC catalog and look for the tag with the right <os-version>-<architecture> suffix, such as `doca3.1.0-25.07-0.9.7.0-0-rhel9.6-amd64`.
406+
407+ The mirrored image must be tagged `<driver-version>-<os-version>-<architecture>`, for example `doca3.1.0-25.07-0.9.7.0-0-rhel9.6-amd64`.
408+
409+ Note that starting with OCP 4.19, the OS version is `rhel9.6` instead of `rhcos4.x`.
410+
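As an illustration of the tag convention above, the following shell sketch composes the `<driver-version>-<os-version>-<architecture>` tag and prints a `skopeo copy` command for review. The `doca_driver_tag` helper and the source repository path on NGC are assumptions made for illustration and are not taken from this guide; the destination is the mirror registry used elsewhere in this section.

```shell
#!/bin/sh
# Sketch only: compose the tag expected for a mirrored DOCA OFED driver image.
set -eu

# doca_driver_tag <driver-version> <os-version> <architecture>
# Hypothetical helper: joins the three parts with dashes.
doca_driver_tag() {
    printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Assumed source path on NGC; verify it in the catalog before use.
SRC_REPO="nvcr.io/nvidia/mellanox/doca-driver"
# Mirror registry path used in this guide.
DST_REPO="mirror.air-gapped.local:5000/mellanox/doca-driver"

TAG="$(doca_driver_tag doca3.1.0-25.07-0.9.7.0-0 rhel9.6 amd64)"

# Print the copy command instead of running it, so it can be reviewed first.
echo skopeo copy "docker://${SRC_REPO}:${TAG}" "docker://${DST_REPO}:${TAG}"
```

Printing the command rather than executing it keeps the sketch safe to run on a host without registry access; remove the `echo` once the source and destination have been checked.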
411+
412+ """""""""""""""""""""""""""""""""
390413 Create Local Package Repositories
391- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
414+ """""""""""""""""""""""""""""""""
415+
416+ The DOCA-OFED Driver container requires certain packages to be available for the driver installation.
417+ The following packages are required:
418+
419+ .. code-block::
420+
421+     kernel-headers-${KERNEL_VERSION}
422+     kernel-devel-${KERNEL_VERSION}
423+     kernel-core-${KERNEL_VERSION}
424+     createrepo
425+     elfutils-libelf-devel
426+     kernel-rpm-macros
427+     numactl-libs
428+     lsof
429+     rpm-build
430+     patch
431+     hostname
432+
433+ For RT kernels, the following packages should be available:
434+
435+ .. code-block::
436+
437+     kernel-rt-devel-${KERNEL_VERSION}
438+     kernel-rt-modules-${KERNEL_VERSION}
439+
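The package lists above can be expanded for a concrete kernel version with a small shell sketch like the one below, for example to feed the result into whatever tool populates the local repository. The `required_packages` helper and the example kernel version are hypothetical; on a cluster node, the kernel version would typically come from `uname -r`.

```shell
#!/bin/sh
# Sketch only: print the package names the driver container needs
# for a given kernel version, one per line.
set -eu

# required_packages <kernel-version> [rt]
# Hypothetical helper; pass "rt" as the second argument for RT kernels.
required_packages() {
    kver="$1"
    # Kernel-versioned packages.
    for p in kernel-headers kernel-devel kernel-core; do
        printf '%s-%s\n' "$p" "$kver"
    done
    # Packages that do not depend on the kernel version.
    printf '%s\n' createrepo elfutils-libelf-devel kernel-rpm-macros \
        numactl-libs lsof rpm-build patch hostname
    # Extra packages for RT kernels.
    if [ "${2:-}" = "rt" ]; then
        printf '%s-%s\n' kernel-rt-devel "$kver"
        printf '%s-%s\n' kernel-rt-modules "$kver"
    fi
}

# Example kernel version (hypothetical value).
required_packages 5.14.0-570.12.1.el9_6.x86_64
```

Keeping the list in one function makes it easier to check that the mirrored repository contains every package for each kernel version running in the cluster.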
392440
393441 Create the required local package repository:
394442
@@ -406,13 +454,15 @@ redhat.repo:
406454 [baseos]
407455 name=rhel-9-for-x86_64-baseos-rpms
408456 baseurl=http://srv01.air-gapped.local/redhat/9.4/el-9-for-x86_64-baseos-rpms
409- gpgcheck=0
457+ gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
458+ gpgcheck = 1
410459 enabled=1
411460
412461 [appstream]
413462 name=rhel-9-for-x86_64-appstream-rpms
414463 baseurl=http://srv01.air-gapped.local/redhat/rhel-9-for-x86_64-appstream-rpms
415- gpgcheck=0
464+ gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
465+ gpgcheck = 1
416466 enabled=1
417467
418468
@@ -424,12 +474,14 @@ ubi.repo:
424474 name = Red Hat Universal Base Image 9 (RPMs) - BaseOS
425475 baseurl = http://srv01.air-gapped.local/redhat/ubi-9-baseos-rpms
426476 enabled = 1
427- gpgcheck = 0
477+ gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
478+ gpgcheck = 1
428479 [ubi-9-appstream]
429480 name = Red Hat Universal Base Image 9 (RPMs) - AppStream
430481 baseurl = http://srv01.air-gapped.local/redhat/ubi-9-appstream-rpms
431482 enabled = 1
432- gpgcheck = 0
483+ gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
484+ gpgcheck = 1
433485
434486 cuda.repo:
435487
@@ -439,7 +491,8 @@ cuda.repo:
439491 name=cuda
440492 baseurl=http://srv01.air-gapped.local/nvidia/cuda
441493 priority=0
442- gpgcheck=0
494+ gpgcheck=1
495+ gpgkey=http://srv01.air-gapped.local/nvidia/cuda/D42D0685.pub
443496 enabled=1
444497
445498
@@ -456,11 +509,69 @@ If self-signed certificates are used for an HTTPS based local repository, a Conf
456509
457510 oc create configmap cert-config -n nvidia-network-operator --from-file=<path-to-pem-file>
458511
459- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
460- Create a precompiled container for the DOCA OFED driver
461- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
462512
463- We will use the mirror server to build the precompiled container for the DOCA OFED driver.
513+ """"""""""""""""""""""""""""""""""""""
514+ Create the NIC Cluster Policy instance
515+ """"""""""""""""""""""""""""""""""""""
516+
517+ In the web console, click "Operators > Installed Operators", and then "NVIDIA Network Operator > NicClusterPolicy > Create NicClusterPolicy"
518+
519+ Provide the required parameters for your setup. After editing, the nic-cluster-policy YAML looks like this:
520+
521+ .. code-block:: yaml
522+
523+     apiVersion: mellanox.com/v1alpha1
524+     kind: NicClusterPolicy
525+     metadata:
526+       name: nic-cluster-policy
527+     spec:
528+       ofedDriver:
529+         certConfig:
530+           name: cert-config
531+         env:
532+         - name: RESTORE_DRIVER_ON_POD_TERMINATION
533+           value: "true"
534+         - name: UNLOAD_STORAGE_MODULES
535+           value: "true"
536+         - name: CREATE_IFNAMES_UDEV
537+           value: "true"
538+         forcePrecompiled: false
539+         image: doca-driver
540+         imagePullSecrets:
541+         - mirror-registry-ps
542+         livenessProbe:
543+           initialDelaySeconds: 30
544+           periodSeconds: 30
545+         readinessProbe:
546+           initialDelaySeconds: 10
547+           periodSeconds: 30
548+         repoConfig:
549+           name: repo-config
550+         repository: mirror.air-gapped.local:5000/mellanox
551+         startupProbe:
552+           initialDelaySeconds: 10
553+           periodSeconds: 20
554+         terminationGracePeriodSeconds: 300
555+         upgradePolicy:
556+           autoUpgrade: true
557+           drain:
558+             deleteEmptyDir: true
559+             enable: true
560+             force: true
561+             podSelector: ""
562+             timeoutSeconds: 300
563+           maxParallelUpgrades: 1
564+           safeLoad: false
565+           waitForCompletion:
566+             timeoutSeconds: 0
567+         version: doca3.1.0-25.07-0.9.7.0-0
568+
569+ Note: Make sure to provide the configured ConfigMaps: `repo-config` and `cert-config`.
570+
571+
572+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
573+ Option 2: Create a precompiled container for the DOCA OFED driver
574+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
464575
465576 Please verify that you have the following:
466577
@@ -479,9 +590,9 @@ In order to get the sha256 of the image, you can use the following command:
479590
480591 skopeo inspect docker://mirror.air-gapped.local:5000/mellanox/doca-driver:your-tag | jq -r '.Digest'
481592
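Because the digest is later used as the `version` value of the policy, it can help to check its shape before pasting it in. The sketch below is one way to do that in plain shell; the `is_sha256_digest` helper is hypothetical, and the digest is the example value that appears in the policy in this section.

```shell
#!/bin/sh
# Sketch only: check that a value looks like "sha256:" followed by
# 64 lowercase hexadecimal characters.
set -eu

# is_sha256_digest <value>
# Hypothetical helper; returns 0 when the value has the expected shape.
is_sha256_digest() {
    case "$1" in
        sha256:*) hex="${1#sha256:}" ;;
        *) return 1 ;;
    esac
    # A SHA-256 digest is 64 hex characters long.
    [ "${#hex}" -eq 64 ] || return 1
    # Reject anything outside 0-9 and a-f.
    case "$hex" in
        *[!0-9a-f]*) return 1 ;;
    esac
    return 0
}

# Example digest taken from the policy shown in this section.
DIGEST="sha256:9a831bfdf85f313b1f5749b7c9b2673bb8fff18b4ff768c9242dabaa4468e449"

if is_sha256_digest "$DIGEST"; then
    echo "version: ${DIGEST}"
else
    echo "unexpected digest format: ${DIGEST}" >&2
fi
```

The check only validates the format; it does not confirm that the digest exists in the mirror registry.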
482- --------------------------------------
593+ """"""""""""""""""""""""""""""""""""""
483594 Create the NIC Cluster Policy instance
484- --------------------------------------
595+ """"""""""""""""""""""""""""""""""""""
485596
486597 In the web console, click "Operators > Installed Operators", and then "NVIDIA Network Operator > NicClusterPolicy > Create NicClusterPolicy"
487598
@@ -492,50 +603,49 @@ You need to provide required parameters depending on your setup. After editing a
492603     apiVersion: mellanox.com/v1alpha1
493604     kind: NicClusterPolicy
494605     metadata:
495- name: nic-cluster-policy
606+       name: nic-cluster-policy
496607     spec:
497- ofedDriver:
498- certConfig :
499- name : cert-config
500- env :
501- - name : RESTORE_DRIVER_ON_POD_TERMINATION
502- value : " true"
503- - name : UNLOAD_STORAGE_MODULES
504- value : " true"
505- - name : CREATE_IFNAMES_UDEV
506- value : " true"
507- forcePrecompiled : true
508- image : doca-driver
509- imagePullSecrets :
510- - mirror-registry-ps
511- livenessProbe :
512- initialDelaySeconds : 30
513- periodSeconds : 30
514- readinessProbe :
515- initialDelaySeconds : 10
516- periodSeconds : 30
517- repoConfig :
518- name : repo-config
519- repository : mirror.air-gapped.local:5000/mellanox
520- startupProbe :
521- initialDelaySeconds : 10
522- periodSeconds : 20
523- terminationGracePeriodSeconds : 300
524- upgradePolicy :
525- autoUpgrade : true
526- drain :
527- deleteEmptyDir : true
528- enable : true
529- force : true
530- podSelector : " "
531- timeoutSeconds : 300
532- maxParallelUpgrades : 1
533- safeLoad : false
534- waitForCompletion :
535- timeoutSeconds : 0
536- version : sha256:9a831bfdf85f313b1f5749b7c9b2673bb8fff18b4ff768c9242dabaa4468e449
537-
538- Note: Please be sure to provide configured ConfigMaps: `repo-config ` and `cert-config `.
608+       ofedDriver:
609+         env:
610+         - name: RESTORE_DRIVER_ON_POD_TERMINATION
611+           value: "true"
612+         - name: UNLOAD_STORAGE_MODULES
613+           value: "true"
614+         - name: CREATE_IFNAMES_UDEV
615+           value: "true"
616+         forcePrecompiled: true
617+         image: doca-driver
618+         imagePullSecrets:
619+         - mirror-registry-ps
620+         livenessProbe:
621+           initialDelaySeconds: 30
622+           periodSeconds: 30
623+         readinessProbe:
624+           initialDelaySeconds: 10
625+           periodSeconds: 30
626+         repository: mirror.air-gapped.local:5000/mellanox
627+         startupProbe:
628+           initialDelaySeconds: 10
629+           periodSeconds: 20
630+         terminationGracePeriodSeconds: 300
631+         upgradePolicy:
632+           autoUpgrade: true
633+           drain:
634+             deleteEmptyDir: true
635+             enable: true
636+             force: true
637+             podSelector: ""
638+             timeoutSeconds: 300
639+           maxParallelUpgrades: 1
640+           safeLoad: false
641+           waitForCompletion:
642+             timeoutSeconds: 0
643+         version: sha256:9a831bfdf85f313b1f5749b7c9b2673bb8fff18b4ff768c9242dabaa4468e449
644+
645+
646+ ------------------
647+ Image Pull Secrets
648+ ------------------
539649
540650 If your local repository requires a username and password, create imagePullSecrets and provide this parameter in nic-cluster-policy.yaml:
541651