Commit 2c8deae

fix: DTK shared directory path mismatch due to NFD label sanitization
The DTK flow was failing with path mismatches because the volume mount path is created by network-operator using NFD labels, which sanitize kernel version strings by replacing non-alphanumeric characters (except -._) with underscores.

Example kernel: 5.14.0-570.60.1.el9_6.aarch64+64k
NFD label: feature.node.kubernetes.io/kernel-version.full=5.14.0-570.60.1.el9_6.aarch64_64k

This caused:
- Volume mount path: /mnt/shared-doca-driver-toolkit/5.14.0-570.60.1.el9_6.aarch64_64k/
- Container using: /mnt/shared-doca-driver-toolkit/5.14.0-570.60.1.el9_6.aarch64+64k/

The mismatch led to timeout errors where one container waited for dtk_start_compile flag in the sanitized path while the other container created flags in the unsanitized path.

Fix: Apply NFD-compatible sanitization to kernel version when constructing DTK_OCP_NIC_SHARED_DIR path in both entrypoint.sh and dtk_nic_driver_build.sh. The FULL_KVER variable remains unchanged for package operations that require the actual kernel version format.

This matches the sanitization logic in node-feature-discovery:
https://github.com/kubernetes-sigs/node-feature-discovery/blob/master/source/kernel/version.go#L38-L42

Applies to RT kernels (+rt) and 64k hugepage kernels (+64k) on RHEL/RHCOS.

Signed-off-by: Fred Rolland <[email protected]>
1 parent 16538d3 commit 2c8deae

2 files changed: +8 -2 lines changed

dtk_nic_driver_build.sh

Lines changed: 4 additions & 1 deletion
@@ -4,7 +4,10 @@
 : ${ENTRYPOINT_DEBUG:=false}
 : ${DTK_OCP_NIC_SHARED_DIR:=/mnt/shared-nvidia-nic-driver-toolkit}
 
-DTK_OCP_NIC_SHARED_DIR=$DTK_OCP_NIC_SHARED_DIR/$(uname -r)
+# Sanitize kernel version to match Kubernetes NFD label format used by network-operator for volume paths
+# NFD replaces all non-alphanumeric characters (except -._) with underscore, then trims leading/trailing -._
+DTK_KVER=$(uname -r | sed 's/[^-A-Za-z0-9_.]/_/g' | sed 's/^[-_.]*//;s/[-_.]*$//')
+DTK_OCP_NIC_SHARED_DIR=$DTK_OCP_NIC_SHARED_DIR/$DTK_KVER
 DTK_OCP_START_COMPILE_FLAG=""
 DTK_OCP_DONE_COMPILE_FLAG=""
 DTK_OCP_COMPILED_DRIVER_VER=""

entrypoint.sh

Lines changed: 4 additions & 1 deletion
@@ -1360,7 +1360,10 @@ IS_OS_SLES=true; [[ "$(grep -i sles /etc/os-release -c)" == "0" ]] && IS_OS_SLES
 RHEL_MAJOR_VERSION=0
 OPENSHIFT_VERSION=""
 
-DTK_OCP_NIC_SHARED_DIR=${DTK_OCP_NIC_SHARED_DIR}/${FULL_KVER}
+# Sanitize kernel version to match Kubernetes NFD label format used by network-operator for volume paths
+# NFD replaces all non-alphanumeric characters (except -._) with underscore, then trims leading/trailing -._
+DTK_KVER=$(echo "${FULL_KVER}" | sed 's/[^-A-Za-z0-9_.]/_/g' | sed 's/^[-_.]*//;s/[-_.]*$//')
+DTK_OCP_NIC_SHARED_DIR=${DTK_OCP_NIC_SHARED_DIR}/${DTK_KVER}
 DTK_OCP_BUILD_SCRIPT="/root/dtk_nic_driver_build.sh"
 DTK_OCP_START_COMPILE_FLAG=${DTK_OCP_NIC_SHARED_DIR}/dtk_start_compile
 DTK_OCP_DONE_COMPILE_FLAG_PREFIX=${DTK_OCP_NIC_SHARED_DIR}/dtk_done_compile_
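
The sanitization that both hunks apply can be exercised on its own. The following is a minimal sketch, not part of the commit: the function name sanitize_kver and the variables kver and base are hypothetical and introduced only for illustration. The sed pipeline is the one from the diff, and the kernel string and base path are the examples from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): wraps the same two-stage sed
# pipeline the commit adds to entrypoint.sh and dtk_nic_driver_build.sh.
sanitize_kver() {
    # Stage 1: replace every character outside [A-Za-z0-9], '-', '_', '.' with '_'.
    # Stage 2: trim leading and trailing '-', '_', '.' characters.
    printf '%s\n' "$1" \
        | sed 's/[^-A-Za-z0-9_.]/_/g' \
        | sed 's/^[-_.]*//;s/[-_.]*$//'
}

# Example values taken from the commit message.
kver='5.14.0-570.60.1.el9_6.aarch64+64k'
base='/mnt/shared-doca-driver-toolkit'

echo "unsanitized: ${base}/${kver}"
echo "sanitized:   ${base}/$(sanitize_kver "$kver")"
```

With the example kernel, only the `+` changes, so the sanitized path ends in `aarch64_64k` and lines up with the directory derived from the NFD label. A kernel string with no characters outside the allowed set passes through unchanged.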
