Commit 9ae4cc3
Squashed commit of the following:
commit 55fc7b0
Merge: 5443e0f ef23484
Author: Shiva Krishna Merla <[email protected]>
Date: Thu Nov 6 16:04:43 2025 -0800
Merge pull request NVIDIA#668 from varunrsekar/vfio-support-1.33
Support VFIO passthrough
commit ef23484
Author: Varun Ramachandra Sekar <[email protected]>
Date: Tue Oct 14 17:29:18 2025 -0700
vfio passthrough support
Signed-off-by: Varun Ramachandra Sekar <[email protected]>
use chroot to run modprobe
Signed-off-by: Varun Ramachandra Sekar <[email protected]>
deadvertise sibling devices on preparation
Signed-off-by: Varun Ramachandra Sekar <[email protected]>
soft check for VFs before attempting unbind
Signed-off-by: Varun Ramachandra Sekar <[email protected]>
address review comments
Signed-off-by: Varun Ramachandra Sekar <[email protected]>
address comments (2)
Signed-off-by: Varun Ramachandra Sekar <[email protected]>
use fuser to check if gpu is free
Signed-off-by: Varun Ramachandra Sekar <[email protected]>
remove unnecessary securityContext
Signed-off-by: Varun Ramachandra Sekar <[email protected]>
don't mix vfio and mig devices
Signed-off-by: Varun Ramachandra Sekar <[email protected]>
commit 5443e0f
Merge: 59d775b 3babfe5
Author: Shiva Krishna Merla <[email protected]>
Date: Tue Nov 4 12:48:00 2025 -0800
Merge pull request NVIDIA#711 from shivamerla/add_gpu_stress_tests
tests: Add separate targets for GPU plugin tests + add stress tests
commit 3babfe5
Author: Shiva Krishna, Merla <[email protected]>
Date: Tue Nov 4 11:47:01 2025 -0800
tests: Use BATS_TEST_TMPDIR and failfast on errors during cleanup
Signed-off-by: Shiva Krishna, Merla <[email protected]>
commit 2b3e70b
Author: Shiva Krishna, Merla <[email protected]>
Date: Tue Nov 4 11:07:19 2025 -0800
tests: Add separate targets for GPU plugin tests + add stress tests
* Add separate make targets to run GPU and CD specific tests
* Add a stress test for GPU allocation
* Refactor Makefile to share common docker setup between targets
Signed-off-by: Shiva Krishna, Merla <[email protected]>
commit 59d775b
Merge: 852b56f 1e79179
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Mon Nov 3 19:38:02 2025 +0100
Merge pull request NVIDIA#709 from jgehrcke/jp/basic-gpu-tests
tests: cover basic GPU allocation, misc improvements
commit 852b56f
Merge: 1ee1b4a e8fa8e6
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Mon Nov 3 19:21:53 2025 +0100
Merge pull request NVIDIA#706 from Gacko/vkptt
kubelet plugins: add /opt/bin to binary search paths
commit 1ee1b4a
Merge: f4d11e3 068bb76
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Mon Nov 3 19:10:16 2025 +0100
Merge pull request NVIDIA#710 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/distroless/cc-v3.2.1-dev
build(deps): bump nvidia/distroless/cc from v3.2.0-dev to v3.2.1-dev in /deployments/container
commit 1e79179
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sat Nov 1 11:44:03 2025 -0700
tests: cover basic GPU allocation
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
misc fixes
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
remove cdi spec removal again
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 068bb76
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon Nov 3 17:59:31 2025 +0000
build(deps): bump nvidia/distroless/cc in /deployments/container
Bumps nvidia/distroless/cc from v3.2.0-dev to v3.2.1-dev.
---
updated-dependencies:
- dependency-name: nvidia/distroless/cc
dependency-version: v3.2.1-dev
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
commit fcd74d1
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sat Nov 1 11:42:35 2025 -0700
tests: add nvmm helper
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 977f421
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sat Nov 1 11:42:10 2025 -0700
tests: per-user tmp dir (relevant on shared machines)
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 1c2da2c
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sat Nov 1 11:41:09 2025 -0700
tests: parallelize per-node state dir cleanup
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit e8fa8e6
Author: Marco Ebert <[email protected]>
Date: Wed Oct 29 09:52:34 2025 +0100
kubelet plugins: add /opt/bin to binary search paths
Signed-off-by: Marco Ebert <[email protected]>
commit f4d11e3
Merge: 89c8258 9b20929
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 29 13:09:19 2025 +0100
Merge pull request NVIDIA#707 from jgehrcke/jp/version25120
Increment version to 25.12.0-dev
commit 89c8258
Merge: a772441 de830d3
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 29 13:07:49 2025 +0100
Merge pull request NVIDIA#703 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/distroless/cc-v3.2.0-dev
build(deps): bump nvidia/distroless/cc from v3.1.13-dev to v3.2.0-dev in /deployments/container
commit a772441
Merge: 7f591c2 2a2eeec
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 29 13:07:04 2025 +0100
Merge pull request NVIDIA#705 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/nvidia-container-toolkit-1.18.0
build(deps): bump github.com/NVIDIA/nvidia-container-toolkit from 1.18.0-rc.6 to 1.18.0
commit 9b20929
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 29 12:47:26 2025 +0100
Increment version to 25.12.0-dev
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 2a2eeec
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun Oct 26 17:02:01 2025 +0000
build(deps): bump github.com/NVIDIA/nvidia-container-toolkit
Bumps [github.com/NVIDIA/nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) from 1.18.0-rc.6 to 1.18.0.
- [Release notes](https://github.com/NVIDIA/nvidia-container-toolkit/releases)
- [Changelog](https://github.com/NVIDIA/nvidia-container-toolkit/blob/main/CHANGELOG.md)
- [Commits](NVIDIA/nvidia-container-toolkit@v1.18.0-rc.6...v1.18.0)
---
updated-dependencies:
- dependency-name: github.com/NVIDIA/nvidia-container-toolkit
dependency-version: 1.18.0
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <[email protected]>
commit de830d3
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Fri Oct 24 17:13:23 2025 +0000
build(deps): bump nvidia/distroless/cc in /deployments/container
Bumps nvidia/distroless/cc from v3.1.13-dev to v3.2.0-dev.
---
updated-dependencies:
- dependency-name: nvidia/distroless/cc
dependency-version: v3.2.0-dev
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
commit 7f591c2
Merge: cfe35ff 70fbda6
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 22 16:56:21 2025 +0200
Merge pull request NVIDIA#699 from jgehrcke/jp/readme-installation-instruction
README: refer to external install instructions
commit 70fbda6
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 21 14:26:18 2025 +0200
README: refer to external install instructions
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit cfe35ff
Merge: 2762688 151c766
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 17 14:56:42 2025 +0200
Merge pull request NVIDIA#687 from jgehrcke/jp/unbreak-ci
ci: fix downstream pipeline issues
commit 151c766
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 17 14:26:43 2025 +0200
ci: bump regctl conservatively
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 7238e5d
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 17 14:26:24 2025 +0200
ci: rename gl pipeline stages
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 87b7915
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Sep 3 17:04:33 2025 +0200
ci: push image w/o version prefix
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 24e765d
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 17 12:27:52 2025 +0200
ci: remove scan-images step
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 2762688
Merge: 1516ec7 784ba18
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 16 20:12:17 2025 +0200
Merge pull request NVIDIA#685 from jgehrcke/jp/tests-v1-exactly
tests: construct ResourceClaim differently on v1
commit 784ba18
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 16 17:30:44 2025 +0000
tests: construct ResourceClaim differently on v1
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 1516ec7
Merge: 38b42bb e14beed
Author: Shiva Krishna Merla <[email protected]>
Date: Thu Oct 16 10:01:55 2025 -0700
Merge pull request NVIDIA#682 from shivamerla/fix_attestations
Ensure attestation parameters are passed only for multi-arch builds using buildx.
commit 38b42bb
Merge: 0d83254 6cef363
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 16 11:27:20 2025 +0200
Merge pull request NVIDIA#679 from jgehrcke/jp/tests-split-into-modules-add-failover
tests: split into modules, add CD failover coverage
commit 6cef363
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 16 08:53:23 2025 +0000
tests: explicit log on launcher container start, misc
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit db70cd7
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 15 15:27:53 2025 +0000
tests: add test_cd_failover.bats and support
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 38036ac
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 15 15:16:21 2025 +0000
tests: split tests.bats into modules
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit e14beed
Author: Shiva Krishna, Merla <[email protected]>
Date: Wed Oct 15 11:52:42 2025 -0700
Ensure attestation parameters are passed only for multi-arch builds using buildx.
Signed-off-by: Shiva Krishna, Merla <[email protected]>
commit 0d83254
Merge: 65cd2c5 f8ace2e
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 15 18:06:27 2025 +0200
Merge pull request NVIDIA#676 from jgehrcke/jp/curl-retry-tcp-rst
build: retry TCP RST when curling bash source
commit 65cd2c5
Merge: b3f4e07 c40b44b
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 15 16:59:35 2025 +0200
Merge pull request NVIDIA#677 from jgehrcke/jp/test-abort-on-failure
tests: abort suite on first failure, misc
commit c40b44b
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 15 11:24:06 2025 +0000
tests: adjust readme
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 6e783bf
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 15 10:57:25 2025 +0000
tests: rundir in /tmp (too much cruft in home dir)
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit dafa4f5
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 15 11:15:32 2025 +0000
tests: merge two simple tests into one
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit c14c2ef
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 15 11:14:09 2025 +0000
tests: add on_failure hook to emit debug info
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 89bb88a
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 15 11:12:22 2025 +0000
tests: use new --abort flag for bats (fail suite fast)
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit f8ace2e
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 15 10:42:05 2025 +0000
build: retry TCP RST when curling bash source
Error seen:
curl: (7) Failed to connect to mirror.cs.odu.edu port 443 after 306 ms: Connection refused
By default, a TCP connection rejection (RST) is not treated
by curl as a transient error, see
https://curl.se/docs/manpage.html#--retry-connrefused
It's a transient error in the sense that it's often
a way to implement backpressure. We retry at slow rate.
`--retry-all-errors` is what we want here, it includes
`--retry-connrefused`.
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
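The retry policy described in this commit message (treat a refused connection as transient, retry every error, keep the retry rate slow) can be illustrated with a small Go sketch. This is not code from the repository and it does not call curl. `retrySlowly` and `flakyFetch` are hypothetical names introduced only for this illustration, and the fixed delay is an assumption.

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// retrySlowly (hypothetical helper) calls fetch up to attempts times and
// sleeps for delay between tries. Every error is retried, which mirrors the
// intent of curl's --retry-all-errors. It returns the number of calls made
// and the last error (nil on success).
func retrySlowly(attempts int, delay time.Duration, fetch func() error) (int, error) {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fetch(); err == nil {
			return i, nil
		}
		if i < attempts {
			time.Sleep(delay) // slow retry rate: the server may be applying backpressure
		}
	}
	return attempts, err
}

// flakyFetch (hypothetical) simulates a download that is refused (TCP RST)
// for the first `failures` calls and succeeds afterwards.
func flakyFetch(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return fmt.Errorf("connect attempt %d: %w", calls, syscall.ECONNREFUSED)
		}
		return nil
	}
}

func main() {
	n, err := retrySlowly(5, 10*time.Millisecond, flakyFetch(2))
	fmt.Println("calls:", n, "error:", err)
}
```

With two simulated refusals, the third call succeeds. A helper that gave up on the first refused connection would have failed the build step instead.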
commit b3f4e07
Merge: ab5a2b3 4e5cdf2
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 15 12:31:08 2025 +0200
Merge pull request NVIDIA#669 from NVIDIA/dependabot/go_modules/main/google.golang.org/grpc-1.76.0
build(deps): bump google.golang.org/grpc from 1.75.1 to 1.76.0
commit ab5a2b3
Merge: 23ccbd2 803a35a
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 14 19:56:45 2025 +0200
Merge pull request NVIDIA#675 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.25.3
build(deps): bump golang from 1.25.2 to 1.25.3 in /deployments/devel
commit 803a35a
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue Oct 14 17:16:27 2025 +0000
build(deps): bump golang from 1.25.2 to 1.25.3 in /deployments/devel
Bumps golang from 1.25.2 to 1.25.3.
---
updated-dependencies:
- dependency-name: golang
dependency-version: 1.25.3
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <[email protected]>
commit 23ccbd2
Merge: 83b8249 9d02cea
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 14 17:30:58 2025 +0200
Merge pull request NVIDIA#672 from jgehrcke/jp/periodic-cleanup-partially-prepared-rcs
CD kubelet plugin: add state reconciliation for partially prepared claims
commit 9d02cea
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Mon Oct 13 13:03:11 2025 +0000
tests: cover cleanup for stale partially prepared claims
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit f7a3310
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sun Oct 12 22:06:38 2025 +0000
CD plugin: handle stale partially prepared claims
Add a fundamentally required state reconciliation:
Periodically, perform a self-initiated Unprepare() of previously
partially prepared claims.
Perform periodically:
- Read checkpoint
- Iterate through RCs in PrepareStarted state
- For each: RC still known in API server?
If not:
1) initiate an Unprepare
2) Remove from checkpoint file if the Unprepare was successful
Relevance:
Unpreparing any partially performed claim preparation might revert
a state mutation that would otherwise be permanently inconsistent with
API server state (e.g., this could remove a node label).
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
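The reconciliation steps listed in this commit message can be sketched as a single pass over an in-memory checkpoint. This is a minimal illustration under stated assumptions, not the plugin's implementation: `Checkpoint`, `ClaimState`, `reconcileStale`, and the two callbacks are hypothetical names, and the real checkpoint is a file rather than a map. A caller would presumably run such a pass from a periodic timer.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ClaimState is a simplified stand-in for the checkpointed preparation state.
type ClaimState string

const (
	PrepareStarted   ClaimState = "PrepareStarted"
	PrepareCompleted ClaimState = "PrepareCompleted"
)

// Checkpoint maps a ResourceClaim UID to its last recorded state
// (hypothetical layout, chosen for illustration only).
type Checkpoint map[string]ClaimState

// errUnprepareFailed is a simulated failure used by the demo below.
var errUnprepareFailed = errors.New("unprepare failed (simulated)")

// reconcileStale performs one reconciliation pass. For every claim in
// PrepareStarted state that the API server no longer knows, it calls
// unprepare and removes the entry only if that call succeeded. It returns
// the sorted UIDs that were removed.
func reconcileStale(cp Checkpoint, existsInAPI func(uid string) bool, unprepare func(uid string) error) []string {
	var removed []string
	for uid, state := range cp {
		if state != PrepareStarted || existsInAPI(uid) {
			continue // fully prepared, or still a live claim
		}
		if err := unprepare(uid); err != nil {
			continue // keep the entry so that the next pass retries
		}
		delete(cp, uid) // deleting during range is safe in Go
		removed = append(removed, uid)
	}
	sort.Strings(removed)
	return removed
}

func main() {
	cp := Checkpoint{
		"claim-a": PrepareStarted,   // gone from API server, unprepare succeeds
		"claim-b": PrepareStarted,   // still known to the API server
		"claim-c": PrepareCompleted, // not partially prepared
		"claim-d": PrepareStarted,   // gone, but unprepare fails
	}
	known := map[string]bool{"claim-b": true, "claim-c": true}
	removed := reconcileStale(cp,
		func(uid string) bool { return known[uid] },
		func(uid string) error {
			if uid == "claim-d" {
				return errUnprepareFailed
			}
			return nil
		})
	fmt.Println("removed:", removed, "remaining entries:", len(cp))
}
```

Keeping the entry when the unprepare fails is a design choice of this sketch: it lets a later pass retry, so a partial state mutation is not silently forgotten.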
commit 83b8249
Merge: 5235bed e22cdba
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 14 15:11:35 2025 +0200
Merge pull request NVIDIA#674 from jgehrcke/jp/use-custom-config-dir-for-daemon
CD daemon: /imexd instead of /etc/nvidia-imex
commit e22cdba
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 14 07:15:09 2025 +0000
CD daemon: /imexd instead of /etc/nvidia-imex
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 5235bed
Merge: 7b5e2cd aa15924
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 14 12:50:05 2025 +0200
Merge pull request NVIDIA#658 from jgehrcke/jp/log-full-component-config-on-startup
Log full startup config in all CLIs in `Before` hook
commit aa15924
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sat Oct 11 21:09:36 2025 +0000
tests: confirm startup config logged on lvl 0
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit e2ea590
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Mon Sep 29 13:24:00 2025 +0000
Introduce LogStartupConfig(), use in all CLIs in Before() hook
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 4e5cdf2
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon Oct 13 08:53:56 2025 +0000
build(deps): bump google.golang.org/grpc from 1.75.1 to 1.76.0
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.75.1 to 1.76.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.75.1...v1.76.0)
---
updated-dependencies:
- dependency-name: google.golang.org/grpc
dependency-version: 1.76.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <[email protected]>
commit 7b5e2cd
Merge: a1d2fd7 11f6c02
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Mon Oct 13 10:37:08 2025 +0200
Merge pull request NVIDIA#670 from NVIDIA/dependabot/go_modules/main/golang.org/x/time-0.14.0
build(deps): bump golang.org/x/time from 0.9.0 to 0.14.0
commit a1d2fd7
Merge: c614e61 6b2af09
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Mon Oct 13 10:32:50 2025 +0200
Merge pull request NVIDIA#671 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/nvidia-container-toolkit-1.18.0-rc.6
build(deps): bump github.com/NVIDIA/nvidia-container-toolkit from 1.18.0-rc.5 to 1.18.0-rc.6
commit 6b2af09
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun Oct 12 17:02:23 2025 +0000
build(deps): bump github.com/NVIDIA/nvidia-container-toolkit
Bumps [github.com/NVIDIA/nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) from 1.18.0-rc.5 to 1.18.0-rc.6.
- [Release notes](https://github.com/NVIDIA/nvidia-container-toolkit/releases)
- [Changelog](https://github.com/NVIDIA/nvidia-container-toolkit/blob/main/CHANGELOG.md)
- [Commits](NVIDIA/nvidia-container-toolkit@v1.18.0-rc.5...v1.18.0-rc.6)
---
updated-dependencies:
- dependency-name: github.com/NVIDIA/nvidia-container-toolkit
dependency-version: 1.18.0-rc.6
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <[email protected]>
commit 11f6c02
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun Oct 12 17:02:18 2025 +0000
build(deps): bump golang.org/x/time from 0.9.0 to 0.14.0
Bumps [golang.org/x/time](https://github.com/golang/time) from 0.9.0 to 0.14.0.
- [Commits](golang/time@v0.9.0...v0.14.0)
---
updated-dependencies:
- dependency-name: golang.org/x/time
dependency-version: 0.14.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <[email protected]>
commit c614e61
Merge: a79a9fd 4ced422
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sat Oct 11 16:54:20 2025 +0200
Merge pull request NVIDIA#633 from jgehrcke/jp/verbosity-vs-debuggability-improvements
Add `logVerbosity` Helm chart parameter, reduce default log verbosity
commit 4ced422
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sat Oct 11 14:45:52 2025 +0000
Remove newline, document env-based log verb flip
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 4cf3d9b
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 10 17:57:00 2025 +0000
Fix a typo in an error message
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 2c943f7
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sat Oct 11 13:48:35 2025 +0000
tests: remove sinful duplicate env strategy
This also had a side effect on subsequent tests, with the
controller starting with _no_ LOG_VERBOSITY environment variable
set. I don't understand that, but that must be a funky Helm-ism.
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 3d5c51f
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sat Oct 11 13:01:21 2025 +0000
tests: fix: wait for controller flip
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit b172342
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sat Oct 11 12:21:07 2025 +0000
tests: replace hard-coded sleep with dynamic wait
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 9748095
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 10 19:36:24 2025 +0000
tests: cover CD daemon log levels
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 4767092
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 9 11:38:40 2025 +0000
Helm logVerbosity param: add docs, start building tests
Helm values.yaml: defaultLogVerbosity incl. docs
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
values.yaml: tweak, based on log level insights
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
improve helm chart artifact commentary
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
squash: tweak docs
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
Rename chart var, start building tests
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
tests: cover log verbosity set per-component via env
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
helm: rename defaultLogVerbosity to logVerbosity
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 3828da9
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 10 18:28:50 2025 +0000
CD daemon: change verbosity of "wait for nodes update" message
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 6d35ac1
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 10 17:56:31 2025 +0000
CD controller: make CD daemon verbosity a required arg
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 84530ab
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 9 14:07:33 2025 +0000
CD controller: log manager config on startup
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit bb16c33
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 9 11:31:36 2025 +0000
CD controller/plugins/daemon: introduce LOG_VERBOSITY
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 7e89b22
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 9 11:29:46 2025 +0000
CD controller: introduce LOG_VERBOSITY_CD_DAEMON
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit c5b147b
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 9 14:10:33 2025 +0000
tests: add note about instability around chart flip
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 4cc705a
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 9 11:46:10 2025 +0000
Helm: expose kubelet plugin env via chart variables
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 5f143b2
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 9 15:53:07 2025 +0000
Upper-case log msg, no explicit verb 0
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 8321983
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Sep 30 12:16:17 2025 +0000
Change log message levels according to new system
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit a36e214
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Sep 30 12:12:38 2025 +0000
Add logVerbosity Helm chart parameter
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit a79a9fd
Merge: 3903df7 6e56823
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Sat Oct 11 13:43:17 2025 +0200
Merge pull request NVIDIA#646 from jgehrcke/jp/no-clique-update-cd-node-status
Release workload on a non-MNNVL node in a CD
commit 6e56823
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 10 19:47:48 2025 +0000
CD plugin: move CDI edit gen into computeDomainDaemonSettings
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
make diff smaller, rename func
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit f7e4a45
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 10 16:30:27 2025 +0000
CD daemon: always mount in IMEX daemon config files
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
CD plugin: always prepare IMEX config on the host and mount it in
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit c040429
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 10 15:59:07 2025 +0000
Fix typos in comments and log message
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit deccb4d
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 7 11:39:33 2025 +0000
CD plugin: always inject CD details via CDI
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
Rename 'domain' to 'domainID'
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
squash: review feedback
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
shorten comment
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 023e7f9
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 7 11:38:23 2025 +0000
Enrich error message with CD detail when CD not found
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 32180ad
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 7 11:37:43 2025 +0000
CD daemon: unconditionally write IMEX daemon config
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
Break out of select/case, MkdirAll() before writing file
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 13df4da
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 7 09:58:50 2025 +0000
CD daemon: init node status as NotReady, misc log msg & comment tweaks
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 3cbd5a4
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 7 09:55:47 2025 +0000
CD daemon: keep business logic in no-IMEX-daemon noop mode
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit fffcea2
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 7 09:50:30 2025 +0000
Introduce maxNodesPerIMEXDomain special case for empty cliqueID
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit e0b8990
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 7 09:49:07 2025 +0000
Update code comments
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 3903df7
Merge: 14dc9fe 72e39e9
Author: Kevin Klues <[email protected]>
Date: Fri Oct 10 13:22:31 2025 +0200
Merge pull request NVIDIA#661 from jgehrcke/jp/flush-logs-on-shutdown
Flush logs in CLI app `After` hook
commit 14dc9fe
Merge: 8788dd1 d34a12f
Author: Kevin Klues <[email protected]>
Date: Fri Oct 10 13:16:53 2025 +0200
Merge pull request NVIDIA#656 from jgehrcke/jp/custom-rate-limiting
Introduce DefaultPrepUnprepRateLimiter (less aggressive)
commit 8788dd1
Merge: 23d205f 0770c0a
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Fri Oct 10 12:43:33 2025 +0200
Merge pull request NVIDIA#666 from klueska/rbac-update
Separate controller and kubeletplugin into separate RBAC permissions
commit 0770c0a
Author: Kevin Klues <[email protected]>
Date: Thu Oct 9 13:41:03 2025 +0000
Separate controller and kubeletplugin into separate RBAC permissions
Signed-off-by: Kevin Klues <[email protected]>
commit 23d205f
Merge: fca1c08 816c7a1
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 9 10:01:28 2025 +0200
Merge pull request NVIDIA#664 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/distroless/cc-v3.1.13-dev
build(deps): bump nvidia/distroless/cc from v3.1.12-dev to v3.1.13-dev in /deployments/container
commit fca1c08
Merge: e089759 b15d633
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Thu Oct 9 09:56:21 2025 +0200
Merge pull request NVIDIA#665 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.25.2
build(deps): bump golang from 1.25.1 to 1.25.2 in /deployments/devel
commit b15d633
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Wed Oct 8 17:15:07 2025 +0000
build(deps): bump golang from 1.25.1 to 1.25.2 in /deployments/devel
Bumps golang from 1.25.1 to 1.25.2.
---
updated-dependencies:
- dependency-name: golang
dependency-version: 1.25.2
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <[email protected]>
commit 816c7a1
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Wed Oct 8 17:15:03 2025 +0000
build(deps): bump nvidia/distroless/cc in /deployments/container
Bumps nvidia/distroless/cc from v3.1.12-dev to v3.1.13-dev.
---
updated-dependencies:
- dependency-name: nvidia/distroless/cc
dependency-version: v3.1.13-dev
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <[email protected]>
commit 72e39e9
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Sep 30 09:42:57 2025 +0000
Flush logs in CLI app `After` hook
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit d34a12f
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 8 15:18:53 2025 +0200
Adjust go.mod to recent changes
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 7e18c33
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Sep 30 12:18:41 2025 +0000
Introduce DefaultPrepUnprepRateLimiter (less aggressive)
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit e089759
Merge: 765892d e9f647e
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Wed Oct 8 13:09:32 2025 +0200
Merge pull request NVIDIA#651 from jgehrcke/jp/issue-694
CD daemon: coordinate CD updates on shutdown via mutation cache
commit e9f647e
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 7 17:55:26 2025 +0000
tests: cover CD daemon cleanup-on-shutdown
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 980a6a1
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 7 17:06:42 2025 +0000
CD daemon: pod mngr: store UpdateStatus return value in mutation cache
This makes sure that fast incremental mutations on
the same CD object performed during shutdown are done
conflict-free (i.e., in actual, incremental fashion
using intermediate state returned by the API server).
Without this patch:
I1007 16:49:01.678050 1 podmanager.go:196] Successfully updated node gb-nvl-043-compute06 status to NotReady
E1007 16:49:01.681345 1 computedomain.go:161] Failed to remove node from ComputeDomain during shutdown: [...] \
"the object has been modified" [...]
With this patch:
I1007 16:59:55.350436 1 podmanager.go:200] Successfully updated node gb-nvl-043-compute07 status to NotReady
I1007 16:59:55.353551 1 computedomain.go:402] Successfully removed node with IP 192.168.34.153 from ComputeDomain default/imex-channel-injection
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
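The conflict described in this commit message follows from optimistic concurrency: a write based on an outdated object version is rejected. The toy model below sketches why starting the second mutation from the object returned by the first update avoids the conflict. It is an illustration only and does not use client-go. `Object`, `fakeServer`, and `shutdown` are hypothetical names, and the integer version counter is an assumption standing in for the API server's resource version.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("the object has been modified")

// Object is a toy stand-in for a ComputeDomain under optimistic concurrency.
type Object struct {
	ResourceVersion int
	NodeReady       bool
	NodeListed      bool
}

// fakeServer holds the authoritative copy of the object.
type fakeServer struct{ stored Object }

// UpdateStatus rejects writes based on an outdated ResourceVersion and
// otherwise returns the newly stored object with a bumped version.
func (s *fakeServer) UpdateStatus(o Object) (Object, error) {
	if o.ResourceVersion != s.stored.ResourceVersion {
		return Object{}, errConflict
	}
	o.ResourceVersion++
	s.stored = o
	return o, nil
}

// shutdown performs two quick mutations in a row. With useReturned set, the
// second mutation starts from what the server returned for the first one (as
// a mutation cache would provide); otherwise it starts from the stale copy.
func shutdown(s *fakeServer, cached Object, useReturned bool) error {
	first := cached
	first.NodeReady = false
	updated, err := s.UpdateStatus(first)
	if err != nil {
		return fmt.Errorf("set node NotReady: %w", err)
	}
	base := cached // stale: the local cache has not caught up yet
	if useReturned {
		base = updated
	}
	base.NodeReady = false
	base.NodeListed = false
	if _, err := s.UpdateStatus(base); err != nil {
		return fmt.Errorf("remove node: %w", err)
	}
	return nil
}

func main() {
	initial := Object{ResourceVersion: 1, NodeReady: true, NodeListed: true}
	fmt.Println("stale base:   ", shutdown(&fakeServer{stored: initial}, initial, false))
	fmt.Println("returned base:", shutdown(&fakeServer{stored: initial}, initial, true))
}
```

In the first run the second write fails with the simulated conflict, which corresponds to the "Without this patch" log excerpt. In the second run both writes succeed, as in the "With this patch" excerpt.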
commit 4b91fce
Author: Dr. Jan-Philip Gehrcke <[email protected]>
Date: Tue Oct 7 15:50:06 2025 +0000
CD daemon: coordinate CD updates on shutdown via mutationcache
Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
commit 765892d
Merge: 2b7e899 754a758
Author: Kevin Klues <[email protected]>
Date: Wed Oct 8 09:52:51 2025 +0200
Merge pull request NVIDIA#650 from NVIDIA/dependabot/github_actions/github/codeql-action-4
build(deps): bump github/codeql-action from 3 to 4
commit 754a758
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue Oct 7 17:08:56 2025 +0000
build(deps): bump github/codeql-action from 3 to 4
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3 to 4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@v3...v4)
---
updated-dependencies:
- dependency-name: github/codeql-action
dependency-version: '4'
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <[email protected]>
1 parent c690c3d commit 9ae4cc3
File tree
71 files changed, +44904 -221 lines changed
- .github/workflows
- api/nvidia.com/resource/v1beta1
- cmd
- compute-domain-kubelet-plugin
- gpu-kubelet-plugin
- demo
- clusters/kind/scripts
- specs/quickstart
- deployments
- container
- helm/nvidia-dra-driver-gpu
- templates
- hack
- pkg/featuregates
- scripts
- tests/bats
- specs
- vendor
- github.com/NVIDIA
- go-nvlib/pkg
- nvpci
- bytes
- mmio
- pciids
- nvidia-container-toolkit/pkg/nvcdi
- golang.org/x/sys
- unix
- windows