Skip to content

Conversation

@ArangoGutierrez
Copy link
Collaborator

No description provided.

@ArangoGutierrez ArangoGutierrez self-assigned this Sep 8, 2025

This comment was marked as outdated.

@coveralls
Copy link

coveralls commented Sep 8, 2025

Pull Request Test Coverage Report for Build 17802380714

Details

  • 0 of 1 (0.0%) changed or added relevant line in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.8%) to 36.711%

Changes Missing Coverage Covered Lines Changed/Added Lines %
cmd/nvidia-ctk-installer/container/runtime/containerd/containerd.go 0 1 0.0%
Totals Coverage Status
Change from base Build 17609015556: 0.8%
Covered Lines: 4800
Relevant Lines: 13075

💛 - Coveralls

This comment was marked as outdated.

@ArangoGutierrez ArangoGutierrez marked this pull request as ready for review September 9, 2025 09:31

// TestUpdateV2Config_NoConfigFile tests the scenario when there is no
// containerd config file present
func TestUpdateV2Config_NoConfigFile(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed:

I think these tests are testing the wrong thing. Although we may want to consider updating the tests for Update config, what we really want is tests to ensure the correct behaviour of the Setup->Cleanup cycle.

Would adapting the proposed implementation from the last two points above be feasible?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, thanks for the clarification. I now understand that I got the wrong idea from our conversation. I'll work on the proposed implementation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented the tests following the structure of the comment #1272 (comment) , PTAL

@ArangoGutierrez ArangoGutierrez force-pushed the update_ctk_unit_test branch 3 times, most recently from d3cc519 to 5ba3f1f Compare September 9, 2025 12:20
@ArangoGutierrez ArangoGutierrez force-pushed the update_ctk_unit_test branch 3 times, most recently from 02f2ba4 to cae823b Compare September 9, 2025 12:34

This comment was marked as outdated.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the containerd unit tests to follow a comprehensive Setup->Cleanup lifecycle testing pattern. It replaces granular unit tests with integration-style tests that verify the complete workflow of configuring and then cleaning up NVIDIA container runtime configurations.

  • Replaces existing granular unit tests with comprehensive Setup->Cleanup lifecycle tests
  • Adds a new CRI-O test file with complete test coverage for both config and hook modes
  • Updates copyright year in config_v2_test.go from 2021 to 2025

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
cmd/nvidia-ctk-installer/container/runtime/crio/config_test.go New comprehensive test file covering CRI-O Setup->Cleanup lifecycle for both config and hook modes
cmd/nvidia-ctk-installer/container/runtime/containerd/containerd.go Fixed typo: defaultRuntmeType to defaultRuntimeType
cmd/nvidia-ctk-installer/container/runtime/containerd/config_v2_test.go Refactored from granular unit tests to comprehensive Setup->Cleanup lifecycle tests; updated copyright year
cmd/nvidia-ctk-installer/container/runtime/containerd/config_v1_test.go Added missing constants that were removed from config_v2_test.go
Comments suppressed due to low confidence (2)

cmd/nvidia-ctk-installer/container/runtime/containerd/containerd.go:1

  • Line 38 contains a typo defaultRuntmeType which should be defaultRuntimeType. This appears to be dead code since line 39 declares the correct constant name.
/**

cmd/nvidia-ctk-installer/container/runtime/containerd/containerd.go:1

  • Line 65 references the misspelled constant defaultRuntmeType which should be defaultRuntimeType. This creates inconsistency and potential runtime issues.
/**

configPath := filepath.Join(testRoot, "etc/containerd/config.toml")
require.NoError(t, os.MkdirAll(filepath.Dir(configPath), 0755))

cfg, err := containerd.New(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Would this be cleaner to read if we were using a raw string?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to raw string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As was the case for checking the OUTPUT, I think there was a misunderstanding here. I mean that we define the CONTENT of the input file explicitly and no as any specific type. For example:

diff --git a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
index 5847d86f..daa5d96f 100644
--- a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
+++ b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
@@ -92,29 +92,24 @@ func TestContainerdConfigLifecycle(t *testing.T) {
 				configPath := filepath.Join(testRoot, "etc/containerd/config.toml")
 				require.NoError(t, os.MkdirAll(filepath.Dir(configPath), 0755))
 
-				cfg, err := containerd.New(
-					containerd.WithConfigSource(toml.FromMap(map[string]interface{}{
-						"version": int64(2),
-						"plugins": map[string]interface{}{
-							"io.containerd.grpc.v1.cri": map[string]interface{}{
-								"containerd": map[string]interface{}{
-									"default_runtime_name": "runc",
-									"runtimes": map[string]interface{}{
-										"runc": map[string]interface{}{
-											"runtime_type": "io.containerd.runc.v2",
-											"options": map[string]interface{}{
-												"BinaryName": "/usr/bin/runc",
-											},
-										},
-									},
-								},
-							},
-						},
-					})),
-				)
-				require.NoError(t, err)
-				_, err = cfg.Save(configPath)
-				require.NoError(t, err)
+				contents := `version = 2
+
+				[plugins]
+
+				  [plugins."io.containerd.grpc.v1.cri"]
+
+					[plugins."io.containerd.grpc.v1.cri".containerd]
+					   default_runtime_name = "runc"
+					  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
+
+						[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
+						  runtime_type = "io.containerd.runc.v2"
+
+						  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
+							BinaryName = "/usr/bin/runc"
+				`
+
+				require.NoError(t, os.WriteFile(configPath, []byte(contents), 0755))
 				return nil
 			},
 			assertSetupPostConditions: func(t *testing.T, testRoot string) error {

runtimeName: "NAME",
expectedDefaultRuntimeName: "NAME",
},
// TestSetupCleanup tests the complete Setup->Cleanup lifecycle following Evan Lezar's pattern
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rather give the function a more expressive name than relying on a comment in this case. (This should only be required if we feel TestSetupCleanup is not sufficient).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test func renamed

Comment on lines 68 to 70
// Verify that nvidia runtimes were removed
if _, err := os.Stat(filepath.Join(testRoot, "etc/containerd/config.toml")); err == nil {
verifyRuntimesAbsent(t, filepath.Join(testRoot, "etc/containerd/config.toml"),
"nvidia", "nvidia-cdi", "nvidia-legacy")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not be checking for AN EMPTY CONFIG here? (actually, I believe the expected behaviour is that the config file is deleted).

Copy link
Collaborator Author

@ArangoGutierrez ArangoGutierrez Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now use NoFileExist check

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is strange. We should be deleting the file if -- after removing nvidia runtimes -- there is only a version entry in the config. (We remove the version entry here and then remove the output file when writing the raw toml).

This is even more reason to CHECK RAW STRINGS so that we know the complete state of the config file we generate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I have made the following changes to the test:

diff --git a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
index 5847d86f..03bf60d9 100644
--- a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
+++ b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
@@ -68,11 +68,7 @@ func TestContainerdConfigLifecycle(t *testing.T) {
 			assertCleanupPostConditions: func(t *testing.T, testRoot string) error {
 				configPath := filepath.Join(testRoot, "etc/containerd/config.toml")
 
-				if _, err := os.Stat(configPath); err == nil {
-					// TODO(@ArangoGutierrez): When moving to DropIn files, we should verify that the config file is deleted after cleanup.
-					// For now, we just verify that the runtimes are absent.
-					verifyRuntimesAbsent(t, configPath, "nvidia", "nvidia-cdi", "nvidia-legacy")
-				}
+				require.NoFileExists(t, configPath)
 				return nil
 			},
 		},

(Note that I check file not exists)

and the test still passes. Could you please elaborate on what you based the statement:

EDIT: the behaviour WILL be to delete the file, but currently it is not , see https://github.com/NVIDIA/nvidia-container-toolkit/blob/main/pkg/config/engine/containerd/config.go#L127 , the logic to delete the file is introduced on the DropIns PR at https://github.com/NVIDIA/nvidia-container-toolkit/pull/1272/files#diff-59fededabdcd1a2abf2309e2d74e17b98c379747eb6fa2bdff7a24edffab8c14R129

on?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you run those changes on a Mac or a Linux machine?
I had it like you showed before, and it passed on my Mac, but then it failed on the GitHub Runner (Linux)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be that on the github runner we're generating the config from containerd config dump. This should be avoided unless we explicitly "mock" this command.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, so given this, do you think we should have that as a follow-up, and on this PR assume the file is not empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. We should check the file contents now. Note that if we make the following change:

diff --git a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
index 5847d86f..62dc4bb4 100644
--- a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
+++ b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
@@ -52,11 +52,12 @@ func TestContainerdConfigLifecycle(t *testing.T) {
 			description:   "v2: top-level config does not exist",
 			configVersion: 2,
 			containerOptions: container.Options{
-				Config:       "{{ .testRoot }}/etc/containerd/config.toml",
-				RuntimeName:  "nvidia",
-				RuntimeDir:   "/usr/bin",
-				SetAsDefault: false,
-				RestartMode:  "none",
+				Config:         "{{ .testRoot }}/etc/containerd/config.toml",
+				RuntimeName:    "nvidia",
+				RuntimeDir:     "/usr/bin",
+				SetAsDefault:   false,
+				RestartMode:    "none",
+				ExecutablePath: "not-containerd",
 			},
 			options: Options{},
 			assertSetupPostConditions: func(t *testing.T, testRoot string) error {

then we would force the command-line source to fail since not-containerd will not exist.

The alternative is to get #1251 in for more explicit control.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I have added ExecutablePath and it works now

containerd.WithConfigSource(toml.FromFile(configPath)),
)
require.NoError(t, err)
c := cfg.(*containerd.Config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THis leaks implementation details. As mentioned in our discussion, these tests were never properly updated when we migrated the logic to a central package for use from the nvidia-ctk runtime configure command. We should check the content of the files that we're interested in.

t.Run(tc.description, func(t *testing.T) {
// Update any paths as required
testRoot := t.TempDir()
tc.containerOptions.Config = strings.ReplaceAll(tc.containerOptions.Config, "{{ .testRoot }}", testRoot)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope for this PR, but we could consider JSON marshalling, replacing the string, and then unmarshalling again. For a single field this is fine though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO @ArangoGutierrez, let's add JSON marshaling

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to only focus on the v2 tests as part of this PR, but we could consider also also including v1 tests there. (If we move to testing the Setup->Cleanup cycle, then we don't need to distinguish between v1 and v2 tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged v1 and v2 now that we test focusing on the lifecycle

Comment on lines 297 to 300
// Check non-runtime settings
require.Equal(t, "/usr/libexec/crio/conmon", c.GetPath([]string{"crio", "runtime", "conmon"}))
require.Equal(t, true, c.GetPath([]string{"crio", "runtime", "selinux"}))
require.Equal(t, "/etc/cni/net.d/", c.GetPath([]string{"crio", "network", "network_dir"}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just compare raw strings.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to raw string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you perhaps forget to push this change? Where did you change to checking raw strings? To be clear, what I mean by "raw strings" is that the test will contain a raw (i.e. backtick) string containig the expected contents of the config file in the various steps. We already do this for the CDI specs we generated, for example.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test now with raw strings

}

// TestSetupCleanupHookMode tests the Setup->Cleanup lifecycle for hook mode
func TestSetupCleanupHookMode(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason that this needs to be a separate function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merged into the main test func

@ArangoGutierrez ArangoGutierrez force-pushed the update_ctk_unit_test branch 3 times, most recently from c05a1f9 to b298d61 Compare September 16, 2025 11:53
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = "/usr/bin/runc"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, we're actually interested in the runtime options from the default_runtime_name, runc, and crun.

[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
# Note: No default_runtime_name specified
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Note: No default_runtime_name specified

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can remove this comment in the file since we already have a code comment above.

Comment on lines 904 to 905
// Verify that default_runtime_name WAS added
require.Contains(t, string(actual), `default_runtime_name = "nvidia"`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already doing a FULL STRING MATCH on the config. Why is this required?

Suggested change
// Verify that default_runtime_name WAS added
require.Contains(t, string(actual), `default_runtime_name = "nvidia"`)

Comment on lines 933 to 935
// Verify that default_runtime_name was removed to restore original state
require.NotContains(t, string(actual), "default_runtime_name")
require.NotContains(t, string(actual), "nvidia")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already doing a full string match. Why is this required?

Suggested change
// Verify that default_runtime_name was removed to restore original state
require.NotContains(t, string(actual), "default_runtime_name")
require.NotContains(t, string(actual), "nvidia")
// Verify that default_runtime_name was removed to restore original state
require.NotContains(t, string(actual), "default_runtime_name")
require.NotContains(t, string(actual), "nvidia")


testCases := []struct {
description string
configVersion int // 0 for auto-detect, 1 for v1, 2 for v2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about v3?

testCases := []struct {
description string
configVersion int // 0 for auto-detect, 1 for v1, 2 for v2
useLegacyConfig bool // for v1 legacy behavior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an additional setting in this struct if we're just setting options.useLegacyConfig with this? We only need this for the deprecated v1 config structure, so having it a top-level test var seems out of place.

Comment on lines 959 to 1468
if tc.options.runtimeType == "" {
tc.options.runtimeType = "io.containerd.runc.v2"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do we have any tests where we actually set this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also as a nit: move the setup to before we actually start running things.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

actual, err := os.ReadFile(configPath)
require.NoError(t, err)

expected := `
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really insert a newline at the start of the file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if not the test fails with Error: Not equal:

Comment on lines 481 to 485
// Parse both JSONs to compare them semantically (ignoring formatting differences)
var expectedJSON, actualJSON interface{}
require.NoError(t, json.Unmarshal([]byte(expected), &expectedJSON))
require.NoError(t, json.Unmarshal(actual, &actualJSON))
require.Equal(t, expectedJSON, actualJSON)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is NOT required. We're already checking for exactly matching contents. Is there a reason to expect that these would unmarshal differently?

Comment on lines 585 to 592
} else if tc.options.configMode == "hook" {
// Ensure hooks directory exists for hook mode
hooksDir := tc.options.hooksDir
require.NoError(t, os.MkdirAll(hooksDir, 0755))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not just implemented in prepareEnvironment? Also, we explicitly create the hook dir as part of the installation, so this should not be reqiured in any case.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

cmd/nvidia-ctk-installer/container/runtime/containerd/containerd.go:1

  • Variable name contains a typo: 'defaultRuntmeType' should be 'defaultRuntimeType'. This appears to be the old misspelled constant being replaced.
/**

@elezar elezar merged commit 470b841 into NVIDIA:main Sep 17, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants