-
Notifications
You must be signed in to change notification settings - Fork 435
[no-relnote] Refactor cmd/installer/container/containerd unit test #1293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Test Coverage Report for Build 17802380714Details
💛 - Coveralls |
dad6557 to
b5105a9
Compare
76d8c46 to
19c7b36
Compare
|
|
||
| // TestUpdateV2Config_NoConfigFile tests the scenario when there is no | ||
| // containerd config file present | ||
| func TestUpdateV2Config_NoConfigFile(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed:
- offline in our call yesterday
- Implement Drop-In Lifecycle Support for Containerd and Crio #1272 (comment)
- https://github.com/NVIDIA/nvidia-container-toolkit/pull/1280/files#diff-743eab44a71e6e333fef0be742b20bcdb7188341b5b8e6542bc94ec64704de35R35
I think these tests are testing the wrong thing. Although we may want to consider updating the tests for Update config, what we really want is tests to ensure the correct behaviour of the Setup->Cleanup cycle.
Would adapting the proposed implementation from the last two points above be feasible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, thanks for the clarification. I now understand that I got the wrong idea from our conversation. I'll work on the proposed implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented the tests following the structure of the comment #1272 (comment) , PTAL
d3cc519 to
5ba3f1f
Compare
02f2ba4 to
cae823b
Compare
cae823b to
8068032
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the containerd unit tests to follow a comprehensive Setup->Cleanup lifecycle testing pattern. It replaces granular unit tests with integration-style tests that verify the complete workflow of configuring and then cleaning up NVIDIA container runtime configurations.
- Replaces existing granular unit tests with comprehensive Setup->Cleanup lifecycle tests
- Adds a new CRI-O test file with complete test coverage for both config and hook modes
- Updates copyright year in config_v2_test.go from 2021 to 2025
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| cmd/nvidia-ctk-installer/container/runtime/crio/config_test.go | New comprehensive test file covering CRI-O Setup->Cleanup lifecycle for both config and hook modes |
| cmd/nvidia-ctk-installer/container/runtime/containerd/containerd.go | Fixed typo: defaultRuntmeType to defaultRuntimeType |
| cmd/nvidia-ctk-installer/container/runtime/containerd/config_v2_test.go | Refactored from granular unit tests to comprehensive Setup->Cleanup lifecycle tests; updated copyright year |
| cmd/nvidia-ctk-installer/container/runtime/containerd/config_v1_test.go | Added missing constants that were removed from config_v2_test.go |
Comments suppressed due to low confidence (2)
cmd/nvidia-ctk-installer/container/runtime/containerd/containerd.go:1
- Line 38 contains a typo
defaultRuntmeTypewhich should bedefaultRuntimeType. This appears to be dead code since line 39 declares the correct constant name.
/**
cmd/nvidia-ctk-installer/container/runtime/containerd/containerd.go:1
- Line 65 references the misspelled constant
defaultRuntmeTypewhich should bedefaultRuntimeType. This creates inconsistency and potential runtime issues.
/**
| configPath := filepath.Join(testRoot, "etc/containerd/config.toml") | ||
| require.NoError(t, os.MkdirAll(filepath.Dir(configPath), 0755)) | ||
|
|
||
| cfg, err := containerd.New( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Would this be cleaner to read if we were using a raw string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to raw string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As was the case for checking the OUTPUT, I think there was a misunderstanding here. I mean that we define the CONTENT of the input file explicitly and no as any specific type. For example:
diff --git a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
index 5847d86f..daa5d96f 100644
--- a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
+++ b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
@@ -92,29 +92,24 @@ func TestContainerdConfigLifecycle(t *testing.T) {
configPath := filepath.Join(testRoot, "etc/containerd/config.toml")
require.NoError(t, os.MkdirAll(filepath.Dir(configPath), 0755))
- cfg, err := containerd.New(
- containerd.WithConfigSource(toml.FromMap(map[string]interface{}{
- "version": int64(2),
- "plugins": map[string]interface{}{
- "io.containerd.grpc.v1.cri": map[string]interface{}{
- "containerd": map[string]interface{}{
- "default_runtime_name": "runc",
- "runtimes": map[string]interface{}{
- "runc": map[string]interface{}{
- "runtime_type": "io.containerd.runc.v2",
- "options": map[string]interface{}{
- "BinaryName": "/usr/bin/runc",
- },
- },
- },
- },
- },
- },
- })),
- )
- require.NoError(t, err)
- _, err = cfg.Save(configPath)
- require.NoError(t, err)
+ contents := `version = 2
+
+ [plugins]
+
+ [plugins."io.containerd.grpc.v1.cri"]
+
+ [plugins."io.containerd.grpc.v1.cri".containerd]
+ default_runtime_name = "runc"
+ [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
+
+ [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
+ runtime_type = "io.containerd.runc.v2"
+
+ [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
+ BinaryName = "/usr/bin/runc"
+ `
+
+ require.NoError(t, os.WriteFile(configPath, []byte(contents), 0755))
return nil
},
assertSetupPostConditions: func(t *testing.T, testRoot string) error {
| runtimeName: "NAME", | ||
| expectedDefaultRuntimeName: "NAME", | ||
| }, | ||
| // TestSetupCleanup tests the complete Setup->Cleanup lifecycle following Evan Lezar's pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rather give the function a more expressive name than relying on a comment in this case. (This should only be required if we feel TestSetupCleanup is not sufficient).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test func renamed
| // Verify that nvidia runtimes were removed | ||
| if _, err := os.Stat(filepath.Join(testRoot, "etc/containerd/config.toml")); err == nil { | ||
| verifyRuntimesAbsent(t, filepath.Join(testRoot, "etc/containerd/config.toml"), | ||
| "nvidia", "nvidia-cdi", "nvidia-legacy") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not be checking for AN EMPTY CONFIG here? (actually, I believe the expected behaviour is that the config file is deleted).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now use NoFileExist check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: the behaviour WILL be to delete the file, but currently it is not , see https://github.com/NVIDIA/nvidia-container-toolkit/blob/main/pkg/config/engine/containerd/config.go#L127 , the logic to delete the file is introduced on the DropIns PR at https://github.com/NVIDIA/nvidia-container-toolkit/pull/1272/files#diff-59fededabdcd1a2abf2309e2d74e17b98c379747eb6fa2bdff7a24edffab8c14R129
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is strange. We should be deleting the file if -- after removing nvidia runtimes -- there is only a version entry in the config. (We remove the version entry here and then remove the output file when writing the raw toml).
This is even more reason to CHECK RAW STRINGS so that we know the complete state of the config file we generate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I have made the following changes to the test:
diff --git a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
index 5847d86f..03bf60d9 100644
--- a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
+++ b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
@@ -68,11 +68,7 @@ func TestContainerdConfigLifecycle(t *testing.T) {
assertCleanupPostConditions: func(t *testing.T, testRoot string) error {
configPath := filepath.Join(testRoot, "etc/containerd/config.toml")
- if _, err := os.Stat(configPath); err == nil {
- // TODO(@ArangoGutierrez): When moving to DropIn files, we should verify that the config file is deleted after cleanup.
- // For now, we just verify that the runtimes are absent.
- verifyRuntimesAbsent(t, configPath, "nvidia", "nvidia-cdi", "nvidia-legacy")
- }
+ require.NoFileExists(t, configPath)
return nil
},
},
(Note that I check file not exists)
and the test still passes. Could you please elaborate on what you based the statement:
EDIT: the behaviour WILL be to delete the file, but currently it is not , see https://github.com/NVIDIA/nvidia-container-toolkit/blob/main/pkg/config/engine/containerd/config.go#L127 , the logic to delete the file is introduced on the DropIns PR at https://github.com/NVIDIA/nvidia-container-toolkit/pull/1272/files#diff-59fededabdcd1a2abf2309e2d74e17b98c379747eb6fa2bdff7a24edffab8c14R129
on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run those changes on a Mac or a Linux machine?
I had it like you showed before, and it passed on my Mac, but then it failed on the GitHub Runner (Linux)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be that on the github runner we're generating the config from containerd config dump. This should be avoided unless we explicitly "mock" this command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, so given this, do you think we should have that as a follow-up, and on this PR assume the file is not empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. We should check the file contents now. Note that if we make the following change:
diff --git a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
index 5847d86f..62dc4bb4 100644
--- a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
+++ b/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go
@@ -52,11 +52,12 @@ func TestContainerdConfigLifecycle(t *testing.T) {
description: "v2: top-level config does not exist",
configVersion: 2,
containerOptions: container.Options{
- Config: "{{ .testRoot }}/etc/containerd/config.toml",
- RuntimeName: "nvidia",
- RuntimeDir: "/usr/bin",
- SetAsDefault: false,
- RestartMode: "none",
+ Config: "{{ .testRoot }}/etc/containerd/config.toml",
+ RuntimeName: "nvidia",
+ RuntimeDir: "/usr/bin",
+ SetAsDefault: false,
+ RestartMode: "none",
+ ExecutablePath: "not-containerd",
},
options: Options{},
assertSetupPostConditions: func(t *testing.T, testRoot string) error {
then we would force the command-line source to fail since not-containerd will not exist.
The alternative is to get #1251 in for more explicit control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I have added ExecutablePath and it works now
cmd/nvidia-ctk-installer/container/runtime/containerd/config_v2_test.go
Outdated
Show resolved
Hide resolved
cmd/nvidia-ctk-installer/container/runtime/containerd/config_v2_test.go
Outdated
Show resolved
Hide resolved
| containerd.WithConfigSource(toml.FromFile(configPath)), | ||
| ) | ||
| require.NoError(t, err) | ||
| c := cfg.(*containerd.Config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THis leaks implementation details. As mentioned in our discussion, these tests were never properly updated when we migrated the logic to a central package for use from the nvidia-ctk runtime configure command. We should check the content of the files that we're interested in.
| t.Run(tc.description, func(t *testing.T) { | ||
| // Update any paths as required | ||
| testRoot := t.TempDir() | ||
| tc.containerOptions.Config = strings.ReplaceAll(tc.containerOptions.Config, "{{ .testRoot }}", testRoot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope for this PR, but we could consider JSON marshalling, replacing the string, and then unmarshalling again. For a single field this is fine though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO @ArangoGutierrez, let's add JSON marshaling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to only focus on the v2 tests as part of this PR, but we could consider also also including v1 tests there. (If we move to testing the Setup->Cleanup cycle, then we don't need to distinguish between v1 and v2 tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged v1 and v2 now that we test focusing on the lifecycle
| // Check non-runtime settings | ||
| require.Equal(t, "/usr/libexec/crio/conmon", c.GetPath([]string{"crio", "runtime", "conmon"})) | ||
| require.Equal(t, true, c.GetPath([]string{"crio", "runtime", "selinux"})) | ||
| require.Equal(t, "/etc/cni/net.d/", c.GetPath([]string{"crio", "network", "network_dir"})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just compare raw strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to raw string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you perhaps forget to push this change? Where did you change to checking raw strings? To be clear, what I mean by "raw strings" is that the test will contain a raw (i.e. backtick) string containig the expected contents of the config file in the various steps. We already do this for the CDI specs we generated, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test now with raw strings
| } | ||
|
|
||
| // TestSetupCleanupHookMode tests the Setup->Cleanup lifecycle for hook mode | ||
| func TestSetupCleanupHookMode(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason that this needs to be a separate function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merged into the main test func
cmd/nvidia-ctk-installer/container/runtime/containerd/config_v2_test.go
Outdated
Show resolved
Hide resolved
c05a1f9 to
b298d61
Compare
| runtime_type = "io.containerd.runc.v2" | ||
| [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] | ||
| BinaryName = "/usr/bin/runc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again, we're actually interested in the runtime options from the default_runtime_name, runc, and crun.
| [plugins] | ||
| [plugins."io.containerd.grpc.v1.cri"] | ||
| [plugins."io.containerd.grpc.v1.cri".containerd] | ||
| # Note: No default_runtime_name specified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Note: No default_runtime_name specified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we can remove this comment in the file since we already have a code comment above.
| // Verify that default_runtime_name WAS added | ||
| require.Contains(t, string(actual), `default_runtime_name = "nvidia"`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already doing a FULL STRING MATCH on the config. Why is this required?
| // Verify that default_runtime_name WAS added | |
| require.Contains(t, string(actual), `default_runtime_name = "nvidia"`) |
| // Verify that default_runtime_name was removed to restore original state | ||
| require.NotContains(t, string(actual), "default_runtime_name") | ||
| require.NotContains(t, string(actual), "nvidia") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already doing a full string match. Why is this required?
| // Verify that default_runtime_name was removed to restore original state | |
| require.NotContains(t, string(actual), "default_runtime_name") | |
| require.NotContains(t, string(actual), "nvidia") | |
| // Verify that default_runtime_name was removed to restore original state | |
| require.NotContains(t, string(actual), "default_runtime_name") | |
| require.NotContains(t, string(actual), "nvidia") |
|
|
||
| testCases := []struct { | ||
| description string | ||
| configVersion int // 0 for auto-detect, 1 for v1, 2 for v2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about v3?
| testCases := []struct { | ||
| description string | ||
| configVersion int // 0 for auto-detect, 1 for v1, 2 for v2 | ||
| useLegacyConfig bool // for v1 legacy behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an additional setting in this struct if we're just setting options.useLegacyConfig with this? We only need this for the deprecated v1 config structure, so having it a top-level test var seems out of place.
| if tc.options.runtimeType == "" { | ||
| tc.options.runtimeType = "io.containerd.runc.v2" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Do we have any tests where we actually set this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also as a nit: move the setup to before we actually start running things.
08d3422 to
92370c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
92370c6 to
55cf902
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
| actual, err := os.ReadFile(configPath) | ||
| require.NoError(t, err) | ||
|
|
||
| expected := ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really insert a newline at the start of the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if not the test fails with Error: Not equal:
| // Parse both JSONs to compare them semantically (ignoring formatting differences) | ||
| var expectedJSON, actualJSON interface{} | ||
| require.NoError(t, json.Unmarshal([]byte(expected), &expectedJSON)) | ||
| require.NoError(t, json.Unmarshal(actual, &actualJSON)) | ||
| require.Equal(t, expectedJSON, actualJSON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is NOT required. We're already checking for exactly matching contents. Is there a reason to expect that these would unmarshal differently?
| } else if tc.options.configMode == "hook" { | ||
| // Ensure hooks directory exists for hook mode | ||
| hooksDir := tc.options.hooksDir | ||
| require.NoError(t, os.MkdirAll(hooksDir, 0755)) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not just implemented in prepareEnvironment? Also, we explicitly create the hook dir as part of the installation, so this should not be reqiured in any case.
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
55cf902 to
46c5d59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
cmd/nvidia-ctk-installer/container/runtime/containerd/containerd.go:1
- Variable name contains a typo: 'defaultRuntmeType' should be 'defaultRuntimeType'. This appears to be the old misspelled constant being replaced.
/**
Signed-off-by: Evan Lezar <[email protected]>
No description provided.