Skip to content

Conversation

@tariq1890
Copy link
Contributor

@tariq1890 tariq1890 commented Oct 23, 2024

When running the latest toolkit on an OpenShift 4.14 cluster, we noticed that the crio status config command was failing with the following error:

time="2024-10-23T00:49:22Z" level=error msg="error running nvidia-toolkit: unable to determine runtime options: unable to load crio config: failed to run command chroot [/host crio status config]: exit status 1"

On further investigation, we found that in CRIO versions/K8s Cluster older v1.28, the crio-status command was used to retrieve crio container runtime config as opposed to the status subcommand of the crio binary.

Since we need to support clusters as old as OCP 4.12 i.e cri-o v1.27, I thought it would be a good idea to add support fallback CLI commands to the tomlCLISource. This way, if crio status config fails, we can fallback to crio-status config

Also verified that the crio-status binary in versions as old as v1.16.0.

Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing something like we do for "discovery" or labels in GFD would be better.

We would implement composite type that itself implements the Loader interface.

type firstOf []Loader

func (loaders firstOf) Load() (*Tree, error) {
	var errs error
	for _, loader := range loaders {
		tree, err := loader.Load()
		if err != nil {
			errs = errors.Join(errs, err)
			continue
		}
		return tree, nil
	}
	return nil, errs
}

This would allow us to use the existing sources directly.

Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

diff --git a/pkg/config/engine/containerd/containerd.go b/pkg/config/engine/containerd/containerd.go
index 43ff44a7..a5b08810 100644
--- a/pkg/config/engine/containerd/containerd.go
+++ b/pkg/config/engine/containerd/containerd.go
@@ -128,8 +128,7 @@ func (c *Config) GetRuntimeConfig(name string) (engine.RuntimeConfig, error) {
 
 // CommandLineSource returns the CLI-based containerd config loader
 func CommandLineSource(hostRoot string) toml.Loader {
-	commandLine := chrootIfRequired(hostRoot, "containerd", "config", "dump")
-	return toml.FromCommandLine(commandLine[0], commandLine[1:]...)
+	return toml.FromCommandLine(chrootIfRequired(hostRoot, "containerd", "config", "dump")...)
 }
 
 func chrootIfRequired(hostRoot string, commandLine ...string) []string {
diff --git a/pkg/config/engine/crio/crio.go b/pkg/config/engine/crio/crio.go
index ce9c37ff..3d5629d7 100644
--- a/pkg/config/engine/crio/crio.go
+++ b/pkg/config/engine/crio/crio.go
@@ -155,8 +155,10 @@ func (c *Config) GetRuntimeConfig(name string) (engine.RuntimeConfig, error) {
 
 // CommandLineSource returns the CLI-based crio config loader
 func CommandLineSource(hostRoot string) toml.Loader {
-	commandLine := chrootIfRequired(hostRoot, "crio", "status", "config")
-	return toml.FromCommandLine(commandLine[0], commandLine[1:]...)
+	return toml.LoadFirst(
+		toml.FromCommandLine(chrootIfRequired(hostRoot, "crio", "status", "config")...),
+		toml.FromCommandLine(chrootIfRequired(hostRoot, "crio-status", "config")...),
+	)
 }
 
 func chrootIfRequired(hostRoot string, commandLine ...string) []string {
diff --git a/pkg/config/toml/list.go b/pkg/config/toml/list.go
new file mode 100644
index 00000000..8313cdf0
--- /dev/null
+++ b/pkg/config/toml/list.go
@@ -0,0 +1,38 @@
+/**
+# Copyright 2024 NVIDIA CORPORATION
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+**/
+
+package toml
+
+import "errors"
+
+type firstOf []Loader
+
+func LoadFirst(loaders ...Loader) Loader {
+	return firstOf(loaders)
+}
+
+func (loaders firstOf) Load() (*Tree, error) {
+	var errs error
+	for _, loader := range loaders {
+		tree, err := loader.Load()
+		if err != nil {
+			errs = errors.Join(errs, err)
+			continue
+		}
+		return tree, nil
+	}
+	return nil, errs
+}
diff --git a/pkg/config/toml/source.go b/pkg/config/toml/source.go
index 5cb0b7fe..08907764 100644
--- a/pkg/config/toml/source.go
+++ b/pkg/config/toml/source.go
@@ -36,12 +36,12 @@ func FromFile(path string) Loader {
 
 // FromCommandLine creates a TOML source from the output of a shell command and its corresponding args.
 // If the command is empty, an empty config is returned.
-func FromCommandLine(cmd string, args ...string) Loader {
-	if len(cmd) == 0 {
+func FromCommandLine(cmds ...string) Loader {
+	if len(cmds) == 0 {
 		return Empty
 	}
 	return &tomlCliSource{
-		command: cmd,
-		args:    args,
+		command: cmds[0],
+		args:    cmds[1:],
 	}
 }

@tariq1890 tariq1890 force-pushed the cli-source-fallback branch from 1c9ccb7 to af575c7 Compare October 23, 2024 14:55
@tariq1890
Copy link
Contributor Author

Thanks @elezar ! This is definitely a lot cleaner. I have updated the PR

@tariq1890 tariq1890 requested a review from elezar October 23, 2024 15:56
Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tariq1890.

One question that I would have is whether it makes sense for a user to be able to configure this somehow if the defaults don't work. As it stands, there is no mechanism to fall back to the config-file based approach.

@tariq1890 tariq1890 force-pushed the cli-source-fallback branch from af575c7 to 93b5d92 Compare October 23, 2024 21:25
@tariq1890 tariq1890 force-pushed the cli-source-fallback branch from 93b5d92 to 0f7aba9 Compare October 23, 2024 21:27
@tariq1890
Copy link
Contributor Author

Addressed review comments and added @elezar as Co-author

@tariq1890 tariq1890 merged commit 771ac6b into main Oct 23, 2024
10 checks passed
@tariq1890 tariq1890 deleted the cli-source-fallback branch October 23, 2024 21:31
@tariq1890
Copy link
Contributor Author

As it stands, there is no mechanism to fall back to the config-file based approach.

I think the failure risks have been reduced significantly with introduction of the second command as fallback. In the interest of keeping it simple, I would say that it's better to not fallback to another config source. The more featires and functions we add, the more bug prone the code gets. We could revisit this if we find that failures with the CLI config source are more common than expected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants