@ArangoGutierrez (Collaborator) commented Aug 4, 2025

Fixes #1218

The chmod createContainer hook was added as a workaround for a specific crun issue with device nodes located in subdirectories of /dev. Since this has been addressed in crun, this PR disables the chmod hook by default to reduce the number of supported hooks.

For users that require this hook, it is recommended that CDI be used and that --enable-hook=chmod be passed to the nvidia-ctk cdi generate command.

See:

@ArangoGutierrez ArangoGutierrez added this to the v1.18.0 milestone Aug 4, 2025
@ArangoGutierrez ArangoGutierrez self-assigned this Aug 4, 2025


@coveralls commented Aug 7, 2025

Pull Request Test Coverage Report for Build 17094949023


  • 43 of 57 (75.44%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 35.811%

Changes Missing Coverage                   Covered Lines   Changed/Added Lines   %
cmd/nvidia-ctk/cdi/generate/generate.go    4               6                     66.67%
pkg/nvcdi/options.go                       0               12                    0.0%

Totals Coverage Status
Change from base Build 17078280554: 0.3%
Covered Lines: 4630
Relevant Lines: 12929

💛 - Coveralls

@ArangoGutierrez ArangoGutierrez force-pushed the i-1218 branch 2 times, most recently from f18a08a to b26ad2b August 7, 2025 16:54
&cli.StringSliceFlag{
Name: "enable-hook",
Aliases: []string{"enable-hooks"},
Usage: "Explicitly enable a hook in the generated CDI specification. This can be specified multiple times.",
Member

Suggested change
Usage: "Explicitly enable a hook in the generated CDI specification. This can be specified multiple times.",
Usage: "Explicitly enable a hook in the generated CDI specification. This overrides disabled hooks. This can be specified multiple times.",
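The behaviour described by the suggested usage text (a repeatable flag whose values are collected into a list) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the nvidia-ctk implementation: the real command uses urfave/cli's StringSliceFlag as shown above, whereas `stringSlice` and `parseEnabledHooks` are hypothetical names built on Go's `flag.Value` interface, and "example-hook" is a placeholder hook name.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// stringSlice is a hypothetical flag.Value that accumulates every occurrence
// of a repeatable flag, similar in spirit to urfave/cli's StringSliceFlag.
type stringSlice []string

func (s *stringSlice) String() string { return strings.Join(*s, ",") }

// Set is called once per occurrence of the flag on the command line.
func (s *stringSlice) Set(v string) error {
	*s = append(*s, v)
	return nil
}

// parseEnabledHooks parses args and returns every value passed via
// --enable-hook or its alias --enable-hooks, in command-line order.
func parseEnabledHooks(args []string) ([]string, error) {
	fs := flag.NewFlagSet("generate", flag.ContinueOnError)
	var enabled stringSlice
	usage := "Explicitly enable a hook in the generated CDI specification. " +
		"This overrides disabled hooks. This can be specified multiple times."
	// Registering the same Value under two names models the alias.
	fs.Var(&enabled, "enable-hook", usage)
	fs.Var(&enabled, "enable-hooks", usage)
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return enabled, nil
}

func main() {
	hooks, err := parseEnabledHooks([]string{"--enable-hook=chmod", "--enable-hooks", "example-hook"})
	if err != nil {
		panic(err)
	}
	fmt.Println(hooks) // [chmod example-hook]
}
```

Whether enabled values then override disabled ones is decided later, when the collected list is resolved against the disabled set; the flag itself only gathers values.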

Collaborator Author

added

Comment on lines 304 to 315
if m.config != nil {
	if len(opts.disabledHooks) == 0 {
		if disabledHooks := m.config.Get("nvidia-container-runtime-hook.disabled"); disabledHooks != nil {
			if hooks, ok := disabledHooks.([]interface{}); ok {
				for _, hook := range hooks {
					if h, ok := hook.(string); ok {
						opts.disabledHooks = append(opts.disabledHooks, h)
					}
				}
			}
		}
	}
Member

Why is this required? I thought we added support to read settings from a config file using altsrc.

Collaborator Author

delete

Comment on lines 26 to 33
// DisabledHooks specifies the hooks that are disabled for the NVIDIA
// Container Runtime hook.
DisabledHooks []string `toml:"disabled,omitempty"`
// EnabledHooks specifies the hooks that are enabled for the NVIDIA
// Container Runtime hook.
// Specify a list of explicitly enabled hooks. If a hook is specified as
// both enabled and disabled, it will be enabled.
EnabledHooks []string `toml:"enabled,omitempty"`
Member

I don't think we should add these config options, and IF we decide to add them we should not add them in the nvidia-container-runtime-hook section. This section is specific for the nvidia-container-runtime-hook that is used to call out to the nvidia-container-cli.

Collaborator Author

removed for now, maybe a discuss-todo item

Member

Did you forget to push?

Collaborator Author

yes...

Collaborator Author

Sorry, now pushed

return func(c *cdiHookCreator) {
	for _, hook := range hooks {
		c.disabledHooks[hook] = true
		c.hookStates[hook] = false
Member

This implies ordering between the calls to WithDisabledHooks and WithEnabledHooks. I would rather have the order of precedence well-defined, such that hooks that are specifically ENABLED take precedence over hooks that are specifically DISABLED.

Member

Maybe this can be simplified by keeping the cdiHookCreator struct as is, and introducing an options struct that we can use to set up the correct values of disabledHooks.
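One way to get a well-defined precedence is for options to only record requests, with a single resolution step after all of them have run. The sketch below assumes that shape; the type and option names mirror this thread, but the string value of `ChmodHook` and the `resolveDisabledHooks` helper are illustrative assumptions. Because disables (including the defaults) are applied first and enables last, the result is the same whichever order `WithDisabledHooks` and `WithEnabledHooks` are passed in.

```go
package main

import "fmt"

// HookName mirrors the type used in this thread; the value is illustrative.
type HookName string

const ChmodHook HookName = "chmod"

// defaultDisabledHooks lists hooks that stay off unless explicitly enabled.
var defaultDisabledHooks = []HookName{ChmodHook}

// hookCreatorOptions only collects what the caller asked for; no precedence
// is applied while the options run, so their order does not matter.
type hookCreatorOptions struct {
	disabledHooks []HookName
	enabledHooks  []HookName
}

type Option func(*hookCreatorOptions)

func WithDisabledHooks(hooks ...HookName) Option {
	return func(o *hookCreatorOptions) { o.disabledHooks = append(o.disabledHooks, hooks...) }
}

func WithEnabledHooks(hooks ...HookName) Option {
	return func(o *hookCreatorOptions) { o.enabledHooks = append(o.enabledHooks, hooks...) }
}

// resolveDisabledHooks is a hypothetical helper: it applies every option,
// marks defaults and explicit disables, then lets explicit enables win.
func resolveDisabledHooks(opts ...Option) map[HookName]bool {
	o := &hookCreatorOptions{}
	for _, opt := range opts {
		opt(o)
	}
	disabled := make(map[HookName]bool)
	for _, h := range append(o.disabledHooks, defaultDisabledHooks...) {
		disabled[h] = true
	}
	for _, h := range o.enabledHooks {
		disabled[h] = false
	}
	return disabled
}

func main() {
	a := resolveDisabledHooks(WithDisabledHooks(ChmodHook), WithEnabledHooks(ChmodHook))
	b := resolveDisabledHooks(WithEnabledHooks(ChmodHook), WithDisabledHooks(ChmodHook))
	fmt.Println(a[ChmodHook], b[ChmodHook])        // false false
	fmt.Println(resolveDisabledHooks()[ChmodHook]) // true
}
```

Keeping the defaults in a slice also makes the list easy to extend, and the struct holding the final map can stay unchanged.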

Collaborator Author

you are right, let me think on a proper solution for this

Collaborator Author

Done

@ArangoGutierrez ArangoGutierrez requested a review from elezar August 8, 2025 13:41
@ArangoGutierrez ArangoGutierrez force-pushed the i-1218 branch 2 times, most recently from 815dae7 to 97df210 August 8, 2025 14:23
Comment on lines 26 to 28
// DisabledHooks specifies the hooks that are disabled for the NVIDIA
// Container Runtime hook.
DisabledHooks []string `toml:"disabled,omitempty"`
Member

I thought we said we're going to remove this?

Collaborator Author

Removed


// defaultDisabledHooks defines hooks that are disabled by default.
// These hooks can be explicitly enabled using the WithEnabledHooks option.
var defaultDisabledHooks = map[HookName]bool{
Member

This need not be a map. Also does it make sense to just set this inline when constructing the hook creator?

Collaborator Author

turned into a list.
I do think it is better to have it here, so the list is easier to maintain. Over time, we might want to add other hooks to the list.

Copilot AI left a comment

Pull Request Overview

This PR disables the chmod createContainer hook by default, addressing issue #1218. The chmod hook was originally added as a workaround for a specific crun issue with device nodes in subdirectories of /dev, but this has since been fixed in crun version 1.7.

  • Adds defaultDisabledHooks slice containing the ChmodHook to disable it by default
  • Introduces WithEnabledHooks option to explicitly enable hooks that are disabled by default
  • Updates the hook creation logic to handle both disabled and enabled hooks with proper precedence

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

  • internal/discover/hooks.go: Adds default disabled hooks list and enabled hooks functionality
  • pkg/nvcdi/options.go: Adds WithEnabledHook option function
  • pkg/nvcdi/lib.go: Adds enabledHooks field and passes it to hook creator
  • cmd/nvidia-ctk/cdi/generate/generate.go: Adds --enable-hook CLI flag and option handling
  • cmd/nvidia-ctk/cdi/generate/generate_test.go: Adds test case for enabling chmod hook
  • internal/runtime/runtime_factory.go: Refactors hook creator initialization to use options slice
  • cmd/nvidia-ctk-installer/toolkit/toolkit_test.go: Updates expected test output


// Correct the disabledHooks map to ensure that explicitly enabled hooks
// are not disabled.
for hook := range enabledHooks {
Copilot AI Aug 11, 2025

The loop iterates over enabledHooks variable instead of cdiHookCreator.enabledHooks. This will cause enabled hooks to not be properly processed since the local enabledHooks variable is always empty at this point.

Suggested change
for hook := range enabledHooks {
for hook := range cdiHookCreator.enabledHooks {

Member

Does this mean that we don't have a test for the case where enabled overrides disabled?
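A test for the enabled-overrides-disabled case could look like the following minimal sketch. It assumes the pre-refactor constructor shape visible in the removed lines of the later diff, where a local `enabledHooks` map is created and also stored in the struct; `newCreator` and the "chmod" string are illustrative. Because Go maps are reference values, writes made by `WithEnabledHooks` through `c.enabledHooks` are visible through the local variable whenever both refer to the same map, so a test like this is what tells a real bug apart from a false alarm.

```go
package main

import "fmt"

type HookName string

type cdiHookCreator struct {
	disabledHooks map[HookName]bool
	enabledHooks  map[HookName]bool
}

type Option func(*cdiHookCreator)

// WithEnabledHooks writes into the map held by the struct.
func WithEnabledHooks(hooks ...HookName) Option {
	return func(c *cdiHookCreator) {
		for _, h := range hooks {
			c.enabledHooks[h] = true
		}
	}
}

// newCreator is a hypothetical constructor following the pre-refactor shape:
// the local enabledHooks map is stored in the struct, so both names refer to
// the same underlying map.
func newCreator(opts ...Option) *cdiHookCreator {
	disabledHooks := map[HookName]bool{"chmod": true}
	enabledHooks := make(map[HookName]bool)
	c := &cdiHookCreator{disabledHooks: disabledHooks, enabledHooks: enabledHooks}
	for _, opt := range opts {
		opt(c)
	}
	// Ranging over the local variable sees the options' writes; ranging over
	// c.enabledHooks would behave identically here.
	for h := range enabledHooks {
		c.disabledHooks[h] = false
	}
	return c
}

func main() {
	fmt.Println(newCreator().disabledHooks["chmod"])                          // true
	fmt.Println(newCreator(WithEnabledHooks("chmod")).disabledHooks["chmod"]) // false
}
```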

}

hookCreator := discover.NewHookCreator(discover.WithNVIDIACDIHookPath(cfg.NVIDIACTKConfig.Path))
opts := []discover.Option{
Member

Let's revert this change.

type cdiHookCreator struct {
	nvidiaCDIHookPath string
	disabledHooks     map[HookName]bool
	enabledHooks      map[HookName]bool
Member

Since this is only actually used in the construction of the cdiHookCreator and doesn't affect behaviour, does it make sense to split out separate options instead of updating the struct directly? Something like:

diff --git a/internal/discover/hooks.go b/internal/discover/hooks.go
index acfaef86..ada69fb8 100644
--- a/internal/discover/hooks.go
+++ b/internal/discover/hooks.go
@@ -85,12 +85,17 @@ func (h *Hook) Hooks() ([]Hook, error) {
 	return []Hook{*h}, nil
 }
 
-type Option func(*cdiHookCreator)
+type hookCreatorOptions struct {
+	nvidiaCDIHookPath string
+	disabledHooks     []HookName
+	enabledHooks      []HookName
+}
+
+type Option func(*hookCreatorOptions)
 
 type cdiHookCreator struct {
 	nvidiaCDIHookPath string
 	disabledHooks     map[HookName]bool
-	enabledHooks      map[HookName]bool
 
 	fixedArgs    []string
 	debugLogging bool
@@ -111,60 +116,61 @@ type HookCreator interface {
 
 // WithDisabledHooks explicitly disables the specified hooks.
 // This can be specified multiple times.
-func WithDisabledHooks(hooks ...HookName) Option {
-	return func(c *cdiHookCreator) {
-		for _, hook := range hooks {
-			c.disabledHooks[hook] = true
+func WithDisabledHooks[T string | HookName](hooks ...T) Option {
+	return func(c *hookCreatorOptions) {
+		for _, h := range hooks {
+			c.disabledHooks = append(c.disabledHooks, HookName(h))
 		}
 	}
 }
 
 // WithEnabledHooks explicitly enables the specified hooks.
 // This is useful for enabling hooks that are disabled by default.
-func WithEnabledHooks(hooks ...HookName) Option {
-	return func(c *cdiHookCreator) {
-		for _, hook := range hooks {
-			c.enabledHooks[hook] = true
+func WithEnabledHooks[T string | HookName](hooks ...T) Option {
+	return func(c *hookCreatorOptions) {
+		for _, h := range hooks {
+			c.enabledHooks = append(c.enabledHooks, HookName(h))
 		}
 	}
 }
 
 // WithNVIDIACDIHookPath sets the path to the nvidia-cdi-hook binary.
 func WithNVIDIACDIHookPath(nvidiaCDIHookPath string) Option {
-	return func(c *cdiHookCreator) {
+	return func(c *hookCreatorOptions) {
 		c.nvidiaCDIHookPath = nvidiaCDIHookPath
 	}
 }
 
 func NewHookCreator(opts ...Option) HookCreator {
-	disabledHooks := make(map[HookName]bool)
-	enabledHooks := make(map[HookName]bool)
-	for _, hook := range defaultDisabledHooks {
-		disabledHooks[hook] = true
-	}
-
-	cdiHookCreator := &cdiHookCreator{
+	o := &hookCreatorOptions{
 		nvidiaCDIHookPath: defaultNvidiaCDIHookPath,
-		disabledHooks:     disabledHooks,
-		enabledHooks:      enabledHooks,
 	}
 	for _, opt := range opts {
-		opt(cdiHookCreator)
+		opt(o)
 	}
 
-	// Correct the disabledHooks map to ensure that explicitly enabled hooks
-	// are not disabled.
-	for hook := range enabledHooks {
-		cdiHookCreator.disabledHooks[hook] = false
+	o.disabledHooks = append(o.disabledHooks, defaultDisabledHooks...)
+
+	disabledHooks := make(map[HookName]bool)
+	for _, h := range o.disabledHooks {
+		disabledHooks[h] = true
 	}
 
-	if cdiHookCreator.disabledHooks[AllHooks] {
+	if disabledHooks[AllHooks] && len(o.enabledHooks) == 0 {
 		return &allDisabledHookCreator{}
 	}
 
-	cdiHookCreator.fixedArgs = getFixedArgsForCDIHookCLI(cdiHookCreator.nvidiaCDIHookPath)
+	for _, h := range o.enabledHooks {
+		disabledHooks[h] = false
+	}
+
+	c := &cdiHookCreator{
+		nvidiaCDIHookPath: o.nvidiaCDIHookPath,
+		disabledHooks:     disabledHooks,
+		fixedArgs:         getFixedArgsForCDIHookCLI(o.nvidiaCDIHookPath),
+	}
 
-	return cdiHookCreator
+	return c
 }
 
 // Create creates a new hook with the given name and arguments.
@@ -183,7 +189,11 @@ func (c cdiHookCreator) Create(name HookName, args ...string) *Hook {
 }
 
 func (c cdiHookCreator) isDisabled(name HookName, args ...string) bool {
-	if c.disabledHooks[name] {
+	disabled, ok := c.disabledHooks[name]
+	if ok {
+		return disabled
+	}
+	if c.disabledHooks[AllHooks] {
 		return true
 	}
 

It would also be good to add a unit test to test cdiHookCreator.isDisabled behaviour.
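A minimal sketch of such a test, covering the lookup order in the suggested `isDisabled` (an explicit per-hook entry wins, otherwise `AllHooks` decides). Each case builds a `cdiHookCreator` value directly instead of going through the constructor. The constant values are assumptions, and the real method also inspects `args`, which is omitted here.

```go
package main

import "fmt"

type HookName string

// Illustrative values; the real constants live in internal/discover.
const (
	AllHooks  HookName = "all"
	ChmodHook HookName = "chmod"
)

type cdiHookCreator struct {
	disabledHooks map[HookName]bool
}

// isDisabled follows the lookup order from the suggested diff: an explicit
// entry for the hook (true or false) wins; otherwise AllHooks decides.
func (c cdiHookCreator) isDisabled(name HookName) bool {
	if disabled, ok := c.disabledHooks[name]; ok {
		return disabled
	}
	return c.disabledHooks[AllHooks]
}

func main() {
	cases := []struct {
		name    string
		creator cdiHookCreator
		hook    HookName
		want    bool
	}{
		{"chmod disabled by default", cdiHookCreator{disabledHooks: map[HookName]bool{ChmodHook: true}}, ChmodHook, true},
		{"chmod explicitly enabled", cdiHookCreator{disabledHooks: map[HookName]bool{ChmodHook: false}}, ChmodHook, false},
		{"all hooks disabled", cdiHookCreator{disabledHooks: map[HookName]bool{AllHooks: true}}, "other", true},
		{"all disabled but chmod enabled", cdiHookCreator{disabledHooks: map[HookName]bool{AllHooks: true, ChmodHook: false}}, ChmodHook, false},
	}
	for _, tc := range cases {
		got := tc.creator.isDisabled(tc.hook)
		fmt.Printf("%s: got %v, want %v\n", tc.name, got, tc.want)
	}
}
```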

Collaborator Author

Unit test added

Comment on lines 127 to 138
name: "nvidia-ctk in /usr/sbin (RPM-based systems)",
opts: []Option{
	WithNVIDIACDIHookPath("/usr/sbin/nvidia-ctk"),
},
expectedType: "cdiHookCreator",
expectedPath: "/usr/sbin/nvidia-ctk",
expectedDisabled: map[HookName]bool{
	ChmodHook: true,
},
Member

While I see the point of this test in principle, there is nothing in the constructor that depends on bin vs sbin on various platforms. From the point of view of the constructor, the path is just a string, and only the BASENAME of the executable affects the behaviour.
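To illustrate why only the base name matters, here is a hypothetical stand-in for `getFixedArgsForCDIHookCLI`. It assumes, based on the `fixedArgs: []string{"nvidia-cdi-hook"}` expectation later in this review, that `nvidia-cdi-hook` is invoked directly, and further assumes that `nvidia-ctk` needs a `hook` subcommand. Under those assumptions, `/usr/bin` and `/usr/sbin` produce identical arguments, so a separate sbin test case adds no coverage.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// fixedArgsFor is a hypothetical stand-in for getFixedArgsForCDIHookCLI.
// Only the executable's base name is inspected; the directory is ignored.
func fixedArgsFor(path string) []string {
	base := filepath.Base(path)
	if base == "nvidia-ctk" {
		// Assumption: nvidia-ctk exposes the hooks under a "hook" subcommand.
		return []string{base, "hook"}
	}
	return []string{base}
}

func main() {
	for _, p := range []string{"/usr/bin/nvidia-ctk", "/usr/sbin/nvidia-ctk", "/usr/local/bin/nvidia-cdi-hook"} {
		fmt.Println(p, fixedArgsFor(p))
	}
}
```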

Comment on lines 173 to 175
for hook, disabled := range tc.expectedDisabled {
	require.Equal(t, disabled, creator.disabledHooks[hook], "Hook %s disabled state mismatch", hook)
}
Member

Can one use require.ElementsMatch or something similar here?

Comment on lines 26 to 30
testCases := []struct {
	name             string
	opts             []Option
	expectedType     string // "cdiHookCreator" or "allDisabledHookCreator"
	expectedPath     string
	expectedDisabled map[HookName]bool
}{
Member

I have pushed a commit with a proposal for testing the constructor. We can define testCases as:

testCases := []struct {
	name     string
	opts     []Option
	expected HookCreator
}{

and clean up the implementation significantly.

Comment on lines 504 to 487
o := &hookCreatorOptions{
	nvidiaCDIHookPath: defaultNvidiaCDIHookPath,
}
opts := []Option{
	WithDisabledHooks(tt.disabledHooks...),
	WithEnabledHooks(tt.enabledHooks...),
}
for _, opt := range opts {
	opt(o)
}
o.disabledHooks = append(o.disabledHooks, defaultDisabledHooks...)

disabledHooks := make(map[HookName]bool)
for _, h := range o.disabledHooks {
	disabledHooks[h] = true
}

for _, h := range o.enabledHooks {
	disabledHooks[h] = false
}

testCreator := &cdiHookCreator{
	nvidiaCDIHookPath: o.nvidiaCDIHookPath,
	disabledHooks:     disabledHooks,
	fixedArgs:         getFixedArgsForCDIHookCLI(o.nvidiaCDIHookPath),
}

require.Equal(t, testCreator.isDisabled(tt.hookName, tt.args...), tt.expectedResult)
Member

Why are we reimplementing the constructor here? What exactly are we testing?

Collaborator Author

The rationale behind the way this test is built is that we want to test the CDIHookCreator.isDisabled directly without testing the NewHookCreator func.

I'll simplify this. I realized that we can trim down to focus on creating the cdiHookCreator

Member

As a note: if we wanted to test isDisabled directly, we could just have the testCases struct define a cdiHookCreator member directly. We can set that the way we want it WITHOUT having to reimplement the constructor.


},
},
{
name: "multiple hooks disabled and enabled",
Member

nit: This doesn't test multiple enabled hooks.

Collaborator Author

ack

},
},
{
name: "WithDisabledHooks can be called multiple times",
Member

Do we need a similar test for WithEnabledHooks?

Collaborator Author

Not really, as we are calling the NewHookCreator during the create test

{
name: "nvidia-ctk in custom NVIDIA toolkit path",
opts: []Option{
WithNVIDIACDIHookPath("/usr/local/nvidia/toolkit/nvidia-ctk"),
Member

Nit: Let's use the same custom path as for the nvidia-cdi-hook tests.

Collaborator Author

ack

Comment on lines 151 to 163
{
	name: "nvidia-cdi-hook in /usr/local/bin",
	opts: []Option{
		WithNVIDIACDIHookPath("/usr/local/bin/nvidia-cdi-hook"),
	},
	expected: &cdiHookCreator{
		nvidiaCDIHookPath: "/usr/local/bin/nvidia-cdi-hook",
		fixedArgs:         []string{"nvidia-cdi-hook"},
		disabledHooks: map[HookName]bool{
			ChmodHook: true,
		},
	},
},
Member

This test doesn't cover anything that isn't already covered by the custom path test that was added.

Collaborator Author

ack

Comment on lines 473 to 487
disabledHooks := make(map[HookName]bool)
for _, h := range tt.disabledHooks {
	disabledHooks[h] = true
}
for _, h := range tt.enabledHooks {
	disabledHooks[h] = false
}

testCreator := &cdiHookCreator{
	nvidiaCDIHookPath: defaultNvidiaCDIHookPath,
	disabledHooks:     disabledHooks,
	fixedArgs:         getFixedArgsForCDIHookCLI(defaultNvidiaCDIHookPath),
}

require.Equal(t, testCreator.isDisabled(tt.hookName, tt.args...), tt.expectedResult)
Member

I already proposed this change once:

Suggested change
disabledHooks := make(map[HookName]bool)
for _, h := range tt.disabledHooks {
	disabledHooks[h] = true
}
for _, h := range tt.enabledHooks {
	disabledHooks[h] = false
}
testCreator := &cdiHookCreator{
	nvidiaCDIHookPath: defaultNvidiaCDIHookPath,
	disabledHooks:     disabledHooks,
	fixedArgs:         getFixedArgsForCDIHookCLI(defaultNvidiaCDIHookPath),
}
require.Equal(t, testCreator.isDisabled(tt.hookName, tt.args...), tt.expectedResult)
testCreator := NewHookCreator(
	WithDisabledHooks(tt.disabledHooks...),
	WithEnabledHooks(tt.enabledHooks...),
)
require.Equal(t, testCreator.(*cdiHookCreator).isDisabled(tt.hookName, tt.args...), tt.expectedResult)
})

If you left it this way to test the "AllDisabled" cases, then let's rather add those to the Create tests above:

+               {
+                       name:         "allDisabled with unknown hook",
+                       hookCreator:  &allDisabledHookCreator{},
+                       hookName:     HookName("unknown-hook"),
+                       args:         []string{},
+                       expectedHook: nil,
+               },
+               {
+                       name:         "allDisabled with defined",
+                       hookCreator:  &allDisabledHookCreator{},
+                       hookName:     UpdateLDCacheHook,
+                       args:         []string{},
+                       expectedHook: nil,
+               },

Collaborator Author

I acknowledge you proposed the change, but I ended up deciding on the current implementation because, in the AllDisabled cases, the constructor returns an allDisabledHookCreator, which doesn't have an isDisabled method.

This is the main reason I decided on the current implementation.

With that said, let me add a change that might work.

Collaborator Author

Ok, I adopted your suggestion and added the AllHooks test to the Create section
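The trade-off in this thread can be shown in a small sketch: exercising the `HookCreator` interface through `Create` works for both implementations, while a bare type assertion to `*cdiHookCreator` would panic for `allDisabledHookCreator`, which has no `isDisabled` method. The types are simplified stand-ins for the ones in internal/discover, and the hook name strings (including the value given to `UpdateLDCacheHook`) are placeholders.

```go
package main

import "fmt"

type HookName string

// Placeholder value; the real constant lives in internal/discover.
const UpdateLDCacheHook HookName = "update-ldcache"

type Hook struct{ Name HookName }

type HookCreator interface {
	Create(name HookName, args ...string) *Hook
}

type cdiHookCreator struct{ disabledHooks map[HookName]bool }

func (c cdiHookCreator) isDisabled(name HookName) bool { return c.disabledHooks[name] }

func (c cdiHookCreator) Create(name HookName, args ...string) *Hook {
	if c.isDisabled(name) {
		return nil
	}
	return &Hook{Name: name}
}

// allDisabledHookCreator has no isDisabled method; it never creates hooks.
type allDisabledHookCreator struct{}

func (*allDisabledHookCreator) Create(HookName, ...string) *Hook { return nil }

func main() {
	cases := []struct {
		name        string
		hookCreator HookCreator
		hookName    HookName
		wantNil     bool
	}{
		{"allDisabled with unknown hook", &allDisabledHookCreator{}, "unknown-hook", true},
		{"allDisabled with defined hook", &allDisabledHookCreator{}, UpdateLDCacheHook, true},
		{"chmod disabled", &cdiHookCreator{disabledHooks: map[HookName]bool{"chmod": true}}, "chmod", true},
		{"chmod enabled", &cdiHookCreator{disabledHooks: map[HookName]bool{}}, "chmod", false},
	}
	for _, tc := range cases {
		// Going through the interface works for both implementations.
		got := tc.hookCreator.Create(tc.hookName)
		fmt.Printf("%s: nil=%v (want %v)\n", tc.name, got == nil, tc.wantNil)

		// A bare tc.hookCreator.(*cdiHookCreator) would panic for the
		// allDisabled cases; the comma-ok form makes the check safe.
		if c, ok := tc.hookCreator.(*cdiHookCreator); ok {
			fmt.Printf("  isDisabled=%v\n", c.isDisabled(tc.hookName))
		}
	}
}
```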


Copilot AI left a comment

Pull Request Overview

This PR disables the chmod container hook by default, addressing a workaround that is no longer needed since crun v1.7 fixed the underlying issue. The chmod hook was originally added to handle device node permissions in subdirectories of /dev but is now redundant.

Key changes:

  • Modified hook creator to disable chmod hook by default
  • Added explicit hook enabling/disabling functionality
  • Updated CDI generation to support explicit hook enabling

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

  • pkg/nvcdi/options.go: Adds WithEnabledHooks option and deprecates singular WithDisabledHook
  • pkg/nvcdi/lib.go: Adds enabledHooks field and passes it to hook creator
  • internal/runtime/runtime_factory.go: Minor formatting change (blank line addition)
  • internal/discover/hooks_test.go: Comprehensive test suite for hook creation and disabling functionality
  • internal/discover/hooks.go: Core implementation to disable chmod hook by default and support enabled hooks
  • cmd/nvidia-ctk/cdi/generate/generate_test.go: Test case validating chmod hook can be explicitly enabled
  • cmd/nvidia-ctk/cdi/generate/generate.go: CLI flag for enabling hooks and validation logic
  • cmd/nvidia-ctk-installer/toolkit/toolkit_test.go: Updates expected output to include additional device node

@ArangoGutierrez (Collaborator Author)

@elezar commits squashed

This change disables the chmod hook by default. This hook was
added as a workaround for a specific crun bug and should no
longer be necessary in most cases.

In addition, the ability to explicitly enable disabled hooks
was added to both the nvcdi API and nvidia-ctk cdi generate
command so that users can opt in to the chmod hook using CDI
if required.

Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar merged commit c350d13 into NVIDIA:main Aug 20, 2025
13 checks passed
Successfully merging this pull request may close these issues: Disable chmod hook by default

3 participants