Skip to content

Conversation

@jgehrcke
Copy link
Collaborator

@jgehrcke jgehrcke commented Mar 26, 2025

This is a patch for #297 which tries to take precise control of looking for the match in the expected section of the /proc/devices file contents.

I tested this function in isolation on a 'challenging' node, and it returns data as expected:

$ sudo docker run -v $(pwd)/yo.go:/yo.go golang go run /yo.go
234 <nil>

@copy-pr-bot
Copy link

copy-pr-bot bot commented Mar 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jgehrcke jgehrcke force-pushed the jp/major-parser branch 2 times, most recently from 6e663f3 to 129764f Compare March 26, 2025 12:25
@jgehrcke
Copy link
Collaborator Author

jgehrcke commented Mar 26, 2025

We get this code quality warning:

Incorrect conversion of an integer with architecture-dependent bit size from to a lower bit size type uint32 without an upper bound check.

Fair enough. We have type int everywhere (which often is 64-bit), and then in dev := unix.Mkdev(uint32(major), uint32(channel))) we "ignorantly" convert to uint32. That's what the complaint is about, and it makes a bit of sense. I will try to mess with the types to resolve this. The upper bound check for a 32-bit signed integer can be performed with strconv.ParseInt(value, 10, 32)`. The upper bound for a 32-bit UNsigned integer is higher.

(see https://github.com/NVIDIA/k8s-dra-driver-gpu/security/code-scanning/2)

@jgehrcke
Copy link
Collaborator Author

jgehrcke commented Mar 26, 2025

I have btw visualized and tested the regular expression here: https://regex101.com/r/WaHkdK/1

I've also given it a bit of a challenging probe, and with the green highlight we see that it extracts the data in question despite a number of distractions:

image{width=200}

@ArangoGutierrez ArangoGutierrez requested a review from Copilot March 26, 2025 13:37
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves the robustness of parsing the /proc/devices file by replacing line‐by‐line scanning with a regular expression-based search for the device major number in the "Character devices" section.

  • Switch from scanning file lines to reading the full file into memory.
  • Use a regular expression to extract the desired major number.

@jgehrcke
Copy link
Collaborator Author

Tested on Luna, which has limited signal: IMEX daemon not meant to start up, and code path is not visted:

	// Only prepare files to inject to the daemon if IMEX is supported.
	if s.computeDomainManager.cliqueID != "" {
		// Parse the device node info for the fabic-imex-mgmt nvcap.
		nvcapDeviceInfo, err := s.nvdevlib.parseNVCapDeviceInfo(nvidiaCapFabricImexMgmtPath)
		if err != nil {
			return nil, fmt.Errorf("error parsing nvcap device info for fabic-imex-mgmt: %w", err)
		}

		// Create the device node for the fabic-imex-mgmt nvcap.
		if err := s.nvdevlib.createNvCapDevice(nvidiaCapFabricImexMgmtPath); err != nil {
			return nil, fmt.Errorf("error creating nvcap device for fabic-imex-mgmt: %w", err)
		}

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ArangoGutierrez ArangoGutierrez requested a review from Copilot March 26, 2025 13:54
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR strengthens the parsing of the /proc/devices file by switching from a line-by-line scan to a regex-based search, thereby improving the detection of the device major number in the "Character devices" section.

  • Replaces iterative scanning with a regex that targets the specific block between "Character devices:" and "Block devices:"
  • Refines error messages and adds an upper bound validation on the device major number

Comment on lines 229 to 230
// Search for an "<integer> <name>" occurrence in the "Character devices"
// section of the /proc/devices file. Return the integer.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funcs
A function’s doc comment should explain what the function returns or, for functions called for side effects, what it does. Named parameters and results can be referred to directly in the comment, without any special syntax like backquotes. (A consequence of this convention is that names like a, which might be mistaken for ordinary words, are typically avoided.) For example:

package strconv

// Quote returns a double-quoted Go string literal representing s.
// The returned string uses Go escape sequences (\t, \n, \xFF, \u0100)
// for control characters and non-printable characters as defined by IsPrint.
func Quote(s string) string {
...
}
And:

package os

// Exit causes the current program to exit with the given status code.
// Conventionally, code zero indicates success, non-zero an error.
// The program terminates immediately; deferred functions are not run.
//
// For portability, the status code should be in the range [0, 125].
func Exit(code int) {
...
}

see more at https://tip.golang.org/doc/comment

Copy link
Collaborator Author

@jgehrcke jgehrcke Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, what is the specific suggestion here?

A function’s doc comment should explain what the function returns or, for functions called for side effects, what it does.

The comment does that, I think.

What is the change you would like to see?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment must start with the func name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment must start with the func name

Okay. I changed this, squashed, force-pushed.

Note that in terms of communication and 'truth-seeking', I have remarks:

That aspect was not mentioned in the text you quoted above :) It's also not explicitly said in the doc you linked. I get it, the examples support that idea.

https://stackoverflow.com/questions/35591301/function-name-in-comment is about that topic. There, they quote

The first sentence should be a one-sentence summary that starts with the name being declared.

which at some point was part of https://go.dev/doc/effective_go#commentary.

It is not there anymore, today. Probably the removal was a deliberate choice.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves the robustness of parsing /proc/devices when searching for a device’s major number, addressing the issue outlined in #297.

  • Replaces a line-by-line scanner with reading the entire file and applying a regex.
  • Uses a regular expression to more precisely extract the device major number between "Character devices:" and "Block devices:".
Comments suppressed due to low confidence (1)

cmd/compute-domain-kubelet-plugin/nvlib.go:240

  • The regex pattern assumes the presence of a 'Block devices:' section to delimit the match. This could lead to parsing failures on systems where the /proc/devices file does not include that section; consider making this part optional or adding a fallback error message that guides further debugging.
"([0-9]+) " + regexp.QuoteMeta(name) + ".*Block devices:"

@jgehrcke
Copy link
Collaborator Author

Thanks, @ArangoGutierrez.

Let's get this in and get going with testing.

@jgehrcke jgehrcke merged commit 8f472c0 into NVIDIA:main Mar 26, 2025
7 checks passed
@klueska klueska added this to the v25.3.0 milestone Aug 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

3 participants