Skip to content

Conversation

@elezar
Copy link
Member

@elezar elezar commented Aug 1, 2025

This change adds functions to the root.Driver type for returning (or determining) the driver version as well as the libcuda.so.VERSION path. This allows us to cleanup the logic around extracting the version which relies on:

  1. libnvsandboxutils
  2. libnvidia-ml
  3. The suffix of the libcuda.so.VERSION file

This should also simplify the construction of the discoverer for explicit libraries as in #1216

@ArangoGutierrez ArangoGutierrez self-requested a review August 1, 2025 15:40
@ArangoGutierrez ArangoGutierrez requested a review from Copilot August 1, 2025 15:40
@elezar elezar force-pushed the add-version-to-driver-struct branch from 3e50402 to 337987d Compare August 1, 2025 15:41
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR consolidates driver version determination logic by moving version detection functionality into the root.Driver type. The change eliminates duplicated code across the codebase and centralizes version detection through libnvsandboxutils, libnvidia-ml, and libcuda.so.VERSION file suffix methods.

  • Adds version detection methods to root.Driver with caching
  • Creates a versioner interface with fallback chaining
  • Removes redundant version detection code from management and driver discovery components

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pkg/nvcdi/management.go Removes getCudaVersion method and simplifies driver discoverer creation
pkg/nvcdi/lib.go Refactors initialization to use new versioner pattern with fallback chain
pkg/nvcdi/driver-nvml.go Simplifies driver discovery by delegating version detection to root.Driver
internal/modifier/gated.go Updates to use driver.Version() method
internal/lookup/root/version.go New file implementing versioner interface with fallback support
internal/lookup/root/root.go Adds version caching and libcuda path detection methods
internal/lookup/root/options.go Adds versioner option support
internal/lookup/root/cuda_test.go Updates test to use new API
internal/lookup/cuda/cuda.go Removes obsolete CUDA locator implementation
internal/discover/graphics.go Updates to use driver version methods
internal/discover/compat_libs.go Simplifies to accept version string directly

}

var nvmlOpts []nvml.LibraryOption
candidates, err := l.getDriver().Libraries().Locate("libnvidia-ml.so.1")
Copy link

Copilot AI Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a potential race condition here. The method calls l.getDriver() which creates a new driver instance, but l.driver might be accessed concurrently during initialization. Consider ensuring thread-safe access to the driver instance.

Copilot uses AI. Check for mistakes.
var nvsandboxutilsOpts []nvsandboxutils.LibraryOption
// Set the library path for libnvidia-sandboxutils
candidates, err := l.driver.Libraries().Locate("libnvidia-sandboxutils.so.1")
candidates, err := l.getDriver().Libraries().Locate("libnvidia-sandboxutils.so.1")
Copy link

Copilot AI Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the previous issue, l.getDriver() is called which might cause race conditions during concurrent initialization. The driver field should be accessed in a thread-safe manner.

Copilot uses AI. Check for mistakes.
This change adds functions to the root.Driver type for
returning (or determining) the driver version as well as the
libcuda.so.VERSION path. This allows us to cleanup the logic
around extracting the version which relies on:

1. libnvsandboxutils
2. libnvidia-ml
3. The suffix of the libcuda.so.VERSION file

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the add-version-to-driver-struct branch from 337987d to d23ebd1 Compare August 4, 2025 14:35
Copy link
Collaborator

@ArangoGutierrez ArangoGutierrez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@elezar elezar merged commit 08b0230 into NVIDIA:main Aug 7, 2025
16 checks passed
@elezar elezar deleted the add-version-to-driver-struct branch August 7, 2025 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants