-
Notifications
You must be signed in to change notification settings - Fork 6
feat: add support for NVLink domain discovery in NetQ provider #180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR adds NVLink domain discovery support to the NetQ provider by introducing a new NMX API integration.
Key Changes
- New
nmx.gofile implements compute node fetching from NMX API endpoint using Basic Auth - Added
getComputeNodes()to retrieve domain-to-host mappings via/nmx/v1/compute-nodes - Updated NetQ topology URL from relative path to absolute path with
api/netq/prefix - Improved test assertion in
netq_test.goby adding nil check before error validation
Issues Found
- Critical: Test file references missing
tests/output/netq/computeNodes.json- tests will fail - Critical: Potential nil pointer dereference in error handling when
httpreq.DoRequestfails (existing pattern in codebase)
Architecture
The new functionality follows established patterns from other providers (infiniband, DRA) for domain mapping, using DomainMap.AddHost() to associate domain UUIDs with compute node names.
Confidence Score: 2/5
- This PR has critical issues that will cause test failures and potential runtime panics
- Score of 2 reflects two critical issues: (1) missing test data file will cause immediate test failure, and (2) nil pointer dereference bug in error handling could cause runtime panics. While the nil pointer issue exists in the existing codebase, it's being propagated in new code
- Pay close attention to
pkg/providers/netq/nmx_test.go(missing test data) andpkg/providers/netq/nmx.go(nil pointer issue in error handling)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| pkg/providers/netq/nmx.go | 3/5 | New file adding NVLink domain discovery via NMX API; missing test data file and potential nil pointer dereference |
| pkg/providers/netq/nmx_test.go | 2/5 | New test file with missing test data file tests/output/netq/computeNodes.json will cause test failures |
Sequence Diagram
sequenceDiagram
participant Client
participant Provider
participant NetQ as NetQ API
participant NMX as NMX API
Note over Client,NMX: NVLink Domain Discovery Flow
Client->>Provider: Request topology with NVLink domains
rect rgb(240, 240, 255)
Note over Provider,NMX: New: NMX Compute Nodes Discovery
Provider->>Provider: getComputeUrl()
Provider->>Provider: Create Basic Auth header
Provider->>NMX: GET /nmx/v1/compute-nodes
NMX-->>Provider: ComputeNode[] (ID, Name, DomainUUID)
Provider->>Provider: parseComputeNodes()
Provider->>Provider: Build DomainMap
end
rect rgb(255, 240, 240)
Note over Provider,NetQ: Existing: Network Topology Discovery
Provider->>NetQ: POST /auth/v1/login
NetQ-->>Provider: Access Token
Provider->>NetQ: GET /auth/v1/select/opid/{opid}
NetQ-->>Provider: Updated Token
Provider->>NetQ: POST /api/netq/telemetry/v1/object/topologygraph/fetch-topology
NetQ-->>Provider: Topology (nodes, links)
Provider->>Provider: parseNetq()
end
Provider-->>Client: Topology with NVLink domains
4 files reviewed, 2 comments
Codecov Report❌ Patch coverage is
❌ Your patch status has failed because the patch coverage (36.58%) is below the target coverage (50.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #180 +/- ##
==========================================
- Coverage 65.26% 64.99% -0.28%
==========================================
Files 77 78 +1
Lines 4221 4257 +36
==========================================
+ Hits 2755 2767 +12
- Misses 1359 1382 +23
- Partials 107 108 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR adds NVLink domain discovery support to the NetQ provider by introducing a new NMX API integration. The implementation fetches compute nodes from the /nmx/v1/compute-nodes endpoint and groups them by DomainUUID to build a topology domain map.
Key Changes:
- Added
nmx.gowithgetComputeNodes()function to query NMX API using Basic Authentication - Updated NetQ API endpoint URL to include
api/netq/prefix in the path - Added comprehensive unit tests for URL generation and JSON parsing
- Included realistic test fixture data from NMX API
Critical Issue Found:
pkg/providers/netq/nmx.go:40-42- nil pointer dereference whenhttpreq.DoRequestfails (see existing comment)
Note: The same nil pointer pattern exists in the pre-existing netq.go file at lines 69, 94, and 121, but these are outside the scope of this PR.
Confidence Score: 2/5
- This PR introduces a critical nil pointer dereference bug that will cause runtime panics
- Score of 2 reflects one critical bug that must be fixed: the nil pointer dereference at
nmx.go:42will cause a panic when HTTP requests fail. The rest of the implementation follows existing patterns, has good test coverage, and the API endpoint correction appears valid. - Pay close attention to
pkg/providers/netq/nmx.go- it contains a critical nil pointer dereference bug that needs immediate fixing
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| pkg/providers/netq/nmx.go | 2/5 | New file implementing NVLink domain discovery via NMX API - critical nil pointer dereference bug at line 42 |
| pkg/providers/netq/netq.go | 4/5 | Updated TopologyURL constant to include api/netq/ prefix - minor API endpoint correction |
| pkg/providers/netq/nmx_test.go | 5/5 | Comprehensive test coverage for URL generation and compute nodes parsing |
Sequence Diagram
sequenceDiagram
participant Client
participant Provider
participant NMX as NMX API
participant Parser
Client->>Provider: getComputeNodes(ctx)
Provider->>Provider: getComputeUrl()
Provider->>Provider: Build URL & Basic Auth headers
Provider-->>Provider: Return URL, headers
Provider->>NMX: GET /nmx/v1/compute-nodes
Note over NMX: Authenticate via Basic Auth
alt Success
NMX-->>Provider: 200 OK + JSON data
Provider->>Parser: parseComputeNodes(data)
Parser->>Parser: Unmarshal JSON to []ComputeNode
Parser->>Parser: Create DomainMap
loop For each ComputeNode
Parser->>Parser: AddHost(DomainUUID, Name, Name)
end
Parser-->>Provider: DomainMap
Provider-->>Client: DomainMap, nil
else Error
NMX-->>Provider: Error response
Provider-->>Client: nil, httperr.Error
end
5 files reviewed, no comments
0a1481a to
99e62d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR adds NVLink domain discovery support to the NetQ provider. The implementation fetches compute node information from a new NMX API endpoint (/nmx/v1/compute-nodes) and extracts domain UUIDs to build a topology block structure alongside the existing network tree topology.
Key changes:
- Added
nmx.gowithgetNvlDomains()function that queries the NMX API using Basic authentication - Refactored
provider.goto fetch both NVLink domains and network tree, then combine them into a unified topology structure - Renamed
generateTopologyConfig()togetNetworkTree()and simplified its return value by removing an unnecessary vertex wrapper layer - Updated topology URL path from
telemetry/v1/...toapi/netq/telemetry/v1/... - Added comprehensive test coverage including URL generation and parsing logic
The implementation follows the same patterns used in other providers (DRA, Infiniband) for domain mapping, properly handles errors, and includes appropriate test fixtures.
Confidence Score: 5/5
- This PR is safe to merge with minimal risk
- The code is well-structured, follows existing patterns in the codebase, includes proper error handling, and has comprehensive test coverage. No critical issues were found.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| pkg/providers/netq/nmx.go | 5/5 | Added new file implementing NVLink domain discovery via NMX API endpoint using Basic auth |
| pkg/providers/netq/provider.go | 5/5 | Integrated NVLink domain discovery into topology generation by fetching domains and combining with network tree |
| pkg/providers/netq/netq.go | 5/5 | Renamed function and simplified return value to remove unnecessary wrapper layer |
Sequence Diagram
sequenceDiagram
participant Client
participant Provider
participant NetQ as NetQ API (getNetworkTree)
participant NMX as NMX API (getNvlDomains)
Client->>Provider: GenerateTopologyConfig(ctx, instances)
Provider->>NMX: getNvlDomains(ctx)
NMX->>NMX: getComputeUrl() - Build URL with Basic Auth
NMX->>NMX: DoRequest GET /nmx/v1/compute-nodes
NMX->>NMX: parseComputeNodes(data)
NMX->>NMX: Create DomainMap from compute nodes
NMX-->>Provider: Return DomainMap (domain UUID -> nodes)
Provider->>NetQ: getNetworkTree(ctx, instances)
NetQ->>NetQ: POST /auth/v1/login - Get access token
NetQ->>NetQ: GET /auth/v1/select/opid - Set OpID
NetQ->>NetQ: POST /api/netq/telemetry/v1/.../fetch-topology
NetQ->>NetQ: parseNetq() - Build network tree
NetQ-->>Provider: Return treeRoot (network topology)
Provider->>Provider: Combine into root Vertex
Provider->>Provider: root.Vertices[TopologyTree] = treeRoot
Provider->>Provider: root.Vertices[TopologyBlock] = domains.ToBlocks()
Provider-->>Client: Return complete topology with NVLink domains
6 files reviewed, no comments
Greptile Summary
Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant Provider
participant NMX as "NMX API"
participant NetQ as "NetQ API"
User->>Provider: "GenerateTopologyConfig()"
Provider->>NMX: "GET /nmx/v1/compute-nodes"
Note over Provider,NMX: "Basic Auth (user:password)"
NMX-->>Provider: "Compute nodes with DomainUUIDs"
Provider->>Provider: "parseComputeNodes()"
Provider->>Provider: "Build DomainMap"
Provider->>NetQ: "POST /auth/v1/login"
NetQ-->>Provider: "Access token"
Provider->>NetQ: "GET /auth/v1/select/opid"
NetQ-->>Provider: "Operation ID confirmed"
Provider->>NetQ: "POST /api/netq/telemetry/v1/object/topologygraph/fetch-topology"
NetQ-->>Provider: "Network topology data"
Provider->>Provider: "parseNetq()"
Provider->>Provider: "Combine tree root + domain blocks"
Provider-->>User: "Complete topology with NVLink domains"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 files reviewed, no comments
Signed-off-by: Dmitry Shmulevich <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 files reviewed, no comments
Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format
No description provided.