@ArangoGutierrez
Collaborator

No description provided.

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive E2E testing for containerd runtime configuration alongside existing Docker testing infrastructure. The changes introduce a nested container testing framework that allows running tests inside containers to validate NVIDIA Container Toolkit behavior in containerized environments.

  • Adds new E2E tests for containerd drop-in configuration functionality
  • Introduces nvidia-cdi-refresh systemd unit testing
  • Implements nested container runner infrastructure for isolated testing

Reviewed Changes

Copilot reviewed 9 out of 32 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/go.mod | Adds new dependencies for UUID generation and test utilities |
| tests/e2e/runner.go | Implements nested container runner with Docker installation and CTK setup |
| tests/e2e/nvidia-ctk_containerd_test.go | New comprehensive containerd E2E test suite |
| tests/e2e/nvidia-ctk_docker_test.go | Refactors to use shared runner infrastructure and fixes macOS compatibility |
| tests/e2e/nvidia-cdi-refresh_test.go | New systemd unit tests for CDI refresh functionality |
| tests/e2e/nvidia-container-cli_test.go | Refactors to use nested container runner |
| tests/e2e/installer.go | Adds containerd installation template and additional flags support |
| tests/e2e/e2e_test.go | Centralizes test runner initialization in BeforeSuite |
| tests/e2e/Makefile | Documents new test categories |

@ArangoGutierrez
Collaborator Author

Builds on #1235

Doesn't include #1311; tests for that should be added as a follow-up.

@coveralls

coveralls commented Sep 23, 2025

Pull Request Test Coverage Report for Build 18005738357

Details

  • 0 of 1 (0.0%) changed or added relevant line in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.006%) to 36.277%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| pkg/config/engine/containerd/config_drop_in.go | 0 | 1 | 0.0% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 17981864462 | 0.006% |
| Covered Lines | 4827 |
| Relevant Lines | 13306 |

💛 - Coveralls

@ArangoGutierrez
Collaborator Author

I'll mark this PR as ready for review once #1235 is merged

@ArangoGutierrez
Collaborator Author

> I'll mark this PR as ready for review once #1235 is merged

Rebased

@ArangoGutierrez ArangoGutierrez force-pushed the e2e_containerd branch 3 times, most recently from 029af03 to 1899001 Compare September 25, 2025 11:16
@ArangoGutierrez ArangoGutierrez marked this pull request as ready for review September 25, 2025 11:26
@elezar elezar marked this pull request as draft October 13, 2025 12:10
@ArangoGutierrez ArangoGutierrez force-pushed the e2e_containerd branch 2 times, most recently from 51ad031 to c65e468 Compare October 13, 2025 13:57
@ArangoGutierrez
Collaborator Author

Rebased

@ArangoGutierrez ArangoGutierrez marked this pull request as ready for review October 13, 2025 14:01
Member

@elezar elezar left a comment

I added some in-code fixes / comments / suggestions as ArangoGutierrez#2

I think it's more useful to start simple. Let's not add noise to the tests with cases that are already covered. The intent of these tests is to check that:

The rendered config has the expected contents

In addition we cover the following (strictly speaking already covered by unit tests):

  1. We create the drop-in files in the correct location
  2. We update the imports in the top-level config

As discussed offline, to verify that the rendered config is correct we need to check that:

  1. We make ONLY the expected additions (i.e. add the nvidia runtime)
  2. We make ONLY the expected modifications (i.e. default_runtime_name and cdi_enabled)
  3. We don't remove other config options

I don't think the tests as implemented properly cover this. Here I'm not talking about the configurations that we're running, but rather about what we check after we have applied the config. This is why I was mentioning diffs in our call. If we diff the config dump before and the config dump after, we expect ONLY the additions and modifications mentioned above.
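
As a rough illustration of the diff-based check described above (a sketch only, not the code in this PR): assuming the config dumps taken before and after have already been parsed into nested `map[string]any` values, the comparison could flatten both into dotted keys and fail on anything other than the expected additions and modifications. The names `flatten`, `diffConfigs`, `checkDiff`, and `ConfigDiff` are hypothetical, and the dotted-key flattening is a simplification that does not handle real containerd plugin IDs that themselves contain dots.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// ConfigDiff lists the dotted keys that differ between two config dumps.
type ConfigDiff struct {
	Added, Modified, Removed []string
}

// flatten copies the leaf values of a nested config map into out,
// keyed by the dot-joined path (hypothetical simplification).
func flatten(prefix string, in, out map[string]any) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if nested, ok := v.(map[string]any); ok {
			flatten(key, nested, out)
			continue
		}
		out[key] = v
	}
}

// diffConfigs reports which leaf keys were added, modified, or removed.
func diffConfigs(before, after map[string]any) ConfigDiff {
	b, a := map[string]any{}, map[string]any{}
	flatten("", before, b)
	flatten("", after, a)

	var d ConfigDiff
	for k, av := range a {
		bv, ok := b[k]
		switch {
		case !ok:
			d.Added = append(d.Added, k)
		case !reflect.DeepEqual(bv, av):
			d.Modified = append(d.Modified, k)
		}
	}
	for k := range b {
		if _, ok := a[k]; !ok {
			d.Removed = append(d.Removed, k)
		}
	}
	sort.Strings(d.Added)
	sort.Strings(d.Modified)
	sort.Strings(d.Removed)
	return d
}

// sameKeys compares two key lists while ignoring order.
func sameKeys(got, want []string) bool {
	if len(got) != len(want) {
		return false
	}
	g := append([]string(nil), got...)
	w := append([]string(nil), want...)
	sort.Strings(g)
	sort.Strings(w)
	for i := range g {
		if g[i] != w[i] {
			return false
		}
	}
	return true
}

// checkDiff returns an error unless the diff holds exactly the expected
// additions and modifications, and no removals at all.
func checkDiff(d ConfigDiff, wantAdded, wantModified []string) error {
	if len(d.Removed) > 0 {
		return fmt.Errorf("unexpected removals: %v", d.Removed)
	}
	if !sameKeys(d.Added, wantAdded) {
		return fmt.Errorf("additions: got %v, want %v", d.Added, wantAdded)
	}
	if !sameKeys(d.Modified, wantModified) {
		return fmt.Errorf("modifications: got %v, want %v", d.Modified, wantModified)
	}
	return nil
}
```

A test would then call `checkDiff(diffConfigs(before, after), wantAdded, wantModified)`, with the nvidia runtime keys in `wantAdded` and the `default_runtime_name` and `cdi_enabled` keys in `wantModified`, and fail on any non-nil error.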

Note that some of this is already covered by unit tests, so we don't have to be exhaustive. These unit tests don't, however, cover the finer points of how containerd merges configs, and this is what these tests should focus on. As an example, if we were to update the containerd implementation to not include 598c632, we would expect the tests to fail on platforms where imports are not properly supported.

@copy-pr-bot

copy-pr-bot bot commented Nov 8, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ArangoGutierrez ArangoGutierrez force-pushed the e2e_containerd branch 4 times, most recently from 8ba2ce0 to 9c952ee Compare November 11, 2025 11:09
Member

@elezar elezar left a comment

What do you think of ArangoGutierrez#3?

@ArangoGutierrez
Collaborator Author

> What do you think of ArangoGutierrez#3?

🥇

@elezar
Member

elezar commented Nov 12, 2025

@ArangoGutierrez let's squash and merge this.

elezar and others added 8 commits November 12, 2025 10:54
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `889a3bb` to `0964f81`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](NVIDIA/libnvidia-container@889a3bb...0964f81)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-version: '0964f81717e96ac903e39700908677dcdf72ed5f'
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.29.0 to 0.30.0.
- [Commits](golang/mod@v0.29.0...v0.30.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-version: 0.30.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [github.com/urfave/cli/v3](https://github.com/urfave/cli) from 3.5.0 to 3.6.0.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](urfave/cli@v3.5.0...v3.6.0)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v3
  dependency-version: 3.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
@ArangoGutierrez ArangoGutierrez merged commit 36107ed into NVIDIA:main Nov 12, 2025
13 checks passed