Prebuild CLI E2E Docker image #16787
Build the default CLI E2E Docker image once per workflow and load it in split test jobs through an explicit image override. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16787
Or
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16787"
Wire daily smoke and flaky reproduction workflows into the prebuilt image path, and scope the helper fail-fast behavior to workflows that explicitly require the preloaded image. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR reduces redundant Docker builds across the CLI E2E GitHub Actions matrix by prebuilding the default CLI E2E Docker image once, publishing it as an artifact, and loading/tagging it in downstream Linux test jobs. It also updates the CLI E2E helper to support an explicit prebuilt image override via ASPIRE_E2E_DOTNET_IMAGE, while preserving Dockerfile-based builds for local runs.
Changes:
- Add a reusable workflow to build/save/upload the default CLI E2E Docker image as an artifact, and wire it into the main and specialized test pipelines.
- Update the reusable run-tests.yml workflow to download/load/tag the prebuilt image on Linux and export ASPIRE_E2E_DOTNET_IMAGE for tests.
- Refactor CLI E2E test helpers to select between “prebuilt image” vs “build from Dockerfile”, and add targeted unit tests for that selection logic.
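For readers who want the selection rule spelled out, here is a minimal shell sketch of it. This is not the helper's actual code (that lives in C# in CliE2ETestHelpers.cs). The function name `select_image_source`, the concrete variable name `ASPIRE_E2E_REQUIRE_DOTNET_IMAGE` (one assumed expansion of the `ASPIRE_E2E_REQUIRE_*_IMAGE` pattern from the PR description), and the value `true` are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the described "prebuilt image vs Dockerfile" choice.
# Prints "image:<tag>" when an override is set, "dockerfile:<path>" when
# falling back to a local build, and returns non-zero when a prebuilt image
# is required but no override was provided.
select_image_source() {
  local dockerfile="${1:-Dockerfile.e2e}"
  local image="${ASPIRE_E2E_DOTNET_IMAGE:-}"
  # Hypothetical variable name and value; the real helper may differ.
  local required="${ASPIRE_E2E_REQUIRE_DOTNET_IMAGE:-}"

  if [ -n "$image" ]; then
    echo "image:$image"
  elif [ "$required" = "true" ]; then
    echo "prebuilt image required but ASPIRE_E2E_DOTNET_IMAGE is not set" >&2
    return 1
  else
    echo "dockerfile:$dockerfile"
  fi
}
```

With neither variable set, the sketch falls back to the Dockerfile path, which matches the stated goal of preserving Dockerfile-based builds for local runs.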
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/Aspire.Cli.EndToEnd.Tests/Helpers/CliInstallStrategyTests.cs | Adds tests validating Docker source selection behavior (prebuilt image vs Dockerfile fallback vs CI enforcement). |
| tests/Aspire.Cli.EndToEnd.Tests/Helpers/CliE2ETestHelpers.cs | Introduces ASPIRE_E2E_DOTNET_IMAGE support and CI behavior for choosing prebuilt image vs Dockerfile build. |
| .github/workflows/tests.yml | Adds a prebuild-image job and updates job dependencies/gating to include it where CLI E2E runs are present. |
| .github/workflows/tests-quarantine.yml | Updates the PR path filter list to include the new reusable image build workflow. |
| .github/workflows/tests-outerloop.yml | Updates the PR path filter list to include the new reusable image build workflow. |
| .github/workflows/specialized-test-runner.yml | Wires the prebuild-image workflow into specialized runs and gates execution appropriately. |
| .github/workflows/run-tests.yml | Downloads/loads/tags the prebuilt image on Linux CLI-archive runs and sets ASPIRE_E2E_DOTNET_IMAGE. |
| .github/workflows/build-cli-e2e-image.yml | New reusable workflow that builds and uploads the default CLI E2E Docker image artifact. |
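As a rough illustration of the download/load/tag step described for run-tests.yml, the sketch below loads a saved image tarball, re-tags it, and exports the override for later steps. It is not the workflow's actual script. The function name `load_prebuilt_image`, its arguments, and the `DOCKER_BIN` dry-run hook are hypothetical; `docker load --input`, `docker tag`, and the `GITHUB_ENV` file are standard Docker and GitHub Actions features.

```shell
#!/usr/bin/env bash
# Sketch only: load a prebuilt image artifact and export the override.
# DOCKER_BIN is a hypothetical hook so the function can be dry-run
# (for example DOCKER_BIN=echo) without a Docker daemon.
load_prebuilt_image() {
  local tarball="$1"
  local source_tag="$2"
  local stable_tag="$3"
  local docker="${DOCKER_BIN:-docker}"

  # Fail before exporting anything if the artifact is missing.
  if [ ! -f "$tarball" ]; then
    echo "prebuilt image artifact not found: $tarball" >&2
    return 1
  fi

  "$docker" load --input "$tarball" || return 1
  "$docker" tag "$source_tag" "$stable_tag" || return 1

  # GitHub Actions reads this file to set variables for later steps.
  echo "ASPIRE_E2E_DOTNET_IMAGE=$stable_tag" >> "${GITHUB_ENV:-/dev/null}"
}
```

The export happens only after the file-existence check and a successful load, so a missing artifact never leaves later steps pointing at an image that was not loaded.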
Use a stable prebuilt image tag for CLI E2E artifact consumers and clear Docker build args when Hex1b runs from a prebuilt image. Add BuildKit remote cache with Ubuntu mirror fallback, remove Java from the default .NET image, and keep repository-dependent script copies after cache-stable image layers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Only clear Hex1b build args when the CLI E2E helper is using a prebuilt image without a Dockerfile path. Keep SKIP_SOURCE_BUILD and apt mirror build args for polyglot and other Dockerfile variants. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Generalize the CLI E2E prebuilt image workflow so it produces the shared .NET/Python, polyglot, and Java polyglot images once per workflow run. CLI E2E jobs now load the matching artifacts and the helper can consume variant-specific image overrides with explicit requirements. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clear variant-specific prebuilt image environment variables in tests that intentionally verify fallback to Dockerfile builds. This matches the CI environment where the shared prebuilt image variables are set globally before running CliInstallStrategyTests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
JamesNK left a comment
LGTM — the workflow orchestration, C# test helper refactoring, and Dockerfile cache optimization all look solid. Two minor comments: one potential fragile coupling in run-tests.yml (REQUIRE flag set outside the file-existence guard) and a cosmetic indent inconsistency in tests.yml.
radical left a comment
Looks good. Just a few follow-up suggestions.
radical left a comment
Two inline comments on the test helper / test scoping. Two non-blocking points for the description / validation:
1. The new GHA BuildKit cache is silently inert. build-cli-e2e-image.yml invokes docker buildx build --cache-from type=gha --cache-to type=gha,…,ignore-error=true without docker/setup-buildx-action (or crazy-max/ghaction-github-runtime), so neither ACTIONS_RUNTIME_TOKEN nor ACTIONS_RESULTS_URL is exposed to the build step. The gha cache backend can't authenticate and ignore-error=true swallows the cache-write failure. As a result there is no cross-run cache today — the only caching that happens is BuildKit's local layer cache for that single docker buildx build invocation, which doesn't survive between workflow runs (each run gets a fresh hosted runner / fresh cli-e2e-builder driver). Within a single workflow run the build job executes once anyway and ships its output via actions/upload-artifact, so there's no intra-run cache benefit either.
This is a soft issue (the prebuild still works, just without the claimed cache benefit), so two valid resolutions:
- Fix it: add a pinned docker/setup-buildx-action@… step before the build (and drop the manual docker buildx create --use); the action both creates the builder and exports the runtime token / cache URL needed for type=gha.
- Or update the PR description to reflect that the optimization is per-workflow consolidation (one build job per matrix instead of N rebuilds) rather than a cross-run BuildKit cache.
2. Validation on the full quarantine + outerloop pipelines. Both workflows already auto-trigger on this PR via their paths: filters, and the most recent runs against sebros/prebuild-cli-e2e-image (outerloop run 25405631870, quarantine run 25405631888) completed successfully. I've also manually re-triggered both against the latest head:
- Outerloop: https://github.com/microsoft/aspire/actions/runs/25420938375
- Quarantine: https://github.com/microsoft/aspire/actions/runs/25420939422
Worth waiting on these (and any subsequent re-runs after addressing the comments below) before merge, given the change touches the prebuild plumbing for both pipelines.
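To make point 1 concrete, a build step could check for the cache runtime before passing the cache flags, so that a missing token shows up as a warning instead of being swallowed by ignore-error=true. The sketch below is one possible guard, not something the PR contains. The function name `gha_cache_args` is hypothetical, and the cache-to options elided in the comment above are left out rather than guessed.

```shell
#!/usr/bin/env bash
# Sketch only: print the BuildKit GHA cache flags when the runtime
# variables named in the review comment are visible to this step;
# otherwise print nothing and warn on stderr.
gha_cache_args() {
  if [ -n "${ACTIONS_RUNTIME_TOKEN:-}" ] && [ -n "${ACTIONS_RESULTS_URL:-}" ]; then
    echo "--cache-from type=gha --cache-to type=gha,ignore-error=true"
  else
    echo "warning: GHA cache runtime not exported; building without remote cache" >&2
  fi
}
```

A caller would splice the printed flags into its docker buildx build invocation. An empty result means the build proceeds with only the local layer cache.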
radical left a comment
Adding the GHA cache point as an inline comment for visibility on the file itself.
Extract prebuilt CLI E2E image loading into a shared script, document the image contract, and make Java image requirements fail fast during artifact loading. Tighten helper tests and prebuilt-image strategy validation, align workflow indentation, and initialize Buildx through setup-buildx-action so the GHA cache backend can authenticate. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Export the GitHub Actions cache runtime with the repo-approved github-script action and create the Buildx builder in the shell step so CLI E2E image cache remains active without triggering workflow startup restrictions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🎬 CLI E2E Test Recordings: 77 recordings uploaded
📹 Recordings uploaded automatically from CI run #25441231885
Two more nits surfaced from a follow-up review pass — both low-priority, neither blocking. Tracking on #16825 alongside the existing follow-up so this PR can land:
No documentation PR is required for this change. This PR is a CI/build infrastructure improvement that adds a reusable GitHub Actions workflow to prebuild shared CLI E2E Docker images, reducing redundant work across the test matrix. It contains no user-facing changes, new public APIs, new configuration options, or behavioral changes that affect Aspire developers or users.
CLI E2E tests are split across many isolated GitHub Actions jobs, so each job cold-builds the same default Dockerfile.e2e image before tests start. That repeats slow and failure-prone apt/docker build work across the matrix.
This change adds a reusable workflow that builds the shared CLI E2E Docker images once, saves them as artifacts, and has Linux CLI E2E jobs load those images before running tests. The test helper supports explicit image overrides for the DotNet/Python, polyglot, and Java polyglot variants and only fails fast when the matching ASPIRE_E2E_REQUIRE_*_IMAGE variable is set, so workflows that have not opted into the prebuilt image path can still fall back to the existing Dockerfile build behavior.
The prebuild is wired into the regular split CLI E2E matrix, quarantined/outerloop specialized runs, daily CLI smoke tests, and the flaky-test reproduction workflow when TEST_PROJECT is Cli.EndToEnd. The shared artifacts now cover Dockerfile.e2e, Dockerfile.e2e-polyglot-base, and Dockerfile.e2e-polyglot-java; Podman keeps its existing variant-specific image path because it uses the privileged nested-runtime setup. There are no Rust or Go CLI E2E Dockerfile variants today.
The image build now uses BuildKit with a GitHub Actions remote cache for the DotNet/Python and polyglot base images, and retries with the default Ubuntu apt sources if the Azure mirror build fails. The default DotNet image no longer installs Java; Java CLI E2E coverage uses the PolyglotJava Dockerfile variant, so the heavy JDK layer is isolated to the Java image. Repository-dependent script copies are also ordered after the toolchain and bundle layers to preserve cache hits when source files or install scripts change independently.
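The mirror fallback described above could be sketched roughly as follows: try the build with a mirror build argument, and on failure retry once without it so the Dockerfile uses its default Ubuntu apt sources. This is an illustration under assumptions, not the workflow's script. The function names, the `BUILD_FN` hook, and the `APT_MIRROR` build-arg name are hypothetical; the `docker buildx build` flags are standard.

```shell
#!/usr/bin/env bash
# Sketch only: one build attempt; an empty mirror URL means "use defaults".
# APT_MIRROR is a hypothetical build-arg name.
run_buildx() {
  local dockerfile="$1"
  local tag="$2"
  local mirror_url="$3"

  if [ -n "$mirror_url" ]; then
    docker buildx build --file "$dockerfile" --tag "$tag" --load \
      --build-arg "APT_MIRROR=$mirror_url" .
  else
    docker buildx build --file "$dockerfile" --tag "$tag" --load .
  fi
}

# Try the mirror first, then retry once with the default apt sources.
# BUILD_FN is a hypothetical hook so the build step can be swapped out.
build_with_mirror_fallback() {
  local dockerfile="$1"
  local tag="$2"
  local mirror_url="$3"
  local build="${BUILD_FN:-run_buildx}"

  if "$build" "$dockerfile" "$tag" "$mirror_url"; then
    return 0
  fi
  echo "mirror build failed; retrying with default Ubuntu apt sources" >&2
  "$build" "$dockerfile" "$tag" ""
}
```

A single retry keeps the failure mode bounded: if the default sources also fail, the function returns the second build's exit status and the job fails normally.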