Azure: Sample app test per shard #49
Merged
beyond marvelous pipeline
paolosalvatori approved these changes on Feb 25, 2026
Motivation
Before this change, CI ran the full suite (e.g., 15 tests) split across 4 static shards, roughly 3-4 tests per shard. If a single test failed, the entire shard had to be rerun, which wasted time and made it harder to pinpoint which test actually broke. There was also no way to skip tests that weren't affected by a code change, so every PR ran the full suite regardless of what was modified.
Changes
The workflow now creates one GitHub Actions job per test instead of grouping tests into fixed shards. A new lightweight setup job runs first: it calls run-samples.sh --list to get metadata for all tests (e.g., 15), then uses a new script (.github/scripts/build-matrix.sh) to decide which tests to include based on which files changed in the PR. The result is a dynamic matrix in which each selected test gets its own isolated job. The existing run-samples.sh and Makefile still work exactly as before: the workflow sets SPLITS=TOTAL, so the existing shard math assigns exactly 1 test to each shard.
Scenarios
Infrastructure file changed: If someone modifies a file that affects all tests (like run-samples.sh, Makefile, the workflow YAML, requirements-dev.txt, etc.), the CI plays it safe and runs all tests (e.g., 15). These files are shared across every test, so a bug introduced in one of them could break anything and no test can safely be skipped. For example, if you edit run-samples.sh to add a new sample or fix the test runner logic, all jobs (e.g., 15) spin up even in 'changed' mode.
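The fallback described above could be sketched as follows. This is an illustrative Python sketch, not the contents of build-matrix.sh: the file list and its paths are placeholders (the description only names a few files and ends with "etc."), and the function names are hypothetical.

```python
# Hypothetical list of shared files. The real list lives in build-matrix.sh
# and the paths used here are placeholders.
INFRA_FILES = {"run-samples.sh", "Makefile", "requirements-dev.txt"}
INFRA_PREFIXES = (".github/workflows/",)


def touches_infrastructure(changed_files):
    """Return True if any changed path is a shared infrastructure file."""
    for path in changed_files:
        if path in INFRA_FILES or path.startswith(INFRA_PREFIXES):
            return True
    return False


def select_tests(all_tests, changed_files, matched_tests):
    """Fall back to the full suite when a shared file changed.

    `matched_tests` is whatever per-test matching produced; it is ignored
    when an infrastructure file is part of the change.
    """
    if touches_infrastructure(changed_files):
        return list(all_tests)
    return list(matched_tests)
```

The check runs before any per-test matching result is used, so a change to a shared file always wins over a narrower selection.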
One test's files changed: Say you only edited samples/web-app-sql-database/python/scripts/deploy.sh. The CI looks at each test's "watch folders" (the folders it cares about) and checks whether any of the changed files fall inside them. In this case, only the web-app-sql-database/python/scripts test watches that folder, so only that one test runs. The others (e.g., 14) are skipped entirely and don't even create a job. This is the main win: instead of rerunning 3-4 unrelated tests in a static shard, you get exactly the one that matters.
A shared src/ file changed: Each sample has a src/ folder that contains the actual application code (the Python app, templates, etc.). This folder is shared across all deployment methods for that sample: the scripts test, the terraform test, and the bicep test all deploy the same app, just with different infrastructure tools. So if you change samples/web-app-managed-identity/python/src/app.py, the CI triggers all 3 test types for that sample (scripts + terraform + bicep = 3 jobs). The remaining tests for other samples (12 of the 15 in this example) are still skipped.
Multiple tests across different samples changed: The matching works per file, so if your PR touches files in 3 different samples (e.g., one scripts deploy script, one terraform config, and one bicep template, all from different sample folders), the CI creates exactly 3 jobs, one for each affected test. It doesn't "expand" to the whole sample. Only the specific test type whose watch folder contains the changed file gets selected.
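The watch-folder matching in the three scenarios above can be sketched like this. It is a hedged illustration in Python rather than the real build-matrix.sh logic. The table below is hypothetical: the test name web-app-sql-database/python/scripts and the src/ and scripts/ paths come from the description, while the terraform/ and bicep/ folder names and the other test names are assumptions.

```python
# Hypothetical metadata: test name -> folders that test watches.
WATCH_FOLDERS = {
    "web-app-managed-identity/python/scripts": [
        "samples/web-app-managed-identity/python/scripts",
        "samples/web-app-managed-identity/python/src",
    ],
    "web-app-managed-identity/python/terraform": [
        "samples/web-app-managed-identity/python/terraform",
        "samples/web-app-managed-identity/python/src",
    ],
    "web-app-managed-identity/python/bicep": [
        "samples/web-app-managed-identity/python/bicep",
        "samples/web-app-managed-identity/python/src",
    ],
    "web-app-sql-database/python/scripts": [
        "samples/web-app-sql-database/python/scripts",
        "samples/web-app-sql-database/python/src",
    ],
}


def is_under(path, folder):
    """True if `path` is the folder itself or lies somewhere inside it."""
    return path == folder or path.startswith(folder.rstrip("/") + "/")


def match_tests(changed_files, watch_folders):
    """Select every test with at least one changed file in a watch folder."""
    selected = []
    for test, folders in watch_folders.items():
        if any(is_under(p, f) for p in changed_files for f in folders):
            selected.append(test)
    return selected


# A shared src/ change selects all three deployment methods of that sample.
print(match_tests(
    ["samples/web-app-managed-identity/python/src/app.py"], WATCH_FOLDERS
))
```

Because src/ appears in the watch list of each deployment method, a change there fans out to every test type of that sample, while a change under scripts/ selects only the scripts test.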
No test-related files changed: If you only changed something like a README or a file outside any sample's watch folders, the CI finds zero matches. In this case no test jobs are created at all, and the scripts job shows as grey/skipped in the GitHub Actions UI.
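One way a setup job could hand the selection to later jobs, including the zero-match case, is sketched below. The description does not say how the actual workflow does this, so treat it as an assumption: the output names matrix and has_tests and the key test are hypothetical. What is standard GitHub Actions behavior is that a step sets outputs by appending name=value lines to the file named by $GITHUB_OUTPUT, that a matrix can be read from JSON with fromJSON, and that a job whose if: condition is false shows as skipped.

```python
import json


def build_outputs(selected_tests):
    """Return name=value lines a setup step could append to $GITHUB_OUTPUT.

    `matrix` and `has_tests` are hypothetical output names.
    """
    matrix = {"include": [{"test": name} for name in selected_tests]}
    has_tests = "true" if selected_tests else "false"
    return [f"matrix={json.dumps(matrix)}", f"has_tests={has_tests}"]


for line in build_outputs(["web-app-sql-database/python/scripts"]):
    print(line)
for line in build_outputs([]):
    print(line)
```

A separate boolean output gives the downstream job something simple to guard on, so an empty selection results in a skipped job rather than a job that tries to expand an empty matrix.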
Running Manually
You can trigger the workflow manually via the "Run workflow" button on the Actions tab. It gives you a dropdown to pick 'all' or 'changed'. Use 'all' when you want to run the full test suite regardless of what changed (useful for verifying a clean state or before a release). Use 'changed' to run only the tests affected by changes on the branch compared to main; this is also the default for PRs. Note: on this very first PR that introduces these CI changes, 'changed' mode still runs all tests because the diff against main includes the infrastructure files themselves. Once merged, it works as expected.