Verify GH action tag/SHA combinations#356

Open
snazy wants to merge 7 commits into
apache:mainfrom
snazy:gh-action-sha-tag-check

@snazy
Member

@snazy snazy commented Nov 7, 2025

This change introduces a new function verify_actions to validate the contents against GitHub.

TL;DR
The function verifies that the SHAs specified in actions.yml exist in the GH repo. It also ensures that the SHA exists on the Git tag, if the tag attribute is specified. The rest of the function is mostly output plus the collection of errors (failures) and warnings.

Although it issues quite a few GH API requests, the rate limiter should not kick in (with an authenticated GH token, GH workflows have a limit of 15k requests). I opted to rely on the HTTP/1.1 urllib.request stuff, which has no connection-reuse. The alternative would have been to add a dependency.
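
A minimal sketch of what one authenticated GH API call via `urllib.request` could look like. The helper names `build_gh_request` and `gh_get_json`, the header choices, and the fallback to the `GH_TOKEN` environment variable are assumptions for illustration, not necessarily the code in this PR.

```python
import json
import os
import urllib.request
from typing import Optional

GH_API = "https://api.github.com"


def build_gh_request(path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request against the GitHub REST API."""
    headers = {"Accept": "application/vnd.github+json"}
    # Assumption: fall back to the GH_TOKEN environment variable if no token is passed.
    token = token or os.environ.get("GH_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{GH_API}/{path.lstrip('/')}", headers=headers)


def gh_get_json(path: str, token: Optional[str] = None):
    """Send the request and decode the JSON body.

    urlopen opens a fresh connection for every call, so there is no
    connection reuse across requests.
    """
    with urllib.request.urlopen(build_gh_request(path, token)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the header logic testable without network access.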

The algorithm roughly works like this, for each action specified in actions.yml:

  • Issue a warning and stop, if the name is like OWNER/* ("wildcard" repository). Can't verify Git SHAs in this case.
  • Issue a warning and stop, if the name is like docker:* (not implemented)
  • Issue an error and stop, if the name doesn't start with an OWNER/REPO pattern.
  • Each expired entry is just skipped
  • If there is a wildcard reference and a SHA reference, issue an error.
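
The name checks in the list above could be sketched roughly as follows. `NameKind`, `classify_action_name`, and the exact regular expression are hypothetical and only illustrate the three name-based cases; expiry and the wildcard-plus-SHA check are not covered.

```python
import re
from enum import Enum


class NameKind(Enum):
    WILDCARD_REPO = "wildcard-repo"  # OWNER/*: warn, SHAs cannot be verified
    DOCKER = "docker"                # docker:*: warn, not implemented
    REPO = "repo"                    # OWNER/REPO[/path]: verify references
    INVALID = "invalid"              # anything else: error


# Assumed pattern for OWNER/REPO with an optional sub-path.
_OWNER_REPO = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+(/.*)?$")


def classify_action_name(name: str) -> NameKind:
    """Map an action name to the handling described in the list above."""
    if name.startswith("docker:"):
        return NameKind.DOCKER
    owner, _, rest = name.partition("/")
    if owner and rest == "*":
        return NameKind.WILDCARD_REPO
    if _OWNER_REPO.match(name):
        return NameKind.REPO
    return NameKind.INVALID
```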

Then, for each reference for an action:

  • If no tag is specified, let GH resolve the commit SHA. Emit a warning to add the value of the tag attribute, if the SHA can be resolved. Otherwise, emit an error.
  • If tag is specified:
    • Add the SHA to the set of requested-shas-by-tag
    • Call GH's "matching-refs" endpoint for the 'tag' value
      • Emit an error, if the object type is not a tag or commit.
      • Also resolve 'tag' object types to 'commit' object types.
      • Add each returned SHA to the set of valid-shas-by-tag.
  • For each "requested tag" verify that the sets of valid and requested shas intersect. If not, emit an error.
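
The per-tag part of the steps above can be sketched like this. The input is assumed to be the decoded JSON list returned by GH's matching-refs endpoint (entries with a `ref` and an `object` carrying `type` and `sha`). The function names and the `resolve_tag_object` callback (mapping an annotated tag object's SHA to its commit SHA) are hypothetical.

```python
from typing import Callable, Iterable


def collect_valid_shas(
    matching_refs: Iterable[dict],
    resolve_tag_object: Callable[[str], str],
) -> tuple[set[str], list[str]]:
    """Return (commit SHAs, errors) for the refs returned by matching-refs."""
    shas: set[str] = set()
    errors: list[str] = []
    for ref in matching_refs:
        obj = ref["object"]
        if obj["type"] == "commit":
            # Lightweight tag: points directly at a commit.
            shas.add(obj["sha"])
        elif obj["type"] == "tag":
            # Annotated tag: resolve the tag object to its commit.
            shas.add(resolve_tag_object(obj["sha"]))
        else:
            errors.append(f"{ref['ref']}: unexpected object type {obj['type']!r}")
    return shas, errors


def check_tags(
    requested: dict[str, set[str]],
    valid: dict[str, set[str]],
) -> list[str]:
    """Emit one error per tag whose requested and valid SHA sets do not intersect."""
    errors = []
    for tag, wanted in requested.items():
        if not (wanted & valid.get(tag, set())):
            errors.append(f"tag {tag!r}: none of the requested SHAs matches the tag")
    return errors
```

Collecting errors instead of raising keeps the run going, so all actions are reported in one pass.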

Fixes #110

@snazy
Member Author

snazy commented Nov 7, 2025

This is in a very early stage and meant to just gather feedback and opinions about the approach.

@snazy
Member Author

snazy commented Nov 17, 2025

@raboof do you think it's worth tackling this one?

@raboof
Member

raboof commented Nov 18, 2025

This looks helpful to me. Does it actually share code with gateway.py or could it be a separate file?

@snazy
Member Author

snazy commented Nov 18, 2025

This looks helpful to me. Does it actually share code with gateway.py or could it be a separate file?

It does use load_actions and a data class, but nothing that prevents moving the code to a separate .py file.

@snazy snazy force-pushed the gh-action-sha-tag-check branch 19 times, most recently from af91fe6 to 185416b Compare December 21, 2025 14:14
@snazy
Member Author

snazy commented Dec 21, 2025

Made some progress on this one.

Moved the code to a separate source file and added it to the update_actions workflow.
The output (github summary) would yield five failures.
One is tackled via #426 (my bad, included in this PR as well for now).

The four remaining ones occur because the ScaCap organization has an IP allow list enabled. This prevents GH-hosted runners from making GH API requests against their org, so the verification code cannot verify the SHAs and tags. The checks work fine for the ScaCap org from my machine.
I have added a new boolean flag ignore_gh_api_errors for action references to let the verification code ignore GH API failures. Setting this flag to true means that GH API errors are ignored, but the checks still happen and verification errors are still emitted, as warnings rather than failures.
Updated the actions.yml with that flag for the scacap/action-surefire-report action.
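
One way to read the flag's effect, as a rough sketch: the same problem is recorded either as a failure or as a warning, depending on the flag. The `Report` class and its `problem` method are hypothetical and only illustrate the downgrade, not the PR's actual result type.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    failures: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    def problem(self, message: str, ignore_gh_api_errors: bool) -> None:
        """Record a problem; downgrade it to a warning if the flag is set."""
        target = self.warnings if ignore_gh_api_errors else self.failures
        target.append(message)
```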

The warnings are:

  • Two references to Docker images (not verified)
  • Two wildcard repository references
    • golangci/*
    • rustsec/*
  • Two SHAs without a tag name
    • browser-actions/setup-geckodriver
    • damccorm/tag-ur-it
  • Two wildcard SHA and specific SHA references for the same action
    • sbt/setup-sbt
    • gradle/wrapper-validation-action

@snazy snazy force-pushed the gh-action-sha-tag-check branch 6 times, most recently from 6a82461 to a1796d0 Compare December 21, 2025 15:57
@potiuk
Member

potiuk commented Apr 18, 2026

I think it's a helpful check

@snazy snazy force-pushed the gh-action-sha-tag-check branch 3 times, most recently from 80da406 to 3ff4fee Compare April 18, 2026 10:05
@snazy
Member Author

snazy commented Apr 18, 2026

I've rebased this PR.

Two actions are a bit problematic:

  • dtolnay/rust-toolchain references the stable branch, not a particular tag - the Git commit exists though.
  • matlab-actions/run-tests: the tag v3.1.0 was added on Apr 12, 2026 via PR #695 (gateway: bump matlab-actions/run-tests from 3.0.0 to 3.1), but was moved on Apr 14, 2026 to 64426f612825c99055c2b4887c08c135ab01be0b/a6c82b97e58e6b95bf49fcc8f5ea82370dcccd12. The diff to the original SHA 353aee49b0edf62278c118a51b484d90bf6da1b7 is here. However, v3.1.0 is marked as a pre-release.

@snazy snazy marked this pull request as ready for review April 18, 2026 10:13
@potiuk
Member

potiuk commented Apr 28, 2026

dtolnay/rust-toolchain references the stable branch, not a particular tag - the Git commit exists though.

I guess we should make it into 1.94.1 (different sha aad518f59d88bae90133242f9ddac7f8bbc5dddf) - can you update it @snazy ?

This was added in #620. @kevinjqliu, are you OK with that? I guess you are the user of that action, so you will have to update it in your repo?

Maybe also worth checking whether others use it?

matlab-actions/run-tests: the tag v3.1.0 was added on Apr 12, 2026

This has been removed in main - they apparently changed "release" to "pre-release" and kept on moving it.

@kevinjqliu
Contributor

This has been added here #620 -@kevinjqliu - are you ok with that? I guess you are the user of that action - so you will have to update it in your repo?

looks like a bunch of repos are already using that commit hash https://grep.app/search?f.repo.pattern=apache%2F&q=dtolnay%2Frust-toolchain%4029eef336d9b2848a0b548edc03f92a220660cdb8

and some still using @stable https://grep.app/search?f.repo.pattern=apache%2F&q=dtolnay%2Frust-toolchain%40stable

@potiuk
Member

potiuk commented Jun 1, 2026

@snazy gentle nudge on this — still think the tag/SHA verification check is worth having. check_action_tags is currently red; from the thread the open edge cases were dtolnay/rust-toolchain (tracks the stable branch, no tag) and the matlab-actions/run-tests pre-release tag move. Are you planning to push this over the line, or would it help to split out the parts that are ready (the ignore_gh_api_errors flag + core verification) so they can land incrementally?

@snazy snazy force-pushed the gh-action-sha-tag-check branch from 7890f4e to 06ed179 Compare June 2, 2026 06:34
snazy added 3 commits June 2, 2026 08:35
This change introduces a new function `verify_actions` to validate the contents against GitHub.

TL;DR
The function verifies that the SHAs specified in `actions.yml` exist in the GH repo.
Also ensures that the SHA exists on the Git tag, if the `tag` attribute is specified.
The rest of the (currently spaghetti code) function is a lot of output and error(failure) and warning collection.

Although it issues quite a few GH API requests, the rate limiter should not kick in (with an authenticated GH token).
I opted to rely on the HTTP/1.1 `urllib.request` stuff, which has no connection-reuse. The alternative would have been to add a dependency.

The algorithm roughly works like this, for each action specified in `actions.yml`:
* Issue a warning and stop, if the name is like `OWNER/*` ("wildcard" repository).
  Can't verify Git SHAs in this case.
* Issue a warning and stop, if the name is like `docker:*` (not implemented)
* Issue an error and stop, if the name doesn't start with an `OWNER/REPO` pattern.
* Each expired entry is just skipped
* If there is a wildcard reference and a SHA reference, issue an error.

Then, for each reference for an action:
* If no `tag` is specified, let GH resolve the commit SHA.
  Emit a warning to add the value of the `tag` attribute, if the SHA can be resolved.
  Otherwise, emit an error.
* If `tag` is specified:
  * Add the SHA to the set of requested-shas-by-tag
  * Call GH's "matching-refs" endpoint for the 'tag' value
    * Emit an error, if the object type is not a tag or commit.
    * Also resolve 'tag' object types to 'commit' object types.
    * Add each returned SHA to the set of valid-shas-by-tag.
* For each "requested tag" verify that the sets of valid and requested shas intersect. If not, emit an error.
1. `matlab-actions/run-tests`: the tag `v3.1.0` has been moved on Apr 14, 2026 (initially added on Apr 12 via apache#695)
2. `dtolnay/rust-toolchain`: `stable` is a branch and cannot be validated as a tag
@snazy snazy force-pushed the gh-action-sha-tag-check branch from 06ed179 to 3b4c8ac Compare June 2, 2026 06:35
@snazy snazy force-pushed the gh-action-sha-tag-check branch 3 times, most recently from ddb5645 to e39a9ba Compare June 2, 2026 06:59
@snazy snazy force-pushed the gh-action-sha-tag-check branch from e39a9ba to e8ef2e7 Compare June 2, 2026 08:08
@snazy snazy force-pushed the gh-action-sha-tag-check branch from e8ef2e7 to 832e95b Compare June 2, 2026 08:27
@snazy
Member Author

snazy commented Jun 2, 2026

I've rebased the branch.

The changes to actions.yml now only contain the changes for the scacap/action-surefire-report action to ignore GH API errors, because that org has an "IP allowlist" that prevents us from using the GH API against that org.

I've made the dtolnay/rust-toolchain action case that references their master branch a warning, instead of a hard failure.

@potiuk
Member

potiuk commented Jun 2, 2026

Thanks for the rebase and scoping this down, @snazy — the approach is solid and the test coverage on the happy paths is nice. I think the check is worth having. Two things I'd want sorted before approving, plus a few smaller items:

🔴 Blockers

  1. Leftover debug raise in gateway/action_tags.py (the invalid-Git-SHA branch):

    else:
        result.failure(f"... references an invalid Git SHA '{ref}'", "  ..")
        raise Exception("foo")

    On any invalid SHA this records the failure and then crashes the run with Exception("foo") instead of failing gracefully. The branch isn't covered by a test, so green CI doesn't catch it. The result.failure(...) above already does the right thing — the raise should go (a small regression test for this branch would be great too).

  2. The workflow won't trigger on the PRs it's meant to guard (check_action_tags.yml). The path filters reference files that don't exist:

    push:         paths: [".github/workflows/dummy.yml"]
    pull_request: paths: [".github/workflows/update_actions.yml", ".github/workflows/dummy.yml", "gateway/*"]

    There's no dummy.yml or update_actions.yml — the sync workflow is update.yml, and the inputs that should be verified are actions.yml and .github/actions/for-dependabot-triggered-reviews/action.yml. As written, a dependabot bump or an actions.yml edit won't run this check; only gateway/* changes and manual dispatch do. Adding actions.yml + the composite to the triggers would make it fire when it matters.
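
For blocker 1, a rough sketch of the graceful path together with a small regression test. `VerifyResult`, `check_ref`, and the 40-hex-character check are stand-ins assumed for illustration; the real branch in gateway/action_tags.py may decide "invalid" differently (for example via the GH API).

```python
import re
from dataclasses import dataclass, field

# Assumption: a full Git SHA-1 is 40 lowercase hex characters.
_FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


@dataclass
class VerifyResult:
    failures: list[str] = field(default_factory=list)

    def failure(self, message: str, detail: str = "") -> None:
        self.failures.append(f"{message}{detail}")


def check_ref(result: VerifyResult, name: str, ref: str) -> bool:
    """Return True for a plausible SHA; otherwise record a failure and return."""
    if _FULL_SHA.match(ref):
        return True
    # Record the failure only. No exception is raised, so the remaining
    # actions are still verified and reported.
    result.failure(f"{name} references an invalid Git SHA '{ref}'")
    return False


def test_invalid_sha_is_recorded_not_raised() -> None:
    result = VerifyResult()
    assert check_ref(result, "owner/repo", "not-a-sha") is False
    assert len(result.failures) == 1
```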

🟡 Non-blocking, worth a look

  • os.environ['GH_TOKEN'] (lines 88, 157) raises KeyError on local runs without the token — .get() would be friendlier.
  • today: date = date.today() default arg is evaluated once at import, not per call — today: date | None = None then default inside is the usual fix.
  • run_action_tags.py calls update_actions/update_patterns, so a "verify" run also rewrites actions.yml/approved_patterns.yml — slightly surprising side effect; might be worth a comment explaining it's intentional. (Also a small typo: "GH_TOKEN environment variable should be must.")
  • The GHA step-summary writer emits a stray ``` fence when there are failures but no warnings.
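
The first two non-blocking items could be addressed along these lines. The helper names `is_expired` and `gh_token`, and the strict `<` comparison, are assumptions for illustration only.

```python
import os
from datetime import date
from typing import Optional


def is_expired(expires_at: date, today: Optional[date] = None) -> bool:
    """Resolve `today` on every call instead of once at import time."""
    if today is None:
        today = date.today()
    return expires_at < today


def gh_token() -> Optional[str]:
    """Return the token, or None on local runs where GH_TOKEN is unset or empty."""
    return os.environ.get("GH_TOKEN") or None
```

A `None` default also lets tests pass a fixed date without patching `date.today()`.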

Nothing here is structural — happy to approve once the two blockers are addressed.

@snazy
Member Author

snazy commented Jun 2, 2026

Thanks for the review! Pushed another commit to address the comments.

Development

Successfully merging this pull request may close these issues.

Verify SHA belongs to released version

5 participants