Use larger runners for nightly jobs #933
base: master
Conversation
Make the steps simpler by using tricks such as setting SHELLOPTS, and use the new larger runner (from Google ML Velocity) for the build job, because the job is running out of disk space on the default runner.
These are safe to use.
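As a rough illustration of how those two changes fit together, here is a minimal sketch; the runner label, the SHELLOPTS value, and the step contents are assumptions for illustration, not taken from this PR:

```yaml
# Minimal sketch only; 'self-hosted-large' and the steps are placeholders.
jobs:
  nightly-build:
    # Assumed label for one of the larger Google ML Velocity runners.
    runs-on: self-hosted-large
    env:
      # Exporting SHELLOPTS makes every bash step start with these options
      # enabled, so individual steps don't need their own `set -o pipefail`.
      SHELLOPTS: pipefail
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/build_nightly.sh 2>&1 | tee build.log
```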
It's impossible to test with the new ML runners in a fork, because the fork is not part of the TensorFlow org and thus doesn't have access to the runners. We need to provide the ability to specify the runner when invoking the workflow manually.
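One way to do that is a `workflow_dispatch` input for the runner label; this sketch uses a hypothetical input name and default rather than anything from this PR:

```yaml
# Hypothetical sketch: let a manual run pick the runner label.
on:
  workflow_dispatch:
    inputs:
      runner:
        description: 'Runner label to use for the build job'
        required: false
        default: 'ubuntu-latest'

jobs:
  nightly-build:
    # The inputs context is populated for workflow_dispatch runs.
    runs-on: ${{ inputs.runner }}
```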
In some contexts in GitHub Actions workflows, you have to use single quotes. Remembering when and where in YAML is too hard, so just use single quotes everywhere.
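For example (the values here are illustrative only):

```yaml
# Inside ${{ }} expressions, only single-quoted string literals are valid,
# while plain YAML values accept either quote style, so using single quotes
# in both places keeps the rule uniform.
jobs:
  nightly-build:
    if: ${{ github.event_name == 'schedule' }}
    runs-on: 'ubuntu-latest'
    steps:
      - run: echo 'nightly run'
```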
Force-pushed from 7945915 to e34f32a
Even though this was valid YAML, the GitHub expression parser seems to have a hard time with string literals that contain dashes. And apparently it's unnecessary to provide a fallback in this case anyway.
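Roughly what that looked like, as an assumed illustration rather than the exact expression from this PR:

```yaml
# Assumed illustration of the problem described above.
# A fallback with a dashed string literal, along these lines, is what the
# expression parser appeared to stumble over:
#   runs-on: ${{ inputs.runner || 'ubuntu-22.04-16core' }}
# Dropping the fallback sidesteps the issue:
runs-on: ${{ inputs.runner }}
```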
MichaelBroughton left a comment
This seems like it's combining a lot of changes all at once that may interact in complex ways. Can we split this into two PRs? The first would be just the runner upgrade, very simple and short; verify that it works and doesn't break anything. Then we can move on to all of the other stuff:
In addition, this PR makes some simplifications to the workflows to get rid of some of the overelaborate things I did before, and makes other minor changes in keeping with best practices, such as putting timeouts on jobs.
Yes, that's reasonable.
The nightly jobs are failing due to running out of disk space. (It's the same problem we had recently with the CI checks workflow.) This PR updates the jobs to use the larger self-hosted runners from Google ML Velocity.
In addition, this PR makes some simplifications to the workflows to get rid of some of the overelaborate things I did before, and makes other minor changes in keeping with best practices, such as putting timeouts on jobs.
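For instance, a per-job timeout along these lines (the label and the value are assumptions) keeps a hung nightly job from tying up a runner indefinitely:

```yaml
jobs:
  nightly-build:
    runs-on: self-hosted-large  # assumed label, as in the sketch above
    # Assumed value; fail the job if it hasn't finished within two hours.
    timeout-minutes: 120
```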