CI

CI #5034

Triggered via schedule November 13, 2025 09:34
Status Failure
Total duration 3h 29m 30s
Artifacts 55

ci.yaml

on: schedule
metadata (2s)
bump-manifest (18s)
Matrix: amd64 / test-distribution
Matrix: arm64 / test-distribution
amd64 / build-base / build-base (2m 38s)
arm64 / build-base / build-base (3m 9s)
amd64 / test-nccl / build-mpi-operator-compatible-base (1m 52s)
amd64 / test-nccl / nccl-test-gke / build-nccl-gke (2m 14s)
arm64 / test-nccl / build-mpi-operator-compatible-base
arm64 / test-nccl / nccl-test-gke / build-nccl-gke
Matrix: amd64 / test-jax-cutlass-h100 / jax-cutlass-test-h100
Matrix: amd64 / test-jax / run-unit-test
Matrix: amd64 / test-te-a100 / run-unit-test
Matrix: amd64 / test-te-h100 / te-test-h100
amd64 / build-torchax (7m 2s)
amd64 / test-jax / runner / launch-slurm-runner (28m 7s)
amd64 / test-nsys-jax-eks (4m 4s)
amd64 / test-te-a100 / runner / launch-slurm-runner (2h 39m)
amd64 / build-upstream-t5x (7m 19s)
amd64 / build-axlearn (5m 3s)
Matrix: amd64 / test-nsys-jax / run-unit-test
amd64 / test-nsys-jax / runner / launch-slurm-runner (39m 3s)
Matrix: amd64 / test-nccl / nccl-test
Matrix: amd64 / test-nccl / nccl-test-gke / nccl-gke
Matrix: arm64 / test-jax-cutlass-h100 / jax-cutlass-test-h100 (Waiting for pending jobs)
Matrix: arm64 / test-jax / run-unit-test (Waiting for pending jobs)
Matrix: arm64 / test-te-a100 / run-unit-test (Waiting for pending jobs)
Matrix: arm64 / test-te-h100 / te-test-h100 (Waiting for pending jobs)
arm64 / build-torchax (8m 16s)
arm64 / test-nsys-jax-eks
arm64 / test-jax / runner / launch-slurm-runner
arm64 / test-te-a100 / runner / launch-slurm-runner
arm64 / build-upstream-t5x (9m 28s)
Matrix: arm64 / test-nsys-jax / run-unit-test (Waiting for pending jobs)
arm64 / test-nsys-jax / runner / launch-slurm-runner
Matrix: arm64 / test-nccl / nccl-test (Waiting for pending jobs)
Matrix: arm64 / test-nccl / nccl-test-gke / nccl-gke (Waiting for pending jobs)
amd64 / test-maxtext-gke / maxtext-gke-xpk (9m 19s)
Matrix: amd64 / test-maxtext / maxtext-multinode
Matrix: amd64 / test-maxtext / single-process-multi-device
amd64 / build-rosetta-t5x / build-rosetta (13m 13s)
amd64 / test-axlearn-eks (16m 57s)
amd64 / test-axlearn-fuji-models-eks (5m 23s)
Matrix: amd64 / test-nsys-jax-archive
arm64 / test-maxtext-gke / maxtext-gke-xpk
Matrix: arm64 / test-maxtext / maxtext-multinode (Waiting for pending jobs)
Matrix: arm64 / test-maxtext / single-process-multi-device (Waiting for pending jobs)
arm64 / build-rosetta-t5x / build-rosetta (16m 3s)
arm64 / test-axlearn-eks (0s)
arm64 / test-axlearn-fuji-models-eks (0s)
Matrix: arm64 / test-nsys-jax-archive
amd64 / test-maxtext / test-maxtext-metrics (30s)
amd64 / collect-docker-tags (2s)
Matrix: amd64 / test-rosetta-t5x / vit-multi-gpu-multi-node
arm64 / test-maxtext / test-maxtext-metrics
arm64 / collect-docker-tags (5s)
Matrix: arm64 / test-rosetta-t5x / vit-multi-gpu-multi-node (Waiting for pending jobs)
amd64 / test-maxtext / test-maxtext-sitrep / sitrep (13s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-summary (3s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics (26s)
arm64 / test-maxtext / test-maxtext-sitrep / sitrep
arm64 / test-rosetta-t5x / test-t5x-rosetta-summary
arm64 / test-rosetta-t5x / test-t5x-rosetta-metrics
amd64 / test-maxtext / test-maxtext-outcome (2s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep (22s)
arm64 / test-maxtext / test-maxtext-outcome
arm64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep
amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome (2s)
arm64 / test-rosetta-t5x / test-t5x-rosetta-outcome
make-publish-configs (3s)
merge-new-manifest (9s)
Matrix: publish-containers
finalize / workflow-badge (6s)
finalize / report (14s)
finalize / upload-badge (15s)
finalize / publish-badge (4s)

Annotations

5 errors and 2 warnings
amd64 / test-te-h100 / te-test-h100 (unittest, 8): Process completed with exit code 1.
amd64 / test-te-a100 / te-A100-unit-test: The self-hosted runner lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
amd64 / test-maxtext / test-maxtext-outcome: Process completed with exit code 1.
amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics: Process completed with exit code 1.
amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome: Process completed with exit code 1.
merge-new-manifest: Unexpected input(s) 'owner_and_repo', valid inputs are ['route', 'mediaType']
merge-new-manifest: Unexpected input(s) 'owner_and_repo', 'head', 'base', 'body', 'title', 'draft', valid inputs are ['route', 'mediaType']

Artifacts

Produced during runtime
Name | Size | Digest
artifact-axlearn-build-amd64 | 566 Bytes | sha256:62fb2f465ec547e7f64a20a71c64dad064a0fa324355ee4c758d8fda7a325c2e
artifact-axlearn-build-arm64 | 568 Bytes | sha256:63c32e7a27ede6b1abc54b983cbd412641b97f29df4e4b1d2f48cbafa97b2b2e
artifact-axlearn-test | 179 KB | sha256:90455e1134157794d8d27b5f6853b86f8974ee8cefe2d26a306d18f1fd5fe736
artifact-base-build-amd64 | 567 Bytes | sha256:9378d039487dcd2f281ca87e1bfe5538cb7e0e2ee847c3d7d1b19a02662e0405
artifact-base-build-arm64 | 567 Bytes | sha256:16cf74c011e9d76b16959006755ece9dae612d6f373690e6e9761add0c3fc61a
artifact-equinox-build-amd64 | 570 Bytes | sha256:80b49e4d79a0f0ec6e8ff11ff24a65bc4014d4c3d71031b724cbbca954ffcf62
artifact-equinox-build-arm64 | 568 Bytes | sha256:bb4e0ccb5634afaa0b89d82ea51716848afc40279a5d2e0e424a7336dc0504d1
artifact-final-report | 4.04 KB | sha256:aa5efe10ee38e6cdf905796594e5857cb1f32b8911eb94eaed0b3d8cc04f832a
artifact-jax-build-amd64 | 554 Bytes | sha256:68fbdc9b689aafeddab4f78d457033ce5df30c237e052db069911450b63bccd0
artifact-jax-build-arm64 | 553 Bytes | sha256:bf851474583fee463442214d59cc41f5c6a9718cff8fb99e3af8b38a95249608
artifact-maxtext-build-amd64 | 567 Bytes | sha256:5ced171f2f0d341afa071fd4b1b91dcf98010f4ac0e605b9978162025159d920
artifact-maxtext-build-arm64 | 568 Bytes | sha256:f0bc028f6f40fbfda36bb1aa33897389c51c2ff28015c5136e228c9f2593f8a7
artifact-maxtext-test | 1.45 KB | sha256:0eb76429d74db0c784606b1bfdb3f7963d9a722e9601c240d0651830f77d0096
artifact-mpi-operator-compatible-base-build-amd64 | 639 Bytes | sha256:97e79d9cbe93ebcd46388c919af942eee51245bf1c362e17f8b9f50ade0394cd
artifact-nccl-gke-build-amd64 | 571 Bytes | sha256:36365d27e7708ded99bbc9141ee077afba59a37deaadd4df12e44089598d9fb2
artifact-rosetta-build-t5x-amd64 | 584 Bytes | sha256:53dc1c12c2944a42ef485067ec1ccbb7f55b3e5f4cce8df5b4403502e633694b
artifact-rosetta-build-t5x-arm64 | 585 Bytes | sha256:a111eb4e5a13effa6abfbafbc3538363efcae4866928acaf0472d712e017605c
artifact-rosetta-t5x-mgmn-test | 624 Bytes | sha256:a0b0e4126ff2fa73d7db2dbf32581c45baba4466fb701147692256bfcc50ede7
artifact-t5x-build-amd64 | 568 Bytes | sha256:f31331cbb5205e6e4b1064c2104e825978afc0c5b678bcb6d3c152ef76fff07f
artifact-t5x-build-arm64 | 567 Bytes | sha256:297fd5b72d81a346ab14bd1dd908d86e114f2c9123694f93580751af818c6b71
artifact-torchax-build-amd64 | 569 Bytes | sha256:ecdb2a03c575e5623571a328991ad5d35dccfbe1a677fa2dcda3e654f4d13a6c
artifact-torchax-build-arm64 | 568 Bytes | sha256:b1bce73c5131dcfeb533bbde1e6eaa8a8d017b1bdd9fa34f4116c07c13f5cf41
artifact-workflow-metadata | 277 Bytes | sha256:4f008bca51d427b76dd5456325189c30918a5c97d2c4deb09526ce1b19877a9b
bumped-manifest | 51.6 KB | sha256:7bd6c01cc331290e01a4b9ddff67e6e3b970d49d29210f47d45c371daad26251
final-axlearn | 258 Bytes | sha256:1f9d072fe1ed6b35b2bbcbad920bd85dbba259e57e7f61900458c8b9e9c6fbd8
final-base | 249 Bytes | sha256:1d9341a7d108cb25c46cad4127c75d7cc1de87690549b27fb96630e19f4ffa44
final-equinox | 258 Bytes | sha256:d1731d73b974a0ff1c65d2ace5f6beb6d786112ac19e43bdc4590ab5edadad9f
final-jax | 246 Bytes | sha256:a78c92def8c568f27faf86294f8954ae8bcce68568e49ba7bd1dd10048b39f3f
final-maxtext | 258 Bytes | sha256:a56971b7bcb35475ef4f925365cfddac97c2608b45f199b215c9c44029b484da
final-t5x | 246 Bytes | sha256:36349ff8b2ccb73a7580381526ba42739f215341a4e8f5aeeb9793a619492443
final-upstream-t5x | 273 Bytes | sha256:15f06d9b9901c7e313094692904297a1e11bea5097dab011339e38168cb76371
gke-maxtext-train | 368 MB | sha256:7ed0c2a7a062cdc821d60d5b436f27323b0dbaa2af561839ea3f57ace6d39a0e
gke-maxtext-train-sitrep | 228 Bytes | sha256:39cb251b0816b1bd51c330c3a84b713ee65b974ca767963700abfdeafd318c65
jax-cutlass-test-H100 | 1.24 KB | sha256:b01cfc6c033b671a14816cc7e27579ce121213e2f56bb904bd24f12fb11dad14
jax-unit-test-A100 | 22.4 KB | sha256:531c0437505babe21ad949c35c71b740a0c9dff4afaba5e02b87d724e5138b06
mealkit-axlearn | 269 Bytes | sha256:3ce98caf71d040c60ec887b7b2cf17bd063bf5dca421bce834a7321e74ae6244
mealkit-equinox | 269 Bytes | sha256:4294691d34167b41ddb3a21f32a9068babd6594ed24e826e45db96bb18af49b8
mealkit-jax | 256 Bytes | sha256:cd64415af5d04f684a70211e9c3248bf2445a9d83995c84a82127e1eeb19eefb
mealkit-maxtext | 269 Bytes | sha256:d17376176805ac9cb8f31eefc3f3182607cdb0a9681dc70cfaba4155b47ec61d
mealkit-t5x | 258 Bytes | sha256:c590a5fb2b848e7d35c5a38100e717ac8612164664ff3edf44fa3fc7c93d6417
mealkit-upstream-t5x | 283 Bytes | sha256:731e910be89b26a321a49e600843ac0d6b3ba367b7edd726d2ca62360090bbbf
nccl-gke-all-gather | 15.4 KB | sha256:4e5c63e7272cba70fd1484c6f40d917a8d97b849009972a44afbf9691a3fe423
nccl-gke-all-gather-sitrep | 231 Bytes | sha256:0475d2dbb5e0442835b58f8b17f09af1accd0c6dc99e7edb331f0714d8794334
nccl-gke-all-reduce | 15.6 KB | sha256:ef2dbb4088612a8bdff6c2c0e502f47fc0274521972804eeada0455ef969162b
nccl-gke-all-reduce-sitrep | 231 Bytes | sha256:ebdcd799aebdf2ab98e915ea2a4f4e5806946325400c85fdd5dba16c1fd4825b
nccl-gke-broadcast | 15.2 KB | sha256:1042f94a61b841e5159b7bf46e3b4e9e46c8006f4578bfbe24eeff9e1869dfe7
nccl-gke-broadcast-sitrep | 229 Bytes | sha256:edd2100c4d24e050c86e780623351d8193295ee3cc0747acdd7353d303e70512
nccl-gke-reduce-scatter | 15.5 KB | sha256:40b560d1ecf127771674b14120901e5076801f5a0981bc59a52422db12b5844b
nccl-gke-reduce-scatter-sitrep | 234 Bytes | sha256:050b7b0ee82659f0b4f980e42d58c9d62475b37f3ef3d3f31528d7cc3a6bd5ed
nsys-jax-unit-test-A100 | 121 MB | sha256:60e55bf986d9f9cc54b4a6a44606dba2e9a49ae7ad2edf4809434ec871d85198
rosetta-t5x-vit-19326984166-VIT8G1N | 15.6 KB | sha256:51d0a49e1f112116a73bb042c0ccab1c0b4a49b4435bd5d82daf2c4cd87c00fe
te-unit-test-H100 | 2.07 MB | sha256:8c8d18428b56ca4af295c968faf289ee09f52f00ab37099eb0d15f1289652bf5
upstream-maxtext-19326984166-1DP2FSDP4TP1PP_single_process | 23.7 KB | sha256:acdcf00061090db7b5db654782bf59c5e86eec0e8168d6c509dee98ddc55dfdd
upstream-maxtext-19326984166-2DP2FSDP2TP1PP | 36.4 KB | sha256:22e381958eb72982fa61a89b6d6158ae89b285a9ee1ec28f8133f8c8c4753581
upstream-maxtext-metrics-test-log | 2.51 KB | sha256:226d43781455c5d5a0ad742a96664e03ccec10ea4e5fab2bdb0180d135ddfb7b
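
Each artifact above is listed with a digest in the form sha256:<hex>. A minimal sketch of checking a downloaded artifact against that value follows. It assumes the digest is computed over the downloaded archive file as a whole; the helper names sha256_digest and matches_listing are hypothetical and not part of this workflow.

```python
import hashlib
from pathlib import Path


def sha256_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks and return it as 'sha256:<hex>' (the listing's format)."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large artifacts are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def matches_listing(path: Path, expected: str) -> bool:
    """Compare a local file against a digest string copied from the artifact listing."""
    # Assumption: the listed digest covers the whole downloaded archive file.
    return sha256_digest(path) == expected.strip().lower()
```

For example, matches_listing(Path("bumped-manifest.zip"), "sha256:7bd6c01c...") would be called with the full digest copied from the listing; the local file name here is only illustrative.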