Sbosisio/fix nsys jax #5036
Triggered via pull request on November 14, 2025 17:09
Status: Failure
Total duration: 1d 0h 55m 28s
Artifacts: 45
Workflow: ci.yaml (on: pull_request)
metadata (3s)
Matrix: amd64 / test-distribution
Matrix: arm64 / test-distribution
amd64 / ... / build-mpi-operator-compatible-base (3m 42s)
arm64 / ... / build-mpi-operator-compatible-base
Matrix: amd64 / test-jax-cutlass-h100 / jax-cutlass-test-h100
Matrix: amd64 / test-jax / run-unit-test
Matrix: amd64 / test-te-a100 / run-unit-test
Matrix: amd64 / test-te-h100 / te-test-h100
amd64 / build-torchax (6m 48s)
amd64 / ... / launch-slurm-runner (28m 56s)
amd64 / test-nsys-jax-eks (30m 8s)
amd64 / ... / launch-slurm-runner (2h 28m)
Matrix: amd64 / test-nsys-jax / run-unit-test
Matrix: amd64 / test-nccl / nccl-test
Matrix: amd64 / test-nccl / nccl-test-gke / nccl-gke
Matrix: arm64 / test-jax-cutlass-h100 / jax-cutlass-test-h100 (waiting for pending jobs)
Matrix: arm64 / test-jax / run-unit-test (waiting for pending jobs)
Matrix: arm64 / test-te-a100 / run-unit-test (waiting for pending jobs)
Matrix: arm64 / test-te-h100 / te-test-h100 (waiting for pending jobs)
arm64 / build-torchax (7m 52s)
arm64 / test-nsys-jax-eks (0s)
arm64 / ... / launch-slurm-runner
arm64 / ... / launch-slurm-runner
Matrix: arm64 / test-nsys-jax / run-unit-test (waiting for pending jobs)
Matrix: arm64 / test-nccl / nccl-test (waiting for pending jobs)
Matrix: arm64 / test-nccl / nccl-test-gke / nccl-gke (waiting for pending jobs)
Matrix: amd64 / test-maxtext / maxtext-multinode
Matrix: amd64 / test-maxtext / single-process-multi-device
amd64 / test-axlearn-eks (17m 27s)
amd64 / test-axlearn-fuji-models-eks (5m 39s)
Matrix: amd64 / test-nsys-jax-archive
Matrix: arm64 / test-maxtext / maxtext-multinode (waiting for pending jobs)
Matrix: arm64 / test-maxtext / single-process-multi-device (waiting for pending jobs)
arm64 / test-axlearn-eks (0s)
arm64 / test-axlearn-fuji-models-eks (0s)
Matrix: arm64 / test-nsys-jax-archive
Matrix: amd64 / test-rosetta-t5x / vit-multi-gpu-multi-node
Matrix: arm64 / test-rosetta-t5x / vit-multi-gpu-multi-node (waiting for pending jobs)
Matrix: publish-containers
finalize / publish-badge (4s)
Annotations
11 errors

amd64 / test-te-h100 / te-test-h100 (unittest, 8)
Process completed with exit code 1.

amd64 / test-te-a100 / te-A100-unit-test
The self-hosted runner lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

amd64 / test-nsys-jax / nsys-jax-A100-unit-test
Process completed with exit code 1.

amd64 / test-maxtext / test-maxtext-outcome
Process completed with exit code 1.

amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics
Process completed with exit code 1.

amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome
Process completed with exit code 1.

amd64 / test-nccl / nccl-test-gke / nccl-gke (reduce_scatter_perf_mpi)
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s

amd64 / test-nccl / nccl-test-gke / nccl-gke (all_reduce_perf_mpi)
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s

amd64 / test-nccl / nccl-test-gke / nccl-gke (broadcast_perf_mpi)
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s

amd64 / test-nccl / nccl-test-gke / nccl-gke (all_gather_perf_mpi)
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s

amd64 / test-maxtext-gke / maxtext-gke-xpk
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s

Artifacts
Produced during runtime
| Name | Size | Digest |
|---|---|---|
| artifact-axlearn-build-amd64 | 565 Bytes | sha256:39b2a30f1680c75cba69c2dfc7bf84eac3314ab26bb073ca1f25c475931180b5 |
| artifact-axlearn-build-arm64 | 567 Bytes | sha256:a871c514bfb4e1573e3317c816a3fcd87d92f15b0a14f2b5640320a3fb33bff6 |
| artifact-axlearn-test | 179 KB | sha256:7d25dd66e379d2d31e1e58157ca3c7ab4e8ed4bf9c51397f8923fcffb1a85e0a |
| artifact-base-build-amd64 | 568 Bytes | sha256:5cb9874ec00a0e682ef99141147c005ef766a91d627527f2b14aefb3c1bb6e09 |
| artifact-base-build-arm64 | 566 Bytes | sha256:e86fd1bd8c32212559ba691a589f9f09c6b0099cbd5df0df27e4f51aba381dca |
| artifact-equinox-build-amd64 | 569 Bytes | sha256:82658b1bf9af835732644fb9f2fbd53302fd319afe661fb37fc2720276172885 |
| artifact-equinox-build-arm64 | 568 Bytes | sha256:49deb1b3c377009acc9a9bc0846ab0eb520f5e760a11c64b1b19ab190c31ae52 |
| artifact-final-report | 3.81 KB | sha256:791125572f5489fae148467a7a31b2309efee2fafa4dfb6bd5f53b869003375f |
| artifact-jax-build-amd64 | 553 Bytes | sha256:a1fe440251579f3dea8bb955487e1f12a1629ff386210df0ec294a5e3e606b98 |
| artifact-jax-build-arm64 | 553 Bytes | sha256:c7ccf3cbaf73c244de5004c39a97af42e937d9f1c987c850a2bac0122750400e |
| artifact-maxtext-build-amd64 | 568 Bytes | sha256:4d140a630b348296a37399a8f555c9f4fd18b4cd0696b75cfd3788f7531d4d2d |
| artifact-maxtext-build-arm64 | 568 Bytes | sha256:b47dec55580d566ef27f47574b75cb33baf7de7959ce37832bfa9e39f621bb0e |
| artifact-maxtext-test | 1.46 KB | sha256:81567cdf31ed21bcc040dc7c123a7ec0d1fbf4e375ff50f8a06aaaf3d05b7540 |
| artifact-mpi-operator-compatible-base-build-amd64 | 639 Bytes | sha256:8f48acc8c5f88d307257699d0deb4286a0a58470078d5d11b4aff8f051154778 |
| artifact-nccl-gke-build-amd64 | 571 Bytes | sha256:6533ee977ec0eafbf2c41331100b7f637ad4b7a2ca8297af16a2e5a9ecd566ec |
| artifact-rosetta-build-t5x-amd64 | 584 Bytes | sha256:7eba5628e03a827a49309f9db392ea949c25e8eede50d0bfe2fc1fd5c994dd95 |
| artifact-rosetta-build-t5x-arm64 | 584 Bytes | sha256:3ebd70fe1b7841416a7ec7fe18852d7483b45ed3e1d6a3f359d66fb0b46f7f4a |
| artifact-rosetta-t5x-mgmn-test | 624 Bytes | sha256:17f3899b81b3a873e2b4d8f8430c284809761d128a1871506253fe57f6567807 |
| artifact-t5x-build-amd64 | 570 Bytes | sha256:1297720be586f6a862c25db604ee56772ee4f3d7e4bd52f4a561383af6aaf98a |
| artifact-t5x-build-arm64 | 568 Bytes | sha256:25136b7a95e4990a8945e2f9c2c61e0091fabc89511ae7c585383f6e6d1f4812 |
| artifact-torchax-build-amd64 | 567 Bytes | sha256:09313dfe050778f35e3432f99fc77db8160ebb719a013091ccb5f074ff922a4d |
| artifact-torchax-build-arm64 | 566 Bytes | sha256:09f28fc16c370b1fc502f0eacc7695994ee60a89a508f18f7e0f4118ad3c71f3 |
| artifact-workflow-metadata | 277 Bytes | sha256:acf03161ded0c4453e63a5c120ebf4c10e50e0720603baa73b281052aa2e0034 |
| bumped-manifest | 51.6 KB | sha256:2ad648433a30390ec30d22b4537368a1f71f424a7f58c763471ac2812b33adf2 |
| final-axlearn | 263 Bytes | sha256:5bf03f382fd3c181063e735b30949ae8b70e637116eac8d7fcb0f509b3ed1bdc |
| final-base | 254 Bytes | sha256:785d08dbb099a8073cd0c05c7aaf561d1ffa734f52e7a1d9a1efaba82e177e83 |
| final-equinox | 263 Bytes | sha256:6220814edc60996075839f314db3875b490b5fe9d82aa116272a79e4b8b48001 |
| final-jax | 251 Bytes | sha256:ab8725abc0d5120db78df1ac743d5520d680cbe48bd51b46da056f4b6741cc65 |
| final-maxtext | 263 Bytes | sha256:f0d519af03c7ec7d1286fa6bfdb16970a7b2a45b3073610a0b7e29d48f14c221 |
| final-t5x | 251 Bytes | sha256:42ecb5e574abe5f37e55803d73e2285a5f43d23976c8b47dfde9fec6d4e250c2 |
| final-upstream-t5x | 277 Bytes | sha256:a48c7226dd4ec913b05f3edced1cd0bd131cabe25a24ec8c66b94b9ae4d90251 |
| jax-cutlass-test-H100 | 4.69 KB | sha256:889e88e0ddfa698b0deacd86aade962001e640e8b6623351a2908507fbd225e4 |
| jax-unit-test-A100 | 22.5 KB | sha256:e7e59762c0e1021188cceb18a0043df778eb8d104ca041d10064f95ee9af3522 |
| mealkit-axlearn | 272 Bytes | sha256:77d6213bcf8d72ead2ac3795a09b7af24e21a283b0048aef00ec65f0b05cf91a |
| mealkit-equinox | 273 Bytes | sha256:1f5590f756f72579985a1c3d8ac53bc0517a1e8aa9374cb4ee7ff750b96874a0 |
| mealkit-jax | 261 Bytes | sha256:55f1f0db13bbdf5a004d786a8108ff0c5780c1dc29947626be39244dbf162386 |
| mealkit-maxtext | 272 Bytes | sha256:110392e50d27be7f238d12bcad76f5e3b4b3d00a4673739114a4274de218dc7b |
| mealkit-t5x | 261 Bytes | sha256:04d70be3bbc4804edd699fcb8e32b153eef75ad282142d6ce2c586c16f2a45e8 |
| mealkit-upstream-t5x | 286 Bytes | sha256:13c50f1b632b748abd57d1206d52341b297311fe5482eb667ecb421a09c4381d |
| nsys-jax-unit-test-A100 | 138 MB | sha256:a3b56a6bc5b790d8657ae2ea470d077ef069f1816e7e0797b20dd5200116fbd7 |
| rosetta-t5x-vit-19372009803-VIT8G1N | 15.7 KB | sha256:53d1e9e2c81565037d8e4c92e542602cccb08adcf40b2aced26a358c5cf4ee94 |
| te-unit-test-H100 | 2.09 MB | sha256:99b4650e56082039ee0ff89a98bd4ac1569af904f8faaf52de1cca2428b0ab4d |
| upstream-maxtext-19372009803-1DP2FSDP4TP1PP_single_process | 23.7 KB | sha256:1ba1bb1e53a358cfcbb427e9e2b4828b04a50588b23d257f15be7e9e45079ca7 |
| upstream-maxtext-19372009803-2DP2FSDP2TP1PP | 28 KB | sha256:08c775c843449611a3aa0c85ec09f1eb6a83d7ac5688947b40483842f718208e |
| upstream-maxtext-metrics-test-log | 2.52 KB | sha256:4f53ecc070b05660932dedb413a75c0412c7cad1dd1b1b156fb610ed50e35cd2 |