~NGC release testing #397

Triggered: manually, October 9, 2025 10:18
Status: Failure
Total duration: 1h 43m 17s
Artifacts: 6

ngc-release-testing.yaml

on: workflow_dispatch
test-maxtext-gke / maxtext-gke-xpk: 9m 34s
test-nccl / nccl-test-gke / build-nccl-gke: 3m 3s
Matrix: test-nccl / nccl-test-gke / nccl-gke
finalize / workflow-badge: 5s
finalize / report: 6s
finalize / upload-badge: 6s
finalize / publish-badge: 5s

Annotations

4 errors
test-nccl / nccl-test-gke / nccl-gke (all_reduce_perf_mpi): Process completed with exit code 1.
test-nccl / nccl-test-gke / nccl-gke (reduce_scatter_perf_mpi): The strategy configuration was canceled because "test-nccl.nccl-test-gke.nccl-gke.all_reduce_perf_mpi" failed
test-nccl / nccl-test-gke / nccl-gke (broadcast_perf_mpi): The strategy configuration was canceled because "test-nccl.nccl-test-gke.nccl-gke.all_reduce_perf_mpi" failed
test-nccl / nccl-test-gke / nccl-gke (all_gather_perf_mpi): The strategy configuration was canceled because "test-nccl.nccl-test-gke.nccl-gke.all_reduce_perf_mpi" failed

Artifacts

Produced during runtime
Name                           Size       Digest
artifact-final-report          751 Bytes  sha256:3f614067850d5b99ce473b5446e9b1c369a6660c1cc20fb7d49849737f7033c0
artifact-nccl-gke-build-amd64  571 Bytes  sha256:95a1c22b3280cbc8c39a06f58503fb6a0d5c913a7934aefc117e58b0e6fd871e
artifact-workflow-metadata     266 Bytes  sha256:b396c2eb9e2a6a6623e6538db159ec0896f61a238e999603471213ad4b91caf9
gke-maxtext-train              38.5 KB    sha256:7448ee685914cb8e3767786683161f609ec5625267dc373800b90fcccbe576c2
gke-maxtext-train-sitrep       228 Bytes  sha256:150ea79e84acb42b8a7b4e344a0fe7078f4dd8cde68cf42a601a7cae2bfc09ca
nccl-gke-all-reduce-sitrep     224 Bytes  sha256:149b98fa7ede6e7f1fb7f13ce44dbac06c0ecabffee4a79f4c6d2b45deda9826
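
A downloaded artifact can be checked against the digests listed above. The following is a minimal sketch, not part of the workflow itself. It assumes each listed digest is the SHA-256 of the artifact archive exactly as downloaded. The helper names and the file path in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the file's SHA-256 in the 'sha256:<hex>' form used in the artifact list."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large artifacts are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def matches_listed_digest(path: Path, listed: str) -> bool:
    """Compare a local file against a digest string copied from the run page."""
    # Assumption: the listed digest covers the archive as downloaded, not its extracted contents.
    return sha256_digest(path) == listed.strip().lower()
```

For example, `matches_listed_digest(Path("artifact-final-report.zip"), "sha256:3f61...")` (with the full digest pasted in, and the path being wherever the archive was saved) returns `True` only when the local file hashes to the listed value.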