Add nvbandwidth sample #20

elezar · 2025-02-19T15:08:22Z

These changes add an nvbandwidth CUDA sample to allow for testing GPU bandwidth between multiple GPUs.

These changes would produce the following images:

docker.io/nvidia/cuda-sample:nvbandwidth-cuda12.6.2-ubuntu22.04
docker.io/nvidia/cuda-sample:nvbandwidth-cuda12.6.2

ArangoGutierrez · 2025-02-19T16:02:11Z

deployments/container/Makefile

 build-%: DOCKERFILE = $(CURDIR)/deployments/container/Dockerfile.$(DOCKERFILE_SUFFIX)
+else
+build-%: DOCKERFILE = $(CURDIR)/deployments/container/$(SAMPLE)/Dockerfile.$(DOCKERFILE_SUFFIX)
+endif


Could we modify the IMAGE_TAG here? I don't think nvbandwidth-8169f9fa-ubuntu22.04 is a good tag for the nvbandwidth image; maybe we want nvbandwidth-8169f9fa.

We actually need both the nvbandwidth and cuda_version parts in the tag. These images are version sensitive.

Yeah I discussed this with @klueska

We can update the tags to be whatever we want them to be. Please remember that:

The VERSION for the images released from this repo is cuda12.6.2 for example.

The tag should be different for each build (e.g. SHA) so that we can test early access bits.

The image will be released when tagging the (internal) repo. Currently we tag with cuda<VERSION> since the base images are the main driver for updates.
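The tag constraints above can be sketched as a small helper. This is only an illustration of the naming scheme discussed in this thread: the function name `image_tags` and its parameters are hypothetical, and the `<sample>-<version>[-<dist>]` layout is an assumption inferred from the example tags, not the logic in the repo's Makefile.

```python
def image_tags(sample, version, dist=None):
    """Compose tags of the assumed form <sample>-<version>[-<dist>].

    `version` may be a release version such as "cuda12.6.2" or a commit
    SHA such as "8169f9fa" for early-access builds.
    """
    base = f"{sample}-{version}"
    tags = [base]
    if dist:
        # Put the distribution-specific tag first, then the short tag.
        tags.insert(0, f"{base}-{dist}")
    return tags


# Release build: tagged with the CUDA version.
print(image_tags("nvbandwidth", "cuda12.6.2", "ubuntu22.04"))
# Early-access build: tagged with a commit SHA so each build is distinct.
print(image_tags("nvbandwidth", "8169f9fa", "ubuntu22.04"))
```

Keeping the version slot generic lets the same scheme cover both release tags (cuda<VERSION>) and per-build SHA tags.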

guptaNswati · 2025-02-19T17:48:10Z

docker.io/nvidia/cuda-sample:nvbandwidth-cuda12.6.2-ubuntu22.04

docker.io/nvidia/cuda-sample:nvbandwidth-cuda12.6.2

This is not a CUDA sample. These are standalone memory benchmarking tests: https://github.com/NVIDIA/nvbandwidth
If we don't want to do nvidia/k8s-sample, then I would propose we do nvidia/nvbandwidth. Note that I added it to k8s-sample so that pushing it to NGC is a bit faster, since we already have the repo.

guptaNswati · 2025-02-19T17:53:34Z

.github/workflows/image.yaml

        - vectorAdd
        - nbody
        - deviceQuery
+        - nvbandwidth


I don't think it belongs here. It is a separate build/Dockerfile. I added another cuda-sample that should go here. See #18.

It does; see the structure of the GitHub action and how Evan is creating a new Make target.

ArangoGutierrez · 2025-02-19T19:27:58Z

docker.io/nvidia/cuda-sample:nvbandwidth-cuda12.6.2-ubuntu22.04

docker.io/nvidia/cuda-sample:nvbandwidth-cuda12.6.2

This is not a CUDA sample. These are standalone memory benchmarking tests: https://github.com/NVIDIA/nvbandwidth If we don't want to do nvidia/k8s-sample, then I would propose we do nvidia/nvbandwidth. Note that I added it to k8s-sample so that pushing it to NGC is a bit faster, since we already have the repo.

It was a typo from Evan's point of view. The GH action will produce
ghcr.io/nvidia/k8s-samples:nvbandwidth-cuda12.6.2-ubuntu22.04. In the GitHub registry the image name is the repo name; we can control the tag.

https://github.com/NVIDIA/k8s-samples/pull/20/files#diff-b4df0a4f0d80f73138c476afbd7aefdac9df339642ddfba323d27c8cbabb92e2R90

elezar · 2025-02-20T13:14:59Z

/ok-to-test

ArangoGutierrez · 2025-02-20T13:21:45Z

/ok to test

ArangoGutierrez · 2025-02-20T13:43:41Z

/ok to test

ArangoGutierrez · 2025-02-20T17:06:28Z

/ok to test

This change adds an nvbandwidth sample that can be used to test both single and multi-node GPU interconnectivity.

The multi-arch images are generated with the following image root:

nvcr.io/ghcr.io/nvidia/k8s-samples:nvbandwidth

Signed-off-by: Swati Gupta <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>

elezar · 2025-02-21T15:12:10Z

Closing in favour of #19

elezar mentioned this pull request Feb 19, 2025

Add nvbandwidth sample #19

Merged

elezar changed the title ~~Bandwidthtest~~ Add nvbandwidth sample Feb 19, 2025

ArangoGutierrez reviewed Feb 19, 2025

guptaNswati reviewed Feb 19, 2025

elezar force-pushed the bandwidthtest branch from 4db2e70 to 86cce93 Compare February 20, 2025 13:03

elezar marked this pull request as ready for review February 20, 2025 13:14

elezar force-pushed the bandwidthtest branch 3 times, most recently from b6b0c77 to 8103ba9 Compare February 20, 2025 16:30

ArangoGutierrez approved these changes Feb 20, 2025

elezar force-pushed the bandwidthtest branch from 8103ba9 to 6c3323c Compare February 21, 2025 14:46

elezar closed this Feb 21, 2025
