@cerlane cerlane commented Aug 25, 2025

Start again using a branch from #231

preview available: https://docs.tds.cscs.ch/241

@bcumming bcumming changed the title Fix/feedbacks Add docs for GPU saturation tool Aug 27, 2025

@bcumming bcumming left a comment

Thanks for the contribution!

I have some suggested changes, and I have tried to add some extra information that might have been missing in earlier reviews.


The following guide will explain how to install and use `gssr` within a container.

Most CSCS users leverage on the base containers with pre-installed CUDA from Nvidia. As such, in the following documentation, we will use a PyTorch base container as an example.

Suggested change
Most CSCS users leverage on the base containers with pre-installed CUDA from Nvidia. As such, in the following documentation, we will use a PyTorch base container as an example.
Most CSCS users leverage on the base containers with pre-installed CUDA from Nvidia.
As such, in the following documentation, we will use a PyTorch base container as an example.

Sorry if it wasn't clear in the previous review, but the "one sentence per line" rule looks like this suggested change.
You don't have to change the content at all: just put each sentence of a paragraph on its own line, and the generated docs will join them into a single paragraph (you need a blank line to start a new paragraph).

The reason for this is that it plays nicer with git, reducing the chances of annoying merge conflicts when making changes to the docs in the future.
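
For anyone who wants to check a page locally before pushing, here is a minimal sketch of a "one sentence per line" checker.
It is not part of the CSCS docs tooling; the function name `multi_sentence_lines` and the sentence-boundary heuristic are assumptions made for illustration.
The heuristic is deliberately simple and will flag abbreviations such as "e.g. Foo" as false positives.

```python
import re

# Heuristic sentence boundary: ., ! or ? (optionally followed by a closing
# quote or bracket), then whitespace, then something that looks like the
# start of a new sentence (uppercase letter, "[" or a backtick).
_BOUNDARY = re.compile(r"[.!?][\"')\]]?\s+(?=[A-Z\[`])")


def multi_sentence_lines(markdown_text):
    """Return (line_number, line) pairs that appear to hold more than one sentence.

    Fenced code blocks, headings and blank lines are skipped.
    """
    flagged = []
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            # Toggle on every fence line so code content is never checked.
            in_fence = not in_fence
            continue
        if in_fence or not stripped or stripped.startswith("#"):
            continue
        if _BOUNDARY.search(stripped):
            flagged.append((number, line))
    return flagged


# Hypothetical sample text: the first line holds two sentences,
# the next two lines hold one sentence each.
sample = (
    "The tool profiles GPU usage. It then writes a report.\n"
    "\n"
    "The tool profiles GPU usage.\n"
    "It then writes a report.\n"
)
for number, _line in multi_sentence_lines(sample):
    print(f"line {number}: looks like more than one sentence")
```

Only the first line of the sample is reported; the split version further down passes.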

The most commonly used Nvidia container used on Alps is the [Nvidia's PyTorch container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). Typically the latest version is preferred for the most up-to-date functionalities of PyTorch.

#### Example: Preparing a Nvidia PyTorch ContainerFile
```

Suggested change
```
```dockerfile

This will give nice syntax highlighting in the generated docs.

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update \
&& apt-get install -y wget rsync rclone vim git htop nvtop nano \

Is nano needed here?

```
As you can see from the above example, gssr can easily be installed with a `RUN pip install gssr` command.

Once your `ContainerFile` is ready, you can build it on any Alps platforms with the following commands to create a container with label `mycontainer`.

The docs have a guide on how to build containers on Alps that you could link to.

For more information about building containers on Alps, see our [Podman guide][ref-building-containers].

https://docs.cscs.ch/contributing/#internal-links


## Create CSCS configuration for Container

The next step is to tell CSCS container engine solution where your container is and how you would like to run it. To do so, you will have to create a`{label}.toml` file in your `$HOME/.edf` directory.

To make your life easier, you can link to the existing documentation for the EDF file format instead of describing it again.
Find sections to link to here: https://docs.cscs.ch/software/container-engine/
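
To make the `{label}.toml` convention concrete, here is a rough sketch that renders a small EDF and writes it to `<edf_dir>/<label>.toml`.
The keys `image`, `mounts` and `workdir` are my reading of the EDF format and should be checked against the Container Engine documentation linked above; the helper names `render_edf` and `write_edf` and all paths are hypothetical.

```python
import tempfile
from pathlib import Path


def render_edf(image, mounts, workdir):
    """Render a minimal EDF as TOML text.

    The key names are assumptions; verify them in the Container Engine docs.
    """
    mount_list = ", ".join(f'"{m}"' for m in mounts)
    return (
        f'image = "{image}"\n'
        f"mounts = [{mount_list}]\n"
        f'workdir = "{workdir}"\n'
    )


def write_edf(label, text, edf_dir=None):
    """Write the EDF to <edf_dir>/<label>.toml, defaulting to $HOME/.edf."""
    edf_dir = Path(edf_dir) if edf_dir is not None else Path.home() / ".edf"
    edf_dir.mkdir(parents=True, exist_ok=True)
    path = edf_dir / f"{label}.toml"
    path.write_text(text)
    return path


# Hypothetical image location and scratch directory, for illustration only.
scratch = "/capstor/scratch/cscs/myuser"
edf_text = render_edf(
    image=f"{scratch}/mycontainer.sqsh",
    mounts=[f"{scratch}:{scratch}"],
    workdir=scratch,
)

# The demo writes into a temporary directory so it has no side effects;
# omit edf_dir to write to $HOME/.edf instead.
with tempfile.TemporaryDirectory() as tmp:
    written = write_edf("mycontainer", edf_text, edf_dir=tmp)
    print(written.name)
    print(written.read_text(), end="")
```

The file name, without the `.toml` suffix, is the label that the PR text refers to (`mycontainer` here).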


gssr analyze -i ./profile_out --report

A/Multiple PDF report(s) will be generated.

Suggested change
A/Multiple PDF report(s) will be generated.
At least one PDF report will be generated.

* [Quickstart Guide][ref-gssr-quickstart]
* [Container Guide][ref-gssr-containers]

This tool will produce time-series and heatmaps of the profiled metric values. Here is an example of one set of plots generated by the tool from the application Megatron-LLM from EPFL.

The guidance on including images has been updated:
https://docs.cscs.ch/contributing/#screenshots

To follow up: the images are attractive and suggest that the tool is capable of providing diverse feedback.

Maybe you could add brief documentation about the types of feedback provided, and use the images to illustrate this?
