Skip to content

CPU and Max RSS Analysis tools #2100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

ChrisPaulBennett
Copy link
Contributor

@ChrisPaulBennett ChrisPaulBennett commented Mar 12, 2025

This apart of 3 pull requests for adding CPU time and Max RSS analysis to the Cylc UI.

This adds the Max RSS and CPU time (as measured by cgroups) to the table view, box plot and time series views.

Linked to;
cylc/cylc-flow#6663
cylc/cylc-uiserver#675

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@ChrisPaulBennett ChrisPaulBennett marked this pull request as draft March 12, 2025 09:38
@oliver-sanders oliver-sanders added this to the Pending milestone Mar 12, 2025
@MetRonnie

This comment was marked as resolved.

@MetRonnie
Copy link
Member

We'll get round to this when we can, but a little bit swamped at the moment!

@wxtim
Copy link
Member

wxtim commented Apr 16, 2025

I've had a first look -

The "chips" look good:

image

  • Is there a reason I don't get maxrss for the hpc background job?
  • Might these be more useful if the data was converted to human readable?

If nothing has finished the analysis view sticks in this state:

image

I threw a fairly nasty workflow at it and noticed some anomalies:

image

Cannot page through, cannot see tasks I know to have completed.

@ChrisPaulBennett
Copy link
Contributor Author

Oliver recommended the best way to move data around cylc was to use a debug message. Those chips only only appear in dev mode and won't be visible to normal users.
What's a HPC background job?
The analysis view being stuck in that state was introduced when I merged in @oliver-sanders changes to SQL. I haven't figured out what is causing this yet. Clicking refresh will solve the issue.

@wxtim
Copy link
Member

wxtim commented Apr 23, 2025

What's a HPC background job?

Using the supercomputer login node as just another computer. We use the term background to mean execution a job in bash without using a job runner like PBS or Slurm.

E.g. > ssh exab bash program.sh rather than ssh exab qsub program.sh

Warning

Irrelevant
@oliver-sanders pointed out to me after I posted the original comment that background jobs don't use cgroups in the same way, so the profiler probably won't work for them anyway.

@oliver-sanders
Copy link
Member

oliver-sanders commented Apr 23, 2025

Is there a reason I don't get maxrss for the hpc background job?

  1. You would need to configure this platform to run the profiler.
  2. There are no cgroups to poll for background jobs so it wouldn't work if you tried.

Those chips only only appear in dev mode and won't be visible to normal users.

Debug is def the right level to send these messages at, however, I'm not sure we've got the UI to hide debug level messages yet. Orthogonal to this PR either way.

@oliver-sanders oliver-sanders added the data workflows team Work pertaining to the analysis/gantt/etc views label May 20, 2025
Comment on lines +124 to +135
// If memory value passed
} else if (timingOption === 'maxRss') {
if (value < 5000) {
const kilobytes = value
return kilobytes.toPrecision(3) + ' KB'
} else if (value / 1024 < 1000) {
const megabytes = value / 1024
return megabytes.toPrecision(3) + ' MB'
} else {
const gigabytes = value / 1048576
return gigabytes.toPrecision(3) + ' GB'
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this ought to be in a different function as it's not a duration

location="right"
v-if="timingOption === 'cpuTime'"
>
Total CPU Time Of Suite {{ formatDuration(tasks[0].totalOfTotals, false, 'cpuTime') }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workflow rather than suite, or just:

Suggested change
Total CPU Time Of Suite {{ formatDuration(tasks[0].totalOfTotals, false, 'cpuTime') }}
Total CPU Time: {{ formatDuration(tasks[0].totalOfTotals, false, 'cpuTime') }}

Comment on lines +21 to 26
v-if="!Object.keys(callback.tasks).length"
type="table"
class="align-content-start"
/>
<v-container
fluid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should prevent skeleton loader displaying at same time as the actual stuff:

Suggested change
v-if="!Object.keys(callback.tasks).length"
type="table"
class="align-content-start"
/>
<v-container
fluid
v-if="!Object.keys(callback.tasks).length"
type="table"
class="align-content-start"
/>
<v-container
v-else
fluid

Comment on lines +137 to +143
let stats = false
for (let i = 0; i < this.tasks.length; i++) {
if (this.tasks[i].count > 1) {
stats = true
break
}
]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let stats = false
for (let i = 0; i < this.tasks.length; i++) {
if (this.tasks[i].count > 1) {
stats = true
break
}
]
}
const stats = this.tasks.some((task) => task.count > 1)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data workflows team Work pertaining to the analysis/gantt/etc views
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants