Add check for stuck jobs in poll() #5451
base: master
Changes from all commits
ffe6301
6a51c30
27f7594
cbd9b8e
cec23a1
7483a35
c159e89
f68fa1e
b657da4
7d31bf5
6948c6f
eb07ea0
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -149,6 +149,39 @@ bool ACTIVE_TASK_SET::poll() { | |
} | ||
} | ||
} | ||
|
||
// check if a job is "stuck" (did not make progress in the last hour) | ||
// notify the user about the issue | ||
// abort after some time | ||
static double last_stuck_check_time = 0; | ||
if (gstate.now - last_stuck_check_time > STUCK_CHECK_POLL_PERIOD) { | ||
last_stuck_check_time = gstate.now; | ||
for (i=0; i<active_tasks.size(); i++){ | ||
ACTIVE_TASK* atp = active_tasks[i]; | ||
if (atp->non_cpu_intensive()) continue; | ||
if (atp->sporadic()) continue; | ||
if (atp->stuck_check_elapsed_time == 0) { | ||
// first pass | ||
atp->stuck_check_elapsed_time = atp->elapsed_time; | ||
atp->stuck_check_fraction_done = atp->fraction_done; | ||
atp->stuck_check_cpu_time = atp->current_cpu_time; | ||
continue; | ||
} | ||
if (atp->elapsed_time < atp->stuck_check_elapsed_time + STUCK_CHECK_POLL_PERIOD) continue; | ||
if (atp->stuck_check_fraction_done == atp->fraction_done && | ||
(atp->current_cpu_time - atp->stuck_check_cpu_time) < 10) { | ||
// if fraction done does not change and cpu time is <10, message the user | ||
msg_printf(atp->result->project, MSG_USER_ALERT, | ||
"[task] has not made progress in last hour, consider aborting task %s", | ||
Maybe the better wording should be
@davidpanderson, could you please verify the spelling?

I'm worried about the direction this might take the BOINC client (eventually), or might lead the user. Not every wrapper-using project successfully manages to implement a realistic measure of progress made. This direction of thought changes the progress report from 'nice to have' towards 'essential'. It needs to be explained carefully.

A job is regarded as stuck if, in the last hour of running, b) by itself doesn't make it stuck. |
||
atp->result->name | ||
); | ||
} | ||
atp->stuck_check_elapsed_time = atp->elapsed_time; | ||
atp->stuck_check_fraction_done = atp->fraction_done; | ||
atp->stuck_check_cpu_time = atp->current_cpu_time; | ||
} | ||
} | ||
|
||
if (action) { | ||
gstate.set_client_state_dirty("ACTIVE_TASK_SET::poll"); | ||
} | ||
|
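To make the condition under discussion easier to reason about, here is a minimal standalone sketch of the comparison the diff performs between two samples of a task. It is not the BOINC code: `TaskSnapshot`, `StuckCheck`, `check_stuck`, and the two constants are hypothetical names, and the one-hour period is an assumption taken from the code comment, since the diff does not show how `STUCK_CHECK_POLL_PERIOD` is defined. The first-pass initialization (`stuck_check_elapsed_time == 0`) and the `non_cpu_intensive()` / `sporadic()` exclusions are left out.

```cpp
#include <cassert>

// Assumed: one hour, inferred from the "last hour" comment in the diff.
// The real STUCK_CHECK_POLL_PERIOD definition is not shown there.
constexpr double kStuckCheckPeriod = 3600.0;
// From the diff: less than 10 seconds of CPU time counts as "no CPU progress".
constexpr double kMinCpuDelta = 10.0;

// Hypothetical stand-in for the three values the diff saves per task.
struct TaskSnapshot {
    double elapsed_time;   // seconds the task has been running
    double fraction_done;  // progress reported by the application
    double cpu_time;       // accumulated CPU seconds
};

enum class StuckCheck { TooSoon, Progressing, Stuck };

// Compare the saved snapshot with the current one, mirroring the diff's logic.
inline StuckCheck check_stuck(const TaskSnapshot& prev, const TaskSnapshot& cur) {
    // Not enough run time has passed since the saved snapshot: skip the check.
    if (cur.elapsed_time < prev.elapsed_time + kStuckCheckPeriod) {
        return StuckCheck::TooSoon;
    }
    // Both conditions must hold; an unchanged fraction_done alone is not enough.
    const bool fraction_unchanged = (cur.fraction_done == prev.fraction_done);
    const bool little_cpu = (cur.cpu_time - prev.cpu_time) < kMinCpuDelta;
    return (fraction_unchanged && little_cpu) ? StuckCheck::Stuck
                                              : StuckCheck::Progressing;
}
```

Under these assumptions, a task whose application never updates `fraction_done` but keeps consuming CPU time is classified as `Progressing`, because the two conditions are combined with a logical AND, as in the diff.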