Closed
Description
Recent versions of Slurm have changed the reboot workflow and added a reboot state. In Slurm 17.11, running scontrol reboot or scontrol reboot asap returns the node to IDLE before NHC has a chance to offline it. As a result, jobs can start on the node and may fail if a required resource isn't up yet.
I have a preliminary set of patches here that I'm opening up for feedback and suggestions.
I'm still testing, and I still need to confirm that Slurm 18.08's scontrol reboot nextstate=<STATE> command works with these changes, though I don't expect them to affect it.
Thanks,
Michael
P.S. This was originally brought up here: https://bugs.schedmd.com/show_bug.cgi?id=6391