Handle Changes to Slurm's REBOOT workflow #81

@hintron

Recent versions of Slurm have changed the reboot workflow and added a reboot state. As it stands, in Slurm 17.11, running scontrol reboot or scontrol reboot asap returns the node to IDLE before NHC has a chance to offline it. This lets jobs start on the node and potentially fail if a required resource isn't up yet.
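
To make the race concrete, here is a minimal Python sketch (not taken from the patches) of the kind of check a health-check wrapper could apply before treating a node as ready. It assumes the node state is reported as a State= field whose tokens are joined with "+" (for example State=IDLE+REBOOT) and that a pending reboot shows up as a REBOOT token; both the format and the helper names parse_state_flags and safe_to_online are assumptions for illustration, not confirmed behavior of any particular Slurm version.

```python
import re

# Assumption: node info contains a field like "State=IDLE+REBOOT".
# The \b keeps this from matching inside other fields such as "NextState=".
STATE_RE = re.compile(r"\bState=(\S+)")


def parse_state_flags(show_node_output):
    """Return the set of upper-cased state tokens from a State= field."""
    match = STATE_RE.search(show_node_output)
    if match is None:
        return set()
    # Drop trailing marker characters (e.g. '*') before comparing tokens.
    return {
        token.strip("*~#$@").upper()
        for token in match.group(1).split("+")
        if token
    }


def safe_to_online(show_node_output):
    """True only if a state was found and no REBOOT token is present."""
    flags = parse_state_flags(show_node_output)
    return bool(flags) and "REBOOT" not in flags
```

With a check like this, a node whose state still carries the reboot token would stay offline until the health check has run, rather than being handed jobs as soon as it reports IDLE.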

I have a preliminary set of patches here that I'm opening up for feedback and suggestions.

I'm still testing things, and I still need to make sure that Slurm 18.08's scontrol reboot nextstate=<STATE> command works with these changes, though I don't believe it will have any effect.

Thanks,
Michael

P.S. This was originally brought up here: https://bugs.schedmd.com/show_bug.cgi?id=6391
