check_ps_unauth_users() killing interactive SLURM jobs #15

@bbbbbrie

I ran into trouble with release 1.4.2 of NHC: the SLURM implementation of check_ps_unauth_users() kills interactive jobs. (Jobs submitted via sbatch are left alone.)

Undesired/unexpected behavior
check_ps_unauth_users: foo's "sleep" process is unauthorized. (PID 12347)
check_ps_unauth_users: foo's "/bin/bash" process is unauthorized. (PID 12372)

Upon closer inspection, this appeared to be a result of how the list of users with currently running jobs was calculated:

STAT_OUT=$(${STAT_CMD:-/usr/bin/stat} ${STAT_FMT_ARGS:--c} %U $JOBFILE_PATH/job*/slurm_script)

Details
Job files like slurm_script are not created when interactive jobs are launched. Instead, there is a file with the node's hostname and job ID as a part of the filename:

|-- compute-0-2_1084.4294967294
|-- cred_state
|-- cred_state.old
`-- job01084
    `-- slurm_script

Potential solution
I successfully addressed this locally using squeue, which can be configured to report just usernames:

STAT_OUT=$(squeue -w localhost --noheader -o %u)

This should report the username of every user with a job running on the local node. (A user whose jobs are all running on other nodes is not listed, so any processes she has on this node are still treated as unauthorized.)
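For illustration, here is a minimal sketch of how the squeue-based lookup could be wrapped up and de-duplicated. The names local_job_users, user_has_local_job, and SQUEUE_CMD are hypothetical (SQUEUE_CMD is modeled on the STAT_CMD override above) and are not part of NHC; only the squeue options -w, -h, and -o come from squeue itself.

```shell
# Hypothetical helper (not part of NHC): print each user who has a job
# allocated on this node, one username per line, without duplicates.
# SQUEUE_CMD is an assumed override variable, modeled on STAT_CMD.
local_job_users() {
    # -w localhost : only jobs allocated to this node
    # -h           : suppress the header line (same as --noheader)
    # -o %u        : print only the job owner's username
    ${SQUEUE_CMD:-squeue} -w localhost -h -o %u | sort -u
}

# Hypothetical helper: succeed if user $1 owns at least one job here.
user_has_local_job() {
    # -x matches whole lines and -F disables regex interpretation,
    # so "foo" does not match "foobar".
    local_job_users | grep -qxF -- "$1"
}
```

A user with several jobs on the node appears once per job in the raw squeue output, so sort -u collapses the list before it is searched. With these helpers, STAT_OUT=$(local_job_users) would be one way to feed the existing check.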

This has been tested with SLURM 15.08.7.

Please let me know if I have overlooked something or if you have any questions.

Thanks!

(NHC is awesome; thank you!)
