
Reading /proc/$PID/cmdline can hang in certain filesystem scenarios #156

@multimeric


We have a tape-backed filesystem with retrieve-on-read, so a process can enter uninterruptible sleep (state D) while it waits for a file to be retrieved from tape. While a process is in that state, reading /proc/$PID/cmdline for it hangs forever, which I believe is explained here. Consequently, the ps calls made by NHC fail, and this triggers a node health check alert. That is not desirable: the node is actually running fine; one or more processes are simply sleeping and can't report their cmdline.

I'm not entirely sure which ps invocation in NHC triggers this, as there are several. Does NHC need to request the cmdline at all, given that reading it can hang? Alternatively, if the cmdline is helpful to most users, could there be a configuration option to turn it off?
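
To illustrate the kind of guard being asked about, here is a minimal sketch in Python (NHC itself is not written in Python, and this is not how NHC works). It reads the state field from /proc/$PID/stat first and skips the cmdline read when the process is in state D. The names `parse_state` and `safe_cmdline` and the `proc_root` parameter are hypothetical, introduced only for this example. It also assumes that reading /proc/$PID/stat does not block in this scenario, which I have not verified on the affected nodes, and the check is racy: the process could enter D between the two reads.

```python
import os
from typing import Optional


def parse_state(stat_text: str) -> str:
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or ')' characters, so split on the LAST ')' rather than on whitespace.
    after_comm = stat_text[stat_text.rindex(")") + 1:]
    return after_comm.split()[0]


def safe_cmdline(pid: int, proc_root: str = "/proc") -> Optional[str]:
    """Return the command line of pid, or None if it is skipped or unavailable.

    Processes in uninterruptible sleep (state D) are skipped so that the
    cmdline read is never attempted for them. proc_root is a hypothetical
    parameter that exists only to make the function testable.
    """
    pid_dir = os.path.join(proc_root, str(pid))
    try:
        with open(os.path.join(pid_dir, "stat"), errors="replace") as f:
            state = parse_state(f.read())
        if state == "D":
            return None  # reading cmdline might hang, so do not try
        with open(os.path.join(pid_dir, "cmdline"), "rb") as f:
            raw = f.read()
    except (FileNotFoundError, ProcessLookupError, PermissionError):
        return None  # process exited or is not readable by this user
    # Arguments in cmdline are NUL-separated, with a trailing NUL.
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()
```

A caller that gets `None` back could fall back to the short command name from the stat file instead of treating the missing cmdline as a failure.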
