Commit 986e63f

Add check_gpfs_health to README
1 parent 1c9ef42 commit 986e63f

1 file changed: +18 -0 lines changed

README.md

Lines changed: 18 additions & 0 deletions
@@ -452,6 +452,7 @@ The table below provides a list of the configuration variables which may be used
 | MCELOG_MAX_CORRECTED_RATE | `9` | Maximum number of **corrected** MCEs allowed before `check_hw_mcelog()` returns failure |
 | MCELOG_MAX_UNCORRECTED_RATE | `0` | Maximum number of **uncorrected** MCEs allowed before `check_hw_mcelog()` returns failure |
 | MDIAG_CMD | `mdiag` | Command to use to invoke Moab's `mdiag` command (may include path) |
+| MMHEALTH | `/usr/lpp/mmfs/bin/mmhealth` | Command to use to invoke the GPFS `mmhealth` command |
 | *NAME | `nhc` | Used to populate default paths/filenames for configuration |
 | NHC_AUTH_USERS | `root nobody` | Users authorized to have arbitrary processes running on compute nodes |
 | NHC_CHECK_ALL | `0` | Forces all checks to be non-fatal. Displays each failure message, reports total number of failed checks, and returns that number. |
@@ -768,6 +769,23 @@ _**Example**_: `check_fs_size /tmp 512m 4g`
 _**Example**_: `check_fs_used / 98%`
 
 
+<br />
+
+##### check_gpfs_health
+`check_gpfs_health [-0] [-a] [-l] [-s] [-e <action>] <component>`
+
+Checks the health of a GPFS component. The value for _component_ must match a component reported by mmhealth.
+
+| **Check&nbsp;Option** | **Purpose** |
+| ---------------- | ----------- |
+| `-0` | Non-fatal. Failure of this check will be ignored. |
+| `-a` | Find, report, and act on all matching components. Default behavior is to fail check after first matching component. |
+| `-l` | Log unhealthy component (or components, if used with `-a`) to NHC log (`$LOGFILE`). |
+| `-s` | Log unhealthy component (or components, if used with `-a`) to the syslog. |
+| `-e`_`action`_ | Execute `/bin/bash -c`_`action`_ if component is NOT healthy. |
+
+_**Example**_: `check_gpfs_health NETWORK`
+
 <br />
 
 
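
As a rough illustration of how the new check and the `MMHEALTH` variable might be combined in an NHC configuration file, the fragment below is a minimal sketch and not part of this commit. It assumes the usual `target || check` line syntax of `nhc.conf`. The component names `NETWORK` and `FILESYSTEM` are examples only and would need to match what `mmhealth` reports on the node in question. The option combinations are read off the option table above and have not been tested.

```
# Hypothetical nhc.conf lines; adjust the path and component names for your site.

# Point NHC at the mmhealth binary (this is the documented default).
* || export MMHEALTH=/usr/lpp/mmfs/bin/mmhealth

# Fail the node if the NETWORK component is not healthy.
* || check_gpfs_health NETWORK

# Report every unhealthy match, not just the first, and write each to the NHC log.
* || check_gpfs_health -a -l FILESYSTEM

# Non-fatal variant: log to syslog, but never fail the node on this check.
* || check_gpfs_health -0 -s NETWORK
```

Pairing `-0` with `-s` in this way would give a warning-only check, which may be useful while evaluating how noisy a component is before making it fatal.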
