LaunchMON hang #36

@jeffreybquinn

We recently upgraded our internal development cluster (Xeon Skylake Gold 6140/38 running SLES12 SP2). While rebuilding the STAT debug tool and its dependencies, including LaunchMON, we found that the LaunchMON smoke tests test.attach_1 and test.launch_1 hang.

Versions in use:
LaunchMON 1.0.2
gcc 5.4.0
openmpi 1.10.7
slurm 16.05.10-2

(These are the versions specified by our BKC build recipe. We plan to stage updates to the newest versions once the baseline has been re-established.)

To debug this, we would like access to logs or traces of successful runs of these two smoke tests on a similar configuration. We believe a differential analysis against a known-good run would point us toward the configuration and build settings we need to adjust. We are also collecting strace logs to narrow down the hang point, but we are having trouble interpreting them because we are not deeply familiar with how the tests and the library operate.
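
To illustrate the kind of strace triage we have in mind, here is a minimal Python sketch. It assumes the logs are captured with `strace -f -o <file>`, so each line starts with a PID and blocked calls are marked `<unfinished ...>` until a matching `<... name resumed>` line appears. `pending_syscalls` is a hypothetical helper of ours, not part of LaunchMON or strace, and it does not handle timestamped output or `<detached ...>` lines.

```python
import re

# Assumed line shapes from `strace -f -o FILE` (no timestamp options):
#   "101 futex(0x7f00, FUTEX_WAIT, 0, NULL <unfinished ...>"
#   "101 <... futex resumed> ) = 0"
UNFINISHED = re.compile(r"^(\d+)\s+(\w+)\(.*<unfinished \.\.\.>\s*$")
RESUMED = re.compile(r"^(\d+)\s+<\.\.\. (\w+) resumed>")


def pending_syscalls(lines):
    """Return {pid: syscall} for calls that started but were never resumed."""
    pending = {}
    for line in lines:
        started = UNFINISHED.match(line)
        if started:
            # Remember the call this PID is currently blocked in.
            pending[started.group(1)] = started.group(2)
            continue
        resumed = RESUMED.match(line)
        if resumed:
            # The blocked call completed, so this PID is no longer a suspect.
            pending.pop(resumed.group(1), None)
    return pending


if __name__ == "__main__":
    sample = [
        "100 read(3,  <unfinished ...>",
        "101 futex(0x7f00, FUTEX_WAIT, 0, NULL <unfinished ...>",
        '100 <... read resumed> "x", 1) = 1',
        '100 write(1, "x", 1) = 1',
    ]
    print(pending_syscalls(sample))  # prints {'101': 'futex'}
```

PIDs left in the result at the end of a hung run are the ones still blocked, which is where we would start comparing against a trace from a successful run.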
