The Stack Trace Analysis Tool (STAT) is a highly scalable, lightweight tool that gathers and merges stack traces from all of the processes of a parallel application to form call graph prefix trees. STAT generates two prefix trees termed 2D-trace-space and 3D-trace-space-time. The 2D-trace-space prefix tree is a merge of a single stack trace from each task in the parallel application. The 3D-trace-space-time prefix tree is a merge of several stack traces from each task gathered over time. The latter provides insight into whether tasks are making progress or are in a hang state (livelock, deadlock, infiite loop, etc.). The call graph prefix trees also identify processes equivalence classes, processes exhibitin similar behavior with respect to their call paths. A representative task from each equivalence class can then be fed into a full-featured debugger for root cause analysis at a manageable scale.
0 commit comments