Comparison of Performance Analysis Tools for Parallel Programs Applied to CombBLAS

Date

2015

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

Performance analysis tools are powerful aids for high performance computing. By breaking down how long the CPUs spend on each process (profiling) or by showing when events take place on a timeline over the course of a program run (tracing), a performance analysis tool can tell the programmer exactly where the program is running slowly. With this information, the programmer can focus on these performance "hotspots" and optimize the code to run faster. We compared the performance analysis tools TAU, ParaTools ThreadSpotter, Intel VTune, Scalasca, HPCToolkit, and Score-P by applying them to the example code CombBLAS (Combinatorial BLAS), a C++ implementation of the GraphBLAS, a set of graph algorithms using BLAS (Basic Linear Algebra Subprograms). Using these tools on CombBLAS, we found three major "hotspots" and attempted to improve the code. Because of time limitations we were unable to improve these "hotspots," but we give suggestions for improving the OpenMP calls in the CombBLAS code.
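The distinction between profiling and tracing drawn in the abstract can be illustrated with a minimal C++ sketch. It is not taken from the thesis or from any of the tools listed. The `Recorder` type, its `measure` function, and the region names in the usage note are hypothetical names chosen for illustration, and the sketch assumes a single thread, whereas real tools also handle MPI ranks and OpenMP threads.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper for illustration only; it is not the API of TAU,
// Score-P, or any other tool named in the abstract.
struct Recorder {
    using Clock = std::chrono::steady_clock;

    // One trace record: which region, whether it was entered or left, and when.
    struct Event {
        std::string region;
        bool enter;
        Clock::time_point when;
    };

    std::vector<Event> trace;                        // tracing: ordered timeline of events
    std::map<std::string, Clock::duration> profile;  // profiling: total time per region
    std::map<std::string, std::size_t> calls;        // profiling: call count per region

    // Run `body`, logging enter/leave events and accumulating its elapsed time.
    template <typename F>
    void measure(const std::string& region, F&& body) {
        const auto start = Clock::now();
        trace.push_back({region, true, start});
        body();
        const auto stop = Clock::now();
        trace.push_back({region, false, stop});
        profile[region] += stop - start;
        ++calls[region];
    }
};
```

Wrapping a section of code as `recorder.measure("kernel", [&] { /* work */ });` adds one entry to the per-region totals and two timestamped events to the timeline. The totals answer "where did the time go?", while the timeline additionally shows when each region ran, at the cost of storage that grows with the number of events.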