Study of Exploiting Coarse-Grained Parallelism in Block-Oriented Numerical Linear Algebra Routines

Date

2020-06-17

Citation of Original Publication

Gerson C. Kroiz et al., Study of Exploiting Coarse-Grained Parallelism in Block-Oriented Numerical Linear Algebra Routines, Proceedings in Applied Mathematics and Mechanics (2020), http://hpcf-files.umbc.edu/research/papers/S17_Kroiz_v1.pdf

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

We have developed streaming implementations of two numerical linear algebra operations that further exploit the block decomposition strategies commonly used in these operations to obtain high performance. The implementations formulate algorithms as data flow graphs and use coarse-grained parallelism to (1) emit each block of the result matrix as soon as it becomes available and (2) compute on multiple blocks in parallel. This streaming design benefits data flow graphs consisting of multiple linear algebra operations because it removes synchronization points between successive operations: a result block from one operation can be used immediately by that operation’s successors without waiting for the full result of the first operation. Early comparisons with OpenBLAS functions on CPUs show comparable performance for large dense matrices, and the earliest arrival time of a result block is up to 50x smaller than the time needed to compute the full result. More thorough studies could show the impact of such implementations on the performance of systems that chain multiple linear algebra operations.
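
The streaming idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the two operations, so matrix multiplication is assumed here as a representative block-oriented routine, and the names `matmul_block`, `stream_matmul`, and `scaled_product` are hypothetical. Each result block is submitted as an independent task, blocks are yielded in completion order rather than row-major order, and a successor operation (scaling by `alpha`) consumes each block on arrival instead of waiting for the full product. Pure-Python threads are used only to show the structure; a real implementation would rely on native kernels to obtain actual parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def matmul_block(A, B, i0, j0, bs):
    """Compute the block of C = A * B whose top-left corner is (i0, j0)."""
    inner = len(B)
    rows = range(i0, min(i0 + bs, len(A)))
    cols = range(j0, min(j0 + bs, len(B[0])))
    block = [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in cols]
             for i in rows]
    return i0, j0, block


def stream_matmul(A, B, bs, workers=4):
    """Yield (i0, j0, block) for every result block as soon as it is done."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One coarse-grained task per result block.
        futures = [pool.submit(matmul_block, A, B, i0, j0, bs)
                   for i0 in range(0, len(A), bs)
                   for j0 in range(0, len(B[0]), bs)]
        # as_completed hands back blocks in completion order, so the
        # consumer never waits for the whole matrix.
        for fut in as_completed(futures):
            yield fut.result()


def scaled_product(A, B, alpha, bs):
    """Successor operation: scale each block of A * B as it arrives."""
    C = [[0] * len(B[0]) for _ in A]
    for i0, j0, block in stream_matmul(A, B, bs):
        for di, row in enumerate(block):
            for dj, value in enumerate(row):
                C[i0 + di][j0 + dj] = alpha * value
    return C
```

Calling `scaled_product(A, B, 2, bs)` returns the full scaled matrix, but the scaling of any one block starts as soon as that block is produced; there is no barrier between the multiplication and the scaling step.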