Study of Exploiting Coarse-Grained Parallelism in Block-Oriented Numerical Linear Algebra Routines
Date
2020-06-17
Citation of Original Publication
Gerson C. Kroiz et al., Study of Exploiting Coarse-Grained Parallelism in Block-Oriented Numerical Linear Algebra Routines, Proceedings in Applied Mathematics and Mechanics (2020), http://hpcf-files.umbc.edu/research/papers/S17_Kroiz_v1.pdf
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
We have developed streaming implementations of two numerical linear algebra operations that further exploit the block decomposition strategies commonly used in these operations to obtain performance. The implementations formulate algorithms as data flow graphs and use coarse-grained parallelism to (1) emit a block in the result matrix as soon as it becomes available and (2) compute on multiple blocks in parallel. This streaming design benefits data flow graphs consisting of multiple linear algebra operations as it removes synchronization points between successive operations: a result block from an operation can be used immediately in an algorithm’s successor operations without waiting for the full result from the first operation. Early comparisons with OpenBLAS functions on CPUs show comparable performance for computing with large dense matrices and an earliest arrival time of a result block that is up to 50x smaller than the time needed for a full result. More thorough studies can show the impact of such implementations on the performance of systems by chaining multiple linear algebra operations.
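
The streaming idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`matmul_block`, `stream_matmul`, `scale_block`, `chained_scale_matmul`), the choice of matrix multiplication followed by scaling as the two chained operations, and the use of a thread pool are all assumptions made for illustration. The sketch computes result blocks in parallel, yields each block as soon as it is finished, and lets a successor operation consume blocks without waiting for the full product.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def matmul_block(A, B, i0, j0, bs):
    """Compute the block of C = A * B whose top-left corner is (i0, j0)."""
    inner = len(B)
    rows = range(i0, min(i0 + bs, len(A)))
    cols = range(j0, min(j0 + bs, len(B[0])))
    block = [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in cols]
             for i in rows]
    return i0, j0, block


def stream_matmul(A, B, bs, workers=4):
    """Yield (i0, j0, block) for C = A * B in completion order.

    Blocks are submitted to a thread pool (hypothetical choice) and each
    one is emitted as soon as it is available, not after the full result.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(matmul_block, A, B, i0, j0, bs)
                   for i0 in range(0, len(A), bs)
                   for j0 in range(0, len(B[0]), bs)]
        for fut in as_completed(futures):
            yield fut.result()


def scale_block(block, alpha):
    """Successor operation: scale one block by alpha."""
    return [[alpha * x for x in row] for row in block]


def chained_scale_matmul(A, B, alpha, bs):
    """Compute alpha * (A * B), consuming each product block on arrival."""
    C = [[0] * len(B[0]) for _ in A]
    for i0, j0, block in stream_matmul(A, B, bs):
        # No synchronization point: the successor runs per block.
        scaled = scale_block(block, alpha)
        for di, row in enumerate(scaled):
            C[i0 + di][j0:j0 + len(row)] = row
    return C
```

For example, `chained_scale_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2, bs=1)` assembles the scaled product from four independently computed 1x1 blocks. Pure-Python threads will not show a real speed-up because of the interpreter lock; the sketch only demonstrates the data flow, in which the first block reaches the successor operation long before the last block is computed.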