Concurrent Solutions to Linear Systems using Hybrid CPU/GPU Nodes
Links to Files: https://archive.siam.org/students/siuro/vol8/index.php
Type of Work: 10 pages
undergraduate journal article
Citation of Original Publication: Oluwapelumi Adenikinju, Julian Gilyard, Joshua Massey, Thomas Stitt, Matthias K. Gobbert, Concurrent Solutions to Linear Systems using Hybrid CPU/GPU Nodes, SIAM Undergraduate Research Online (SIURO), Volume 8, http://dx.doi.org/10.1137/15S013776
Rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Subjects: High Performance Computing Facility (HPCF)
parallel solutions to linear systems
global illumination problem in computer graphics
We investigate parallel solutions to linear systems, with the global illumination problem in computer graphics as the application focus. An existing serial CPU implementation of the radiosity method serves as the performance baseline; a scene and its corresponding form-factor coefficients are provided as input. The initial radiosity solver applies the basic Jacobi method with a fixed iteration count to solve the radiosity linear system iteratively. We add the option of using the modern BiCG-STAB method, with the aim of reducing runtime for complex problems. For the test scenes used, however, the problems were not complex enough to benefit from this mathematical reformulation through BiCG-STAB. Single-node parallelization is implemented through OpenMP-based multithreading, GPU offloading using CUDA, and hybrid multithreading/GPU offloading. In general, OpenMP performs best because it requires no expensive memory transfers between CPU and GPU. Finally, we investigate two storage schemes for the system to determine whether storing it as an array of structures or as a structure of arrays gives better performance. We find that arrays of structures combined with OpenMP yield the best performance, except for small scenes, where CUDA achieves the lowest runtime.
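The record does not reproduce the solver itself. As an illustration only, here is a minimal sketch of a fixed-iteration Jacobi solve, assuming the standard radiosity equation B_i = E_i + rho_i * sum_j F_ij B_j with a single color channel and a dense, row-major form-factor matrix. The function name `jacobi_radiosity` and its parameters are hypothetical, not the authors' API. The OpenMP pragma shows why this method maps naturally onto multithreading: every row update reads only the previous iterate.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch (not the authors' code): Jacobi iteration with a fixed
 * iteration count for the radiosity system
 *     B_i = E_i + rho_i * sum_j F_ij * B_j,   i.e.  (I - diag(rho) F) B = E.
 * Assumes one color channel and a dense, row-major n-by-n form-factor matrix F.
 * Returns 0 on success, -1 if the work array cannot be allocated.
 */
int jacobi_radiosity(int n, const double *F, const double *rho,
                     const double *E, double *B, int iters)
{
    double *B_new = malloc((size_t)n * sizeof *B_new);
    if (B_new == NULL)
        return -1;

    /* Initial guess: every patch radiates only its own emission. */
    for (int i = 0; i < n; i++)
        B[i] = E[i];

    for (int k = 0; k < iters; k++) {
        /* Each row reads only the previous iterate B, so rows are independent;
           with OpenMP enabled (e.g. -fopenmp) they are split across threads,
           otherwise the pragma is ignored and the loop runs serially. */
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            double off_diag = 0.0;
            for (int j = 0; j < n; j++)
                if (j != i)
                    off_diag += F[(size_t)i * n + j] * B[j];
            /* Diagonal entry of (I - diag(rho) F); equals 1 when F_ii = 0. */
            double diag = 1.0 - rho[i] * F[(size_t)i * n + i];
            B_new[i] = (E[i] + rho[i] * off_diag) / diag;
        }
        memcpy(B, B_new, (size_t)n * sizeof *B);
    }

    free(B_new);
    return 0;
}
```

Because the iteration count is fixed, the sketch performs no convergence test; a Krylov method such as BiCG-STAB would instead iterate until a residual tolerance is met, which is where a reformulation can pay off on harder systems.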
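To make the two storage schemes concrete, the following is a minimal sketch using hypothetical per-patch fields (emission, reflectance, radiosity); the paper's actual data structures are not given in this record. In an array of structures (AoS) all fields of one patch sit together in memory, while in a structure of arrays (SoA) each field is contiguous across all patches. The type and function names are illustrative only.

```c
#include <stdlib.h>

/* Hypothetical per-patch fields; the paper's actual structures may differ. */

/* Array of structures (AoS): all fields of one patch are adjacent. */
typedef struct {
    double emission;
    double reflectance;
    double radiosity;
} PatchAoS;

/* Structure of arrays (SoA): each field is contiguous across all patches. */
typedef struct {
    int n;
    double *emission;
    double *reflectance;
    double *radiosity;
} PatchesSoA;

/* AoS access: consecutive reads of one field are strided by sizeof(PatchAoS). */
double sum_radiosity_aos(const PatchAoS *p, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += p[i].radiosity;
    return total;
}

/* SoA access: consecutive reads of one field are unit-stride. */
double sum_radiosity_soa(const PatchesSoA *s)
{
    double total = 0.0;
    for (int i = 0; i < s->n; i++)
        total += s->radiosity[i];
    return total;
}

/* Releases the SoA arrays; safe to call on a partially allocated struct. */
void free_soa(PatchesSoA *s)
{
    free(s->emission);
    free(s->reflectance);
    free(s->radiosity);
    s->emission = s->reflectance = s->radiosity = NULL;
    s->n = 0;
}

/* Copies an AoS patch list into freshly allocated SoA storage.
   Returns 0 on success, -1 on allocation failure. */
int aos_to_soa(const PatchAoS *p, int n, PatchesSoA *s)
{
    size_t bytes = (size_t)n * sizeof(double);
    s->n = n;
    s->emission = malloc(bytes);
    s->reflectance = malloc(bytes);
    s->radiosity = malloc(bytes);
    if (!s->emission || !s->reflectance || !s->radiosity) {
        free_soa(s);
        return -1;
    }
    for (int i = 0; i < n; i++) {
        s->emission[i] = p[i].emission;
        s->reflectance[i] = p[i].reflectance;
        s->radiosity[i] = p[i].radiosity;
    }
    return 0;
}
```

Which layout is faster depends on the access pattern and the hardware: unit-stride SoA access tends to suit GPU memory coalescing and SIMD, while AoS keeps everything a single patch update needs in nearby memory. For this application, the abstract reports that arrays of structures with OpenMP performed best except on small scenes.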