Block Cyclic Distribution of Data in pbdR and its Effects on Computational Efficiency

dc.contributor.author: Bachmann, Matthew G.
dc.contributor.author: Dyas, Ashley D.
dc.contributor.author: Kilmer, Shelby C.
dc.contributor.author: Sass, Julian
dc.contributor.author: Raim, Andrew
dc.contributor.author: Neerchal, Nagaraj K.
dc.contributor.author: Adragni, Kofi P.
dc.contributor.author: Ostrouchov, George
dc.contributor.author: Thorpe, Ian F.
dc.date.accessioned: 2018-10-01T13:53:13Z
dc.date.available: 2018-10-01T13:53:13Z
dc.date.issued: 2013
dc.description.abstract: Programming with big data in R (pbdR), a package used to implement high-performance computing in the statistical software R, uses block cyclic distribution to organize large data across many processes. Because computations performed on large matrices are often not associative, a systematic approach must be used during parallelization to divide the matrix correctly. The block cyclic distribution method stresses a balanced load across processes by allocating sections of data to a corresponding node. The result is well-divided data on which each process computes individually, so that the final result is calculated more efficiently. A nontrivial question arises when using block cyclic distribution: which combinations of block sizes and grid layouts are most effective? These two factors greatly influence computational efficiency, and it is therefore crucial to study and understand their relationship. To analyze the effects of block size and processor grid layout, we carry out a performance study of the block cyclic process used to compute a principal components analysis (PCA). We apply PCA both to a large simulated data set and to data involving the analysis of single nucleotide polymorphisms (SNPs). We use analysis of variance (ANOVA) techniques to distinguish the variability associated with each grid layout and block distribution. Once the nature of these factors is determined, predictions about the performance for much larger data sets can be made. Our final results demonstrate the relationship between computational efficiency and both block distribution and processor grid layout, and establish a benchmark regarding which combinations of these factors are most effective.
dc.description.sponsorship: These results were obtained as part of the REU Site: Interdisciplinary Program in High Performance Computing (www.umbc.edu/hpcreu) in the Department of Mathematics and Statistics at the University of Maryland, Baltimore County (UMBC) in Summer 2013. This program is funded jointly by the National Science Foundation and the National Security Agency (NSF grant no. DMS–1156976), with additional support from UMBC, the Department of Mathematics and Statistics, the Center for Interdisciplinary Research and Consulting (CIRC), and the UMBC High Performance Computing Facility (HPCF). HPCF (www.umbc.edu/hpcf) is supported by the U.S. National Science Foundation through the MRI program (grant nos. CNS–0821258 and CNS–1228778) and the SCREMS program (grant no. DMS–0821311), with additional substantial support from UMBC. Co-author Jordan Ramsey was supported, in part, by the UMBC National Security Agency (NSA) Scholars Program through a contract with the NSA. Graduate RA Andrew Raim was supported by UMBC as HPCF RA.
dc.description.uri: https://userpages.umbc.edu/~gobbert/papers/REU2013Team1.pdf
dc.format.extent: 17 pages
dc.genre: technical report
dc.identifier: doi:10.13016/M2057CW7M
dc.identifier.uri: http://hdl.handle.net/11603/11411
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Mathematics Department Collection
dc.relation.ispartof: UMBC Chemistry & Biochemistry Department
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.relation.ispartofseries: HPCF Technical Report;HPCF-2013-11
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: pbdR
dc.subject: Block Cyclic Distribution
dc.subject: Grid Layout
dc.subject: Block Size
dc.subject: PCA
dc.subject: covariance
dc.subject: correlation
dc.subject: UMBC High Performance Computing Facility (HPCF)
dc.title: Block Cyclic Distribution of Data in pbdR and its Effects on Computational Efficiency
dc.type: Text
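
The abstract above describes block cyclic distribution in terms of two factors, block size and processor grid layout. As a rough illustration only, the standard two-dimensional block cyclic mapping can be sketched as follows. This is not code from the report and does not use pbdR; the function names, the 0-based indexing, and the example dimensions are all assumptions made for the sketch.

```python
from collections import Counter


def owner(i, j, block_rows, block_cols, grid_rows, grid_cols):
    """Return the (process_row, process_col) that owns global entry (i, j).

    Indices are 0-based (an assumption of this sketch). The matrix is cut
    into blocks of block_rows x block_cols entries, and the blocks are dealt
    out cyclically over a grid_rows x grid_cols grid of processes.
    """
    process_row = (i // block_rows) % grid_rows
    process_col = (j // block_cols) % grid_cols
    return (process_row, process_col)


def load_per_process(n_rows, n_cols, block_rows, block_cols, grid_rows, grid_cols):
    """Count how many matrix entries each process in the grid owns."""
    counts = Counter()
    for i in range(n_rows):
        for j in range(n_cols):
            counts[owner(i, j, block_rows, block_cols, grid_rows, grid_cols)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical example: an 8 x 12 matrix, 2 x 2 blocks, 2 x 3 process grid.
    for process, n_entries in sorted(load_per_process(8, 12, 2, 2, 2, 3).items()):
        print(process, n_entries)
```

Changing the block size or the grid shape in the call changes how evenly the entries are spread across processes, which is the kind of trade-off the performance study examines.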

Files

Original bundle
Name: REU2013Team1.pdf
Size: 926.83 KB
Format: Adobe Portable Document Format