Block Cyclic Distribution of Data in pbdR and its Effects on Computational Efficiency
dc.contributor.author | Bachmann, Matthew G. | |
dc.contributor.author | Dyas, Ashley D. | |
dc.contributor.author | Kilmer, Shelby C. | |
dc.contributor.author | Sass, Julian | |
dc.contributor.author | Raim, Andrew | |
dc.contributor.author | Neerchal, Nagaraj K. | |
dc.contributor.author | Adragni, Kofi P. | |
dc.contributor.author | Ostrouchov, George | |
dc.contributor.author | Thorpe, Ian F. | |
dc.date.accessioned | 2018-10-01T13:53:13Z | |
dc.date.available | 2018-10-01T13:53:13Z | |
dc.date.issued | 2013 | |
dc.description.abstract | Programming with big data in R (pbdR), a set of packages used to implement high-performance computing in the statistical software R, uses block cyclic distribution to organize large data across many processes. Because computations performed on large matrices are often not associative, a systematic approach must be used during parallelization to divide the matrix correctly. Block cyclic distribution promotes a balanced load across processes by assigning blocks of the matrix to the processes of a grid in a cyclic pattern. Each process then computes on its own roughly equal share of the data, which allows the final result to be calculated more efficiently. A nontrivial problem arises when using block cyclic distribution: which combinations of block sizes and grid layouts are most effective? These two factors greatly influence computational efficiency, so it is crucial to study and understand their relationship. To analyze the effects of block size and processor grid layout, we carry out a performance study of block cyclic distribution as used to compute a principal components analysis (PCA). We apply PCA both to a large simulated data set and to data from an analysis of single nucleotide polymorphisms (SNPs). We use analysis of variance (ANOVA) techniques to distinguish the variability associated with each grid layout and block size. Once the nature of these factors is determined, predictions can be made about performance on much larger data sets. Our final results demonstrate the relationship between computational efficiency and both block size and processor grid layout, and establish a benchmark regarding which combinations of these factors are most effective. | en_US |
dc.description.sponsorship | These results were obtained as part of the REU Site: Interdisciplinary Program in High Performance Computing (www.umbc.edu/hpcreu) in the Department of Mathematics and Statistics at the University of Maryland, Baltimore County (UMBC) in Summer 2013. This program is funded jointly by the National Science Foundation and the National Security Agency (NSF grant no. DMS–1156976), with additional support from UMBC, the Department of Mathematics and Statistics, the Center for Interdisciplinary Research and Consulting (CIRC), and the UMBC High Performance Computing Facility (HPCF). HPCF (www.umbc.edu/hpcf) is supported by the U.S. National Science Foundation through the MRI program (grant nos. CNS–0821258 and CNS–1228778) and the SCREMS program (grant no. DMS–0821311), with additional substantial support from UMBC. Co-author Jordan Ramsey was supported, in part, by the UMBC National Security Agency (NSA) Scholars Program through a contract with the NSA. Graduate RA Andrew Raim was supported by UMBC as HPCF RA. | en_US |
dc.description.uri | https://userpages.umbc.edu/~gobbert/papers/REU2013Team1.pdf | en_US |
dc.format.extent | 17 pages | en_US |
dc.genre | technical report | en_US |
dc.identifier | doi:10.13016/M2057CW7M | |
dc.identifier.uri | http://hdl.handle.net/11603/11411 | |
dc.language.iso | en_US | en_US |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Mathematics Department Collection | |
dc.relation.ispartof | UMBC Chemistry & Biochemistry Department | |
dc.relation.ispartof | UMBC Faculty Collection | |
dc.relation.ispartof | UMBC Student Collection | |
dc.relation.ispartofseries | HPCF Technical Report;HPCF-2013-11 | |
dc.rights | This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. | |
dc.subject | pbdR | en_US |
dc.subject | Block Cyclic Distribution | en_US |
dc.subject | Grid Layout | en_US |
dc.subject | Block Size | en_US |
dc.subject | PCA | en_US |
dc.subject | covariance | en_US |
dc.subject | correlation | en_US |
dc.subject | UMBC High Performance Computing Facility (HPCF) | en_US |
dc.title | Block Cyclic Distribution of Data in pbdR and its Effects on Computational Efficiency | en_US |
dc.type | Text | en_US |