    Block Cyclic Distribution of Data in pbdR and its Effects on Computational Efficiency

    Files
    REU2013Team1.pdf (926.8Kb)
    Links to Files
    https://userpages.umbc.edu/~gobbert/papers/REU2013Team1.pdf
    Permanent Link
    http://hdl.handle.net/11603/11411
    Collections
    • UMBC Chemistry & Biochemistry Department
    • UMBC Faculty Collection
    • UMBC Mathematics and Statistics Department
    • UMBC Student Collection
    Metadata
    Author/Creator
    Bachmann, Matthew G.
    Dyas, Ashley D.
    Kilmer, Shelby C.
    Sass, Julian
    Raim, Andrew
    Neerchal, Nagaraj K.
    Adragni, Kofi P.
    Ostrouchov, George
    Thorpe, Ian F.
    Date
    2013
    Type of Work
    17 pages
    Text
    technical report
    Rights
    This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
    Subjects
    pbdR
    Block Cyclic Distribution
    Grid Layout
    Block Size
    PCA
    covariance
    correlation
    UMBC High Performance Computing Facility (HPCF)
    Abstract
    Programming with big data in R (pbdR), a package used to implement high-performance computing in the statistical software R, uses block cyclic distribution to organize large data across many processes. Because computations performed on large matrices are often not associative, a systematic approach must be used during parallelization to divide the matrix correctly. The block cyclic distribution method stresses a balanced load across processes by allocating sections of data to a corresponding node. The result is well-divided data on which each process computes individually, so that the final result is calculated more efficiently. A nontrivial problem occurs when using block cyclic distribution: which combinations of block sizes and grid layouts are most effective? These two factors greatly influence computational efficiency, and it is therefore crucial to study and understand their relationship. To analyze the effects of block size and processor grid layout, we carry out a performance study of the block cyclic process used to compute a principal components analysis (PCA). We apply PCA both to a large simulated data set and to data involving the analysis of single nucleotide polymorphisms (SNPs). We use analysis of variance (ANOVA) techniques to distinguish the variability associated with each grid layout and block distribution. Once the nature of these factors is determined, predictions about the performance for much larger data sets can be made. Our final results demonstrate the relationship between computational efficiency and both block distribution and processor grid layout, and establish a benchmark regarding which combinations of these factors are most effective.
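
    As a rough illustration of the block cyclic idea described in the abstract, the following Python sketch maps each entry of a matrix to a process on a two-dimensional grid and counts how many entries each process holds. It is not taken from the report and does not use pbdR; the function names, the 0-based indexing, and the row-major numbering of processes are assumptions made only for this example.

```python
# Minimal sketch of a 2D block cyclic layout (illustrative only, not pbdR code).
# Assumptions: 0-based indices, and processes numbered row-major on the grid.

def owner(i, j, block_rows, block_cols, grid_rows, grid_cols):
    """Return the (process_row, process_col) that holds global entry (i, j)."""
    # Blocks are dealt out cyclically along each dimension of the process grid.
    process_row = (i // block_rows) % grid_rows
    process_col = (j // block_cols) % grid_cols
    return process_row, process_col


def local_counts(n_rows, n_cols, block_rows, block_cols, grid_rows, grid_cols):
    """Count how many matrix entries each process holds."""
    counts = [0] * (grid_rows * grid_cols)
    for i in range(n_rows):
        for j in range(n_cols):
            pr, pc = owner(i, j, block_rows, block_cols, grid_rows, grid_cols)
            counts[pr * grid_cols + pc] += 1  # row-major process number
    return counts


if __name__ == "__main__":
    # 6 x 6 matrix on a 2 x 2 process grid with 2 x 2 blocks: uneven load.
    print(local_counts(6, 6, 2, 2, 2, 2))  # [16, 8, 8, 4]
    # Same matrix and grid with 1 x 1 blocks: perfectly balanced load.
    print(local_counts(6, 6, 1, 1, 2, 2))  # [9, 9, 9, 9]
```

    In this toy case the block size alone changes the load balance on the same grid, which is the kind of interaction between block size and grid layout that the study examines. The sketch says nothing about communication cost or run time.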


    Albin O. Kuhn Library & Gallery
    University of Maryland, Baltimore County
    1000 Hilltop Circle
    Baltimore, MD 21250
    www.umbc.edu/scholarworks

    Contact information:
    Email: scholarworks-group@umbc.edu
    Phone: 410-455-3021


    If you wish to submit a copyright complaint or withdrawal request, please email mdsoar-help@umd.edu.

     

     
