A database-based distributed computation architecture with Accumulo and D4M: An application of eigensolver for large sparse matrix
dc.contributor.author | Huang, Yin | |
dc.contributor.author | Yesha, Yelena | |
dc.contributor.author | Zhou, Shujia | |
dc.date.accessioned | 2018-10-31T18:27:12Z | |
dc.date.available | 2018-10-31T18:27:12Z | |
dc.date.issued | 2015-12-28 | |
dc.description | IEEE International Conference on Big Data | en_US |
dc.description.abstract | NoSQL distributed databases have been devised to tackle the challenges resulting from the volume, velocity and variety of big data. Graph representations of datasets require efficient distributed linear algebra operations on large sparse matrices constructed from big data. Storing the transformed matrix in the database not only speeds up the big data analysis process but also facilitates the computation because of indexing. The Hadoop-based approach does not natively support iterative algorithms because data is shuffled during each iteration. This paper presents a novel database-based distributed computation architecture that bridges the gap between Hadoop and HPC. The novelty lies in using the indexing capability of D4M (Dynamic Distributed Dimensional Data Model) to support linear algebra operations in a distributed computation environment. The idea is to store input data and intermediate results in associative array format inside an Accumulo table to facilitate data sharing among worker nodes. pMatlab is deployed as the parallel computation engine. The proposed architecture is shown to be lighter, easier to use and faster than the MapReduce-based approach. One example application is calculating the top k eigenvalues and eigenvectors of a large sparse matrix. Experiments on Graph500 benchmark datasets demonstrate a 2X speedup of our architecture over HEIGEN (an eigensolver for billion-scale matrices using MapReduce). | en_US |
dc.description.sponsorship | The authors would like to thank IBM/CAS Toronto for supporting Yin Huang with a CAS fellowship. We would also like to thank the NIST/SSD Information Systems Group for providing support to conduct this Big Data Analytics computation. We are also grateful to CHMPR for providing the IBM iDataPlex bluewave computational resources used to conduct these data-intensive experiments. In particular, we wish to acknowledge Dr. John Dorband for training one of the authors as a system administrator to establish the Hadoop-based ecosystem. We would also like to thank Prof. Xian-He Sun of the Department of Computer Science at the Illinois Institute of Technology for providing cluster resources. | en_US |
dc.description.uri | https://ieeexplore.ieee.org/document/7364045 | en_US |
dc.format.extent | 8 pages | en_US |
dc.genre | conference papers and proceedings pre-print | en_US |
dc.identifier | doi:10.13016/M2MG7G04F | |
dc.identifier.citation | Yin Huang, Yelena Yesha, and Shujia Zhou, A database-based distributed computation architecture with Accumulo and D4M: An application of eigensolver for large sparse matrix, 2015 IEEE International Conference on Big Data (Big Data), DOI: 10.1109/BigData.2015.7364045 | en_US |
dc.identifier.uri | 10.1109/BigData.2015.7364045 | |
dc.identifier.uri | http://hdl.handle.net/11603/11809 | |
dc.language.iso | en_US | en_US |
dc.publisher | IEEE | en_US |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department Collection | |
dc.relation.ispartof | UMBC Faculty Collection | |
dc.rights | This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. | |
dc.rights | © 2015 IEEE | |
dc.subject | Big Data Analytics | en_US |
dc.subject | Distributed Computation Architecture | en_US |
dc.subject | Eigendecomposition | en_US |
dc.subject | Apache Accumulo | en_US |
dc.subject | D4M (Dynamic Distributed Dimensional Data Model) | en_US |
dc.subject | UMBC Ebiquity Research Group | en_US |
dc.title | A database-based distributed computation architecture with Accumulo and D4M: An application of eigensolver for large sparse matrix | en_US |
dc.type | Text | en_US |
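
The abstract describes storing a sparse matrix in associative array format and computing its top k eigenvalues and eigenvectors. The following is a minimal single-machine sketch of that idea in Python, not the authors' implementation: it does not use Accumulo, D4M or pMatlab. The matrix is held in a plain dictionary keyed by (row, column), which only loosely imitates an associative array, and the eigenpairs are found by power iteration with deflation, a simple stand-in that may differ from the method used in the paper or in HEIGEN. The function names `dot`, `matvec` and `top_k_eigenpairs` are hypothetical, and the sketch assumes a symmetric matrix whose leading eigenvalues are distinct in magnitude.

```python
import math
import random


def dot(u, v):
    """Inner product of two dense vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Multiply sparse matrix A, a dict {(row, col): value}, by dense vector x."""
    y = [0.0] * len(x)
    for (i, j), value in A.items():
        y[i] += value * x[j]
    return y


def top_k_eigenpairs(A, n, k, iters=1000, tol=1e-12, seed=0):
    """Approximate the k largest-magnitude eigenpairs of a symmetric n x n matrix.

    Uses power iteration; eigenvectors already found are projected out
    (deflation) so that each new run converges to the next eigenpair.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            # Remove components along eigenvectors found so far.
            for _, v in pairs:
                c = dot(x, v)
                x = [xi - c * vi for xi, vi in zip(x, v)]
            y = matvec(A, x)
            norm = math.sqrt(dot(y, y))
            if norm == 0.0:
                break
            x = [yi / norm for yi in y]
            # Rayleigh quotient of the unit vector x estimates the eigenvalue.
            new_lam = dot(x, matvec(A, x))
            converged = abs(new_lam - lam) < tol
            lam = new_lam
            if converged:
                break
        pairs.append((lam, x))
    return pairs


# Small symmetric example; only nonzero entries are stored.
A = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0, (2, 2): 5.0}
pairs = top_k_eigenpairs(A, n=3, k=2)
eigenvalues = [lam for lam, _ in pairs]
```

For this 3 x 3 example the exact eigenvalues are 5, 3 and 1, so `eigenvalues` should come out close to 5 and 3. In a distributed setting the expensive step is `matvec`, which is where sharing the matrix and intermediate vectors through a database table, as the abstract proposes, would come into play.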