A database-based distributed computation architecture with Accumulo and D4M: An application of eigensolver for large sparse matrix

dc.contributor.author: Huang, Yin
dc.contributor.author: Yesha, Yelena
dc.contributor.author: Zhou, Shujia
dc.date.accessioned: 2018-10-31T18:27:12Z
dc.date.available: 2018-10-31T18:27:12Z
dc.date.issued: 2015-12-28
dc.description: IEEE International Conference on Big Data [en_US]
dc.description.abstract: NoSQL distributed databases have been devised to tackle the challenges resulting from volume, velocity and variety of big data. Graph representation of datasets requires efficient distributed linear algebra operations for large sparse matrix constructed from big data. Storing the transformed matrix into the database not only speeds up the big data analysis process but also facilitates the computation because of indexing. The Hadoop based approach does not natively support iterative algorithms due to data shuffling during each iteration. This paper presents a novel database-based distributed computation architecture bridging the gap between Hadoop and HPC. The novelty results from exploring the indexing capability of D4M (Dynamic Distributed Dimensional Data Model) to support linear algebra operations in a distributed computation environment. The idea is to store input data and intermediate results in associative array format inside Accumulo table to facilitate the data sharing among working nodes. pMatlab is deployed as the parallel computation engine. Our proposed architecture is proved to be lighter, easier and faster than MapReduce based approach. One example application is calculating top k eigenvalues and eigenvectors for large sparse matrix. Experiments on Graph500 benchmark datasets demonstrate 2X speedup of our architecture as compared to HEIGEN (An eigensolver for billion-scale matrices using MapReduce). [en_US]
dc.description.sponsorship: The authors would like to thank IBM/CAS Toronto for supporting Yin Huang with a CAS fellowship. We would also like to thank NIST/SSD Information Systems Group for providing support to conduct this Big Data Analytics computation. We are also grateful to CHMPR for providing the IBM iDataPlex bluewave computational resources to conduct these data intensive experiments. In particular, we wish to acknowledge Dr. John Dorband for training one of the authors as a system administrator to establish the Hadoop based ecosystem. And we would also like to thank Prof Xian-He Sun from Department of Computer Science at the Illinois Institute of Technology for providing cluster resource. [en_US]
dc.description.uri: https://ieeexplore.ieee.org/document/7364045 [en_US]
dc.format.extent: 8 pages [en_US]
dc.genre: conference papers and proceedings pre-print [en_US]
dc.identifier: doi:10.13016/M2MG7G04F
dc.identifier.citation: Yin Huang, Yelena Yesha, and Shujia Zhou, A database-based distributed computation architecture with Accumulo and D4M: An application of eigensolver for large sparse matrix, 2015 IEEE International Conference on Big Data (Big Data), DOI: 10.1109/BigData.2015.7364045 [en_US]
dc.identifier.uri: 10.1109/BigData.2015.7364045
dc.identifier.uri: http://hdl.handle.net/11603/11809
dc.language.iso: en_US [en_US]
dc.publisher: IEEE [en_US]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: © 2015 IEEE
dc.subject: Big Data Analytics [en_US]
dc.subject: Distributed Computation Architecture [en_US]
dc.subject: Eigendecomposition [en_US]
dc.subject: Apache Accumulo [en_US]
dc.subject: D4M (Dynamic Distributed Dimensional Data Model) [en_US]
dc.subject: UMBC Ebiquity Research Group [en_US]
dc.title: A database-based distributed computation architecture with Accumulo and D4M: An application of eigensolver for large sparse matrix [en_US]
dc.type: Text [en_US]

Files

Original bundle
Name: 805.pd.pdf
Size: 806.88 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.68 KB
Format: Item-specific license agreed upon to submission