Comparison of Distributed Data-Parallelization Patterns for Big Data Analysis: A Bioinformatics Case Study
| dc.contributor.author | Wang, Jianwu | |
| dc.contributor.author | Crawl, Daniel | |
| dc.contributor.author | Altintas, Ilkay | |
| dc.contributor.author | Tzoumas, Kostas | |
| dc.contributor.author | Markl, Volker | |
| dc.date.accessioned | 2024-02-14T20:10:23Z | |
| dc.date.available | 2024-02-14T20:10:23Z | |
| dc.date.issued | 2013 | |
| dc.description | DataCloud’13, Nov. 17, 2013, Denver, CO, U.S.A | |
| dc.description.abstract | As a distributed data-parallelization (DDP) pattern, MapReduce has been adopted by many new big data analysis tools to achieve good scalability and performance in cluster or cloud environments. This paper explores how two binary DDP patterns, i.e., CoGroup and Match, could also be used in these tools. We reimplemented an existing bioinformatics tool, CloudBurst, with three different DDP pattern combinations. We identify two factors, namely input data balancing and value sparseness, which can greatly affect performance under different DDP patterns. Our experiments show that (i) a simple DDP pattern switch could improve performance by almost a factor of two, and (ii) the identified factors explain the differences well. | |
| dc.description.sponsorship | This work was supported by NSF ABI Award DBI-1062565 for bioKepler. The authors would like to thank the rest of bioKepler and Stratosphere teams for their collaboration. We also thank the FutureGrid project for experiment environment support. | |
| dc.description.uri | https://users.sdsc.edu/~jianwu/JianwuWang_files/Comparison_of_Distributed_Data-Parallelization_Patterns_for_Big_Data_Analysis_A_Bioinformatics_Case_Study(2013).pdf | |
| dc.format.extent | 5 pages | |
| dc.genre | conference papers and proceedings | |
| dc.genre | presentations (communicative events) | |
| dc.genre | preprints | |
| dc.identifier | doi:10.13016/m217bs-e5vh | |
| dc.identifier.uri | http://hdl.handle.net/11603/31627 | |
| dc.language.iso | en_US | |
| dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
| dc.relation.ispartof | UMBC Information Systems Department Collection | |
| dc.relation.ispartof | UMBC Center for Accelerated Real Time Analysis | |
| dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department | |
| dc.relation.ispartof | UMBC Data Science | |
| dc.relation.ispartof | UMBC Joint Center for Earth Systems Technology (JCET) | |
| dc.relation.ispartof | UMBC Center for Real-time Distributed Sensing and Autonomy | |
| dc.rights | This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. | |
| dc.subject | UMBC Big Data Analytics Lab | |
| dc.title | Comparison of Distributed Data-Parallelization Patterns for Big Data Analysis: A Bioinformatics Case Study | |
| dc.type | Text | |
| dcterms.creator | https://orcid.org/0000-0002-9933-1170 |
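
The abstract contrasts two binary DDP patterns, CoGroup and Match. The following is a minimal single-machine sketch of how the two patterns are commonly described: CoGroup calls the user function once per key with all values from both inputs, while Match calls it once per pair of values that share a key. It is not the paper's implementation and does not use any real framework API; the function names `cogroup` and `match`, the helper `group_by_key`, and the toy k-mer data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def group_by_key(pairs):
    """Collect (key, value) pairs into a dict mapping key -> list of values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def cogroup(left, right, fn):
    """CoGroup sketch: one call to fn per key, with both value lists.

    A key present in only one input still triggers a call; the other
    side is passed as an empty list.
    """
    lg, rg = group_by_key(left), group_by_key(right)
    return {
        key: fn(key, lg.get(key, []), rg.get(key, []))
        for key in sorted(set(lg) | set(rg))
    }


def match(left, right, fn):
    """Match sketch: one call to fn per pair of values with an equal key."""
    lg, rg = group_by_key(left), group_by_key(right)
    results = []
    for key in sorted(set(lg) & set(rg)):
        for lv, rv in product(lg[key], rg[key]):
            results.append(fn(key, lv, rv))
    return results


# Hypothetical toy inputs: k-mers keyed to read names and reference offsets.
queries = [("ACG", "read1"), ("ACG", "read2"), ("TTT", "read3")]
reference = [("ACG", 10), ("ACG", 42), ("GGA", 7)]

# CoGroup: three calls (ACG, GGA, TTT), each seeing whole value lists.
per_key = cogroup(queries, reference, lambda k, qs, rs: (len(qs), len(rs)))

# Match: four calls, one per (read, offset) pair under the shared key ACG.
per_pair = match(queries, reference, lambda k, q, r: (k, q, r))
```

In this sketch the two patterns differ only in the granularity at which the user function is invoked, which is the kind of difference the abstract relates to input data balancing and value sparseness.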
Files
Original bundle
- Name: Comparison_of_Distributed_Data-Parallelization_Patterns_for_Big_Data_Analysis_A_Bioinformatics_Case_Study(2013).pdf
- Size: 324.49 KB
- Format: Adobe Portable Document Format
- Name: DDP-Comparison-Jianwu_Wang-UCSD_Slides.pdf
- Size: 224.32 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 2.56 KB
- Format: Item-specific license agreed upon to submission
