Kepler + Hadoop: a general architecture facilitating data-intensive applications in scientific workflow systems

dc.contributor.author: Wang, Jianwu
dc.contributor.author: Crawl, Daniel
dc.contributor.author: Altintas, Ilkay
dc.date.accessioned: 2024-02-19T15:30:33Z
dc.date.available: 2024-02-19T15:30:33Z
dc.date.issued: 2009-11-16
dc.description: SC '09: International Conference for High Performance Computing, Networking, Storage and Analysis, Portland, Oregon, 16 November 2009
dc.description.abstract: MapReduce provides a parallel and scalable programming model for data-intensive business and scientific applications. MapReduce and its de facto open-source implementation, Hadoop, support parallel processing on large datasets with capabilities including automatic data partitioning and distribution, load balancing, and fault tolerance management. Meanwhile, scientific workflow management systems, e.g., Kepler, Taverna, Triana, and Pegasus, have demonstrated their ability to help domain scientists solve scientific problems by synthesizing different data and computing resources. By integrating Hadoop with Kepler, we provide an easy-to-use architecture that helps users compose and execute MapReduce applications in Kepler scientific workflows. Our implementation demonstrates that many characteristics of scientific workflow management systems, e.g., graphical user interface and component reuse and sharing, complement those of MapReduce. Using the presented Hadoop components in Kepler, scientists can easily utilize MapReduce in their domain-specific problems and connect them with other tasks in a workflow through the Kepler graphical user interface. We validate the feasibility of our approach via a word count use case.
dc.description.sponsorship: The authors would like to thank the rest of the Kepler team for their collaboration, and Daniel Zinn for his feedback. This work was supported by NSF SDCI Award OCI-0722079 for Kepler/CORE, NSF CEO:P Award No. DBI 0619060 for REAP, DOE SciDac Award No. DE-FC02-07ER25811 for SDM Center, and UCGRID Project.
dc.description.uri: https://dl.acm.org/doi/abs/10.1145/1645164.1645176
dc.format.extent: 8 pages
dc.genre: conference paper and proceedings preprints presentation (communicative events)
dc.identifier: doi:10.13016/m2joeu-42lu
dc.identifier.citation: Wang, Jianwu, Daniel Crawl, and Ilkay Altintas. “Kepler + Hadoop: A General Architecture Facilitating Data-Intensive Applications in Scientific Workflow Systems.” In Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, 1–8. WORKS ’09. New York, NY, USA: Association for Computing Machinery, 2009. https://doi.org/10.1145/1645164.1645176.
dc.identifier.uri: https://doi.org/10.1145/1645164.1645176
dc.identifier.uri: http://hdl.handle.net/11603/31656
dc.language.iso: en_US
dc.publisher: ACM
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Center for Accelerated Real Time Analysis
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Data Science
dc.relation.ispartof: UMBC Joint Center for Earth Systems Technology (JCET)
dc.relation.ispartof: UMBC Center for Real-time Distributed Sensing and Autonomy
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: MapReduce
dc.subject: Kepler
dc.subject: Hadoop
dc.subject: scientific workflow
dc.subject: parallel computing
dc.subject: distributed computing
dc.subject: UMBC Big Data Analytics Lab
dc.title: Kepler + Hadoop: a general architecture facilitating data-intensive applications in scientific workflow systems
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-9933-1170

Files

Original bundle

Name: hadoop + kepler (WORKS at SC2009).pdf
Size: 491.29 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission