Challenges and approaches for distributed workflow-driven analysis of large-scale biological data: vision paper

dc.contributor.author: Altintas, Ilkay
dc.contributor.author: Wang, Jianwu
dc.contributor.author: Crawl, Daniel
dc.contributor.author: Li, Weizhong
dc.date.accessioned: 2024-02-19T15:02:50Z
dc.date.available: 2024-02-19T15:02:50Z
dc.date.issued: 2012-03-30
dc.description: ICDT '12: 15th International Conference on Database Theory Berlin Germany 30 March 2012
dc.description.abstract: Next-generation DNA sequencing machines are generating a very large amount of sequence data with applications in many scientific challenges and placing unprecedented demands on traditional single-processor bioinformatics algorithms. Middleware and technologies for scientific workflows and data-intensive computing promise new capabilities to enable rapid analysis of next-generation sequence data. Based on this motivation and our previous experiences in bioinformatics and distributed scientific workflows, we are creating a Kepler Scientific Workflow System module, called "bioKepler", that facilitates the development of Kepler workflows for integrated execution of bioinformatics applications in distributed environments. This vision paper discusses the challenges related to next-generation sequencing data, explains the approaches taken in bioKepler to help with analysis of such data, and presents preliminary results demonstrating these approaches.
dc.description.sponsorship: The authors would like to thank the rest of Kepler and CAMERA teams for their collaboration. This work was supported by NSF SDCI Award OCI-0722079 for Kepler/CORE, NSF ABI Award DBI-1062565 for bioKepler, the Gordon and Betty Moore Foundation award to Calit2 at UCSD for CAMERA, and an SDSC Triton Research Opportunities grant.
dc.description.uri: https://dl.acm.org/doi/10.1145/2320765.2320791
dc.format.extent: 6 pages
dc.genre: conference papers and proceedings
dc.identifier: doi:10.13016/m2x94i-x9d6
dc.identifier.citation: Altintas, Ilkay, Jianwu Wang, Daniel Crawl, and Weizhong Li. “Challenges and Approaches for Distributed Workflow-Driven Analysis of Large-Scale Biological Data: Vision Paper.” In Proceedings of the 2012 Joint EDBT/ICDT Workshops, 73–78. EDBT-ICDT ’12. New York, NY, USA: Association for Computing Machinery, 2012. https://doi.org/10.1145/2320765.2320791.
dc.identifier.uri: https://doi.org/10.1145/2320765.2320791
dc.identifier.uri: http://hdl.handle.net/11603/31654
dc.language.iso: en_US
dc.publisher: ACM
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Center for Accelerated Real Time Analysis
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Data Science
dc.relation.ispartof: UMBC Joint Center for Earth Systems Technology (JCET)
dc.relation.ispartof: UMBC Center for Real-time Distributed Sensing and Autonomy
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: scientific workflows
dc.subject: data-parallel patterns
dc.subject: bioinformatics
dc.subject: next-generation sequence analysis
dc.subject: UMBC Big Data Analytics Lab
dc.title: Challenges and approaches for distributed workflow-driven analysis of large-scale biological data: vision paper
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-9933-1170
