Big data provenance: Challenges, state of the art and opportunities

dc.contributor.author: Wang, Jianwu
dc.contributor.author: Crawl, Daniel
dc.contributor.author: Purawat, Shweta
dc.contributor.author: Nguyen, Mai
dc.contributor.author: Altintas, Ilkay
dc.date.accessioned: 2024-02-14T16:21:41Z
dc.date.available: 2024-02-14T16:21:41Z
dc.date.issued: 2015-12-28
dc.description: 2015 IEEE International Conference on Big Data 29 October 2015 - 01 November 2015 Santa Clara, CA, USA
dc.description.abstract: Ability to track provenance is a key feature of scientific workflows to support data lineage and reproducibility. The challenges that are introduced by the volume, variety and velocity of Big Data, also pose related challenges for provenance and quality of Big Data, defined as veracity. The increasing size and variety of distributed Big Data provenance information bring new technical challenges and opportunities throughout the provenance lifecycle including recording, querying, sharing and utilization. This paper discusses the challenges and opportunities of Big Data provenance related to the veracity of the datasets themselves and the provenance of the analytical processes that analyze these datasets. It also explains our current efforts towards tracking and utilizing Big Data provenance using workflows as a programming model to analyze Big Data.
dc.description.sponsorship: This work is partially supported by NSF DBI 1062565 and 1331615, NIH P41 GM103426 for NBCR and R25GM114821 for BBDTC, and DOE DE-SC0012630 for IPPD.
dc.description.uri: https://ieeexplore.ieee.org/document/7364047
dc.format.extent: 8 pages
dc.genre: conference papers and proceedings
dc.genre: preprints
dc.identifier: doi:10.13016/m2q4w9-011v
dc.identifier.citation: J. Wang, D. Crawl, S. Purawat, M. Nguyen and I. Altintas, "Big data provenance: Challenges, state of the art and opportunities," 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA, 2015, pp. 2509-2516, doi: 10.1109/BigData.2015.7364047.
dc.identifier.uri: https://doi.org/10.1109/BigData.2015.7364047
dc.identifier.uri: http://hdl.handle.net/11603/31616
dc.language.iso: en_US
dc.publisher: IEEE
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Center for Accelerated Real Time Analysis
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Data Science
dc.relation.ispartof: UMBC Joint Center for Earth Systems Technology (JCET)
dc.relation.ispartof: UMBC Center for Real-time Distributed Sensing and Autonomy
dc.relation.ispartofseries: UMBC Center for Real-time Distributed Sensing and Autonomy
dc.rights: © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.subject: UMBC Big Data Analytics Lab
dc.title: Big data provenance: Challenges, state of the art and opportunities
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-9933-1170

Files

Original bundle

Name: Big_Data_Provenance-2015.pdf
Size: 765.41 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission