A Scalable Data Science Workflow Approach for Big Data Bayesian Network Learning

dc.contributor.author: Wang, Jianwu
dc.contributor.author: Tang, Yan
dc.contributor.author: Nguyen, Mai
dc.contributor.author: Altintas, Ilkay
dc.date.accessioned: 2024-02-14T17:05:03Z
dc.date.available: 2024-02-14T17:05:03Z
dc.date.issued: 2015-11-09
dc.description: 2014 IEEE/ACM International Symposium on Big Data Computing 8-11 Dec. 2014
dc.description.abstract: In the Big Data era, machine learning has more potential to discover valuable insights from the data. As an important machine learning technique, Bayesian Network (BN) has been widely used to model probabilistic relationships among variables. To deal with the challenges of Big Data BN learning, we apply the techniques in distributed data-parallelism (DDP) and scientific workflow to the BN learning process. We first propose an intelligent Big Data pre-processing approach and a data quality score to measure and ensure the data quality and data faithfulness. Then, a new weight-based ensemble algorithm is proposed to learn a BN structure from an ensemble of local results. To easily integrate the algorithm with DDP engines, such as Hadoop, we employ Kepler scientific workflow to build the whole learning process. We demonstrate how Kepler can facilitate building and running our Big Data BN learning application. Our experiments show good scalability and learning accuracy when running the application in real distributed environments.
dc.description.sponsorship: This work is supported by the Natural Science Foundation of Jiangsu Province, China under grant No. BK20140857 and National Science Foundation, U.S. under grant DBI-1062565 and 1331615.
dc.description.uri: https://ieeexplore.ieee.org/document/7321725
dc.format.extent: 10 pages
dc.genre: conference papers and proceedings
dc.genre: preprints
dc.identifier: doi:10.13016/m2crq8-rygm
dc.identifier.citation: J. Wang, Y. Tang, M. Nguyen and I. Altintas, "A Scalable Data Science Workflow Approach for Big Data Bayesian Network Learning," 2014 IEEE/ACM International Symposium on Big Data Computing, London, UK, 2014, pp. 16-25, doi: 10.1109/BDC.2014.10.
dc.identifier.uri: https://doi.org/10.1109/BDC.2014.10
dc.identifier.uri: http://hdl.handle.net/11603/31618
dc.language.iso: en_US
dc.publisher: IEEE
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Center for Accelerated Real Time Analysis
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Data Science
dc.relation.ispartof: UMBC Joint Center for Earth Systems Technology (JCET)
dc.relation.ispartof: UMBC Center for Real-time Distributed Sensing and Autono
dc.rights: © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.subject: UMBC Big Data Analytics Lab
dc.title: A Scalable Data Science Workflow Approach for Big Data Bayesian Network Learning
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-9933-1170

Files

Original bundle

Name: A_Scalable_Data_Science_Workflow_Approach_for_Big_Data_Bayesian_Network_Learning.pdf
Size: 1.06 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission