A Scalable Data Science Workflow Approach for Big Data Bayesian Network Learning
dc.contributor.author | Wang, Jianwu | |
dc.contributor.author | Tang, Yan | |
dc.contributor.author | Nguyen, Mai | |
dc.contributor.author | Altintas, Ilkay | |
dc.date.accessioned | 2024-02-14T17:05:03Z | |
dc.date.available | 2024-02-14T17:05:03Z | |
dc.date.issued | 2015-11-09 | |
dc.description | 2014 IEEE/ACM International Symposium on Big Data Computing 8-11 Dec. 2014 | |
dc.description.abstract | In the Big Data era, machine learning has more potential to discover valuable insights from the data. As an important machine learning technique, Bayesian Network (BN) has been widely used to model probabilistic relationships among variables. To deal with the challenges of Big Data BN learning, we apply the techniques in distributed data-parallelism (DDP) and scientific workflow to the BN learning process. We first propose an intelligent Big Data pre-processing approach and a data quality score to measure and ensure the data quality and data faithfulness. Then, a new weight-based ensemble algorithm is proposed to learn a BN structure from an ensemble of local results. To easily integrate the algorithm with DDP engines, such as Hadoop, we employ Kepler scientific workflow to build the whole learning process. We demonstrate how Kepler can facilitate building and running our Big Data BN learning application. Our experiments show good scalability and learning accuracy when running the application in real distributed environments. | |
dc.description.sponsorship | This work is supported by the Natural Science Foundation of Jiangsu Province, China, under grant No. BK20140857 and the National Science Foundation, U.S., under grants DBI-1062565 and 1331615. | |
dc.description.uri | https://ieeexplore.ieee.org/document/7321725 | |
dc.format.extent | 10 pages | |
dc.genre | conference papers and proceedings | |
dc.genre | preprints | |
dc.identifier | doi:10.13016/m2crq8-rygm | |
dc.identifier.citation | J. Wang, Y. Tang, M. Nguyen and I. Altintas, "A Scalable Data Science Workflow Approach for Big Data Bayesian Network Learning," 2014 IEEE/ACM International Symposium on Big Data Computing, London, UK, 2014, pp. 16-25, doi: 10.1109/BDC.2014.10. | |
dc.identifier.uri | https://doi.org/10.1109/BDC.2014.10 | |
dc.identifier.uri | http://hdl.handle.net/11603/31618 | |
dc.language.iso | en_US | |
dc.publisher | IEEE | |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Information Systems Department Collection | |
dc.relation.ispartof | UMBC Center for Accelerated Real Time Analysis | |
dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department | |
dc.relation.ispartof | UMBC Data Science | |
dc.relation.ispartof | UMBC Joint Center for Earth Systems Technology (JCET) | |
dc.relation.ispartof | UMBC Center for Real-time Distributed Sensing and Autono | |
dc.rights | © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | |
dc.subject | UMBC Big Data Analytics Lab | |
dc.title | A Scalable Data Science Workflow Approach for Big Data Bayesian Network Learning | |
dc.type | Text | |
dcterms.creator | https://orcid.org/0000-0002-9933-1170 |
Files
Original bundle
- Name:
- A_Scalable_Data_Science_Workflow_Approach_for_Big_Data_Bayesian_Network_Learning.pdf
- Size:
- 1.06 MB
- Format:
- Adobe Portable Document Format
License bundle
- Name:
- license.txt
- Size:
- 2.56 KB
- Format:
- Item-specific license agreed upon to submission