Scalable and Flexible Two-Phase Ensemble Algorithms for Causality Discovery
dc.contributor.author | Guo, Pei | |
dc.contributor.author | Huang, Yiyi | |
dc.contributor.author | Wang, Jianwu | |
dc.date.accessioned | 2022-09-26T16:08:53Z | |
dc.date.available | 2022-09-26T16:08:53Z | |
dc.date.issued | 2021-11-15 | |
dc.description.abstract | Causality study investigates cause-effect relationships among different variables of a system and has been widely used in many disciplines including climatology and neuroscience. To discover causal relationships, many data-driven causality discovery methods, e.g., Granger causality, PCMCI and Dynamic Bayesian Network, have been proposed. Many of these causality discovery approaches mine time-series data and generate a directed causality graph where each graph edge denotes a cause-effect relationship between the two connected graph nodes. Our benchmarking of different causality discovery approaches with real-world climate data shows that these approaches often generate quite different causality results from the same input dataset due to differences in their internal learning mechanisms. Meanwhile, the amount of available data keeps increasing in virtually every discipline, which makes it more and more difficult to use existing causality discovery algorithms to produce causality results within reasonable time. To address these two challenges, this paper utilizes data partitioning and ensemble techniques, and proposes a flexible two-phase causality ensemble framework. The framework first conducts a phase 1 ensemble for partitioned data and then conducts a phase 2 ensemble from the phase 1 ensemble results. Based on the framework, we develop two ensemble approaches: i) data ensemble at phase 1 and algorithm ensemble at phase 2, and ii) algorithm ensemble at phase 1 and data ensemble at phase 2. To achieve scalability, we further parallelize the ensemble approaches via the Spark big data analytics engine. The proposed ensemble approaches are evaluated on synthetic and real-world datasets. Our experiments show that the proposed approaches achieve good accuracy through ensemble and high scalability through data-parallelization in distributed computing environments. | en_US |
dc.description.sponsorship | This work is supported by grant CyberTraining: DSE: Cross-Training of Researchers in Computing, Applied Mathematics and Atmospheric Sciences using Advanced Cyberinfrastructure Resources (OAC–1730250), and grant CAREER: Big Data Climate Causality Analytics (OAC–1942714) from the National Science Foundation. The execution environment is provided through the High Performance Computing Facility at UMBC. | en_US |
dc.description.uri | https://www.sciencedirect.com/science/article/pii/S2214579621000691 | en_US |
dc.format.extent | 17 pages | en_US |
dc.genre | journal articles | en_US |
dc.genre | computer code | en_US |
dc.identifier | doi:10.13016/m2irne-gvje | |
dc.identifier.citation | Pei Guo, Yiyi Huang, Jianwu Wang. Scalable and Flexible Two-Phase Ensemble Algorithms for Causality Discovery. Big Data Research, vol. 26, no. 100252, November 2021. DOI:10.1016/j.bdr.2021.100252 | en_US |
dc.identifier.uri | https://doi.org/10.1016/j.bdr.2021.100252 | |
dc.identifier.uri | http://hdl.handle.net/11603/25885 | |
dc.language.iso | en_US | en_US |
dc.publisher | Elsevier | en_US |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Information Systems Department Collection | |
dc.relation.ispartof | UMBC Faculty Collection | |
dc.relation.ispartof | UMBC Student Collection | |
dc.rights | This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. | en_US |
dc.rights | Attribution 4.0 International (CC BY 4.0) | * |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | * |
dc.subject | UMBC Big Data Analytics Lab | en_US |
dc.subject | UMBC High Performance Computing Facility (HPCF) | |
dc.title | Scalable and Flexible Two-Phase Ensemble Algorithms for Causality Discovery | en_US |
dc.type | Text | en_US |
dcterms.creator | https://orcid.org/0000-0002-9933-1170 | en_US |
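
The two ensemble orderings named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not state how results are combined, so simple majority voting over edges is assumed here, and the names `vote`, `data_then_algorithm`, `algorithm_then_data`, and the toy `results` mapping are hypothetical. The Spark parallelization mentioned in the abstract is not shown.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def vote(graphs: Iterable[Set[Edge]], threshold: float = 0.5) -> FrozenSet[Edge]:
    """Keep edges that appear in more than `threshold` of the graphs.

    Majority voting is an assumption made for this sketch.
    """
    graphs = list(graphs)
    if not graphs:
        return frozenset()
    counts = Counter(edge for g in graphs for edge in set(g))
    return frozenset(e for e, c in counts.items() if c / len(graphs) > threshold)


def data_then_algorithm(results: Dict[str, List[Set[Edge]]]) -> FrozenSet[Edge]:
    """Approach i: ensemble over partitions per algorithm, then over algorithms."""
    phase1 = [vote(per_partition) for per_partition in results.values()]
    return vote(phase1)


def algorithm_then_data(results: Dict[str, List[Set[Edge]]]) -> FrozenSet[Edge]:
    """Approach ii: ensemble over algorithms per partition, then over partitions."""
    n_partitions = len(next(iter(results.values())))
    phase1 = [
        vote(per_partition[i] for per_partition in results.values())
        for i in range(n_partitions)
    ]
    return vote(phase1)


# Toy input: results[algorithm][partition] is the edge set that algorithm
# produced on that data partition (made-up values for illustration only).
results = {
    "granger": [{("X", "Y"), ("Y", "Z")}, {("X", "Y")}, {("X", "Y"), ("Z", "X")}],
    "pcmci": [{("X", "Y")}, {("X", "Y"), ("Y", "Z")}, {("Y", "Z")}],
    "dbn": [{("Y", "Z")}, {("Y", "Z")}, {("X", "Y"), ("Y", "Z")}],
}

print(sorted(data_then_algorithm(results)))
print(sorted(algorithm_then_data(results)))
```

On this toy input both orderings keep the edges X→Y and Y→Z and drop Z→X, but in general the two orderings need not agree, which is one reason a framework might offer both.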