Scalable and Flexible Two-Phase Ensemble Algorithms for Causality Discovery

dc.contributor.author: Guo, Pei
dc.contributor.author: Huang, Yiyi
dc.contributor.author: Wang, Jianwu
dc.date.accessioned: 2022-09-26T16:08:53Z
dc.date.available: 2022-09-26T16:08:53Z
dc.date.issued: 2021-11-15
dc.description.abstract: Causality study investigates cause-effect relationships among different variables of a system and has been widely used in many disciplines including climatology and neuroscience. To discover causal relationships, many data-driven causality discovery methods, e.g., Granger causality, PCMCI and Dynamic Bayesian Network, have been proposed. Many of these causality discovery approaches mine time-series data and generate a directed causality graph where each graph edge denotes a cause-effect relationship between the two connected graph nodes. Our benchmarking of different causality discovery approaches with real-world climate data shows that these approaches often generate quite different causality results from the same input dataset because of differences in their internal learning mechanisms. Meanwhile, the ever-increasing volume of available data in virtually every discipline makes it more and more difficult for existing causality discovery algorithms to produce causality results within a reasonable time. To address these two challenges, this paper utilizes data partitioning and ensemble techniques, and proposes a flexible two-phase causality ensemble framework. The framework first conducts a phase 1 ensemble on partitioned data and then conducts a phase 2 ensemble on the phase 1 ensemble results. Based on the framework, we develop two ensemble approaches: i) data ensemble at phase 1 and algorithm ensemble at phase 2, and ii) algorithm ensemble at phase 1 and data ensemble at phase 2. To achieve scalability, we further parallelize the ensemble approaches via the Spark big data analytics engine. The proposed ensemble approaches are evaluated on synthetic and real-world datasets. Our experiments show that the proposed approaches achieve good accuracy through ensemble and high scalability through data-parallelization in distributed computing environments.
dc.description.sponsorship: This work is supported by grant CyberTraining: DSE: Cross-Training of Researchers in Computing, Applied Mathematics and Atmospheric Sciences using Advanced Cyberinfrastructure Resources (OAC–1730250), and grant CAREER: Big Data Climate Causality Analytics (OAC–1942714) from the National Science Foundation. The execution environment is provided through the High Performance Computing Facility at UMBC.
dc.description.uri: https://www.sciencedirect.com/science/article/pii/S2214579621000691
dc.format.extent: 17 pages
dc.genre: journal articles
dc.genre: computer code
dc.identifier: doi:10.13016/m2irne-gvje
dc.identifier.citation: Pei Guo, Yiyi Huang, Jianwu Wang. Scalable and Flexible Two-Phase Ensemble Algorithms for Causality Discovery. Big Data Research, vol. 26, no. 100252, November 2021. DOI:10.1016/j.bdr.2021.100252
dc.identifier.uri: https://doi.org/10.1016/j.bdr.2021.100252
dc.identifier.uri: http://hdl.handle.net/11603/25885
dc.language.iso: en_US
dc.publisher: Elsevier
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: Attribution 4.0 International (CC BY 4.0)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: UMBC Big Data Analytics Lab
dc.subject: UMBC High Performance Computing Facility (HPCF)
dc.title: Scalable and Flexible Two-Phase Ensemble Algorithms for Causality Discovery
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-9933-1170
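
The two ensemble orderings named in the abstract can be illustrated with a minimal sketch. It rests on assumptions: the abstract does not state the ensemble rule, so simple majority voting over directed edges is used here as a stand-in, and every name below (`majority_vote`, `data_then_algorithm`, `algorithm_then_data`, the `threshold` parameter) is hypothetical rather than taken from the authors' code. The sketch runs sequentially; the Spark parallelization mentioned in the abstract is not shown.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]      # (cause, effect)
Graph = FrozenSet[Edge]     # a causality graph as a set of directed edges
# Assumption: a discovery algorithm maps one data partition to a graph.
Algorithm = Callable[[Sequence], Graph]


def majority_vote(graphs: List[Graph], threshold: float = 0.5) -> Graph:
    """Keep edges found in more than `threshold` of the input graphs.

    Majority voting is an assumed ensemble rule, not the paper's stated one.
    """
    counts = Counter(edge for g in graphs for edge in g)
    return frozenset(e for e, c in counts.items() if c > threshold * len(graphs))


def data_then_algorithm(partitions, algorithms, threshold=0.5) -> Graph:
    """Approach (i): data ensemble at phase 1, algorithm ensemble at phase 2."""
    # Phase 1: for each algorithm, combine its results across all partitions.
    phase1 = [
        majority_vote([alg(p) for p in partitions], threshold)
        for alg in algorithms
    ]
    # Phase 2: combine the per-algorithm graphs.
    return majority_vote(phase1, threshold)


def algorithm_then_data(partitions, algorithms, threshold=0.5) -> Graph:
    """Approach (ii): algorithm ensemble at phase 1, data ensemble at phase 2."""
    # Phase 1: for each partition, combine the results of all algorithms.
    phase1 = [
        majority_vote([alg(p) for alg in algorithms], threshold)
        for p in partitions
    ]
    # Phase 2: combine the per-partition graphs.
    return majority_vote(phase1, threshold)
```

In this sketch the phase 1 computations are independent of one another, which is the property that would let each partition (or each algorithm) be processed in parallel.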

Files

Original bundle
Name: 1-s2.0-S2214579621000691-main.pdf
Size: 918.23 KB
Format: Adobe Portable Document Format
Name: ensemble_causality_learning-master.zip
Size: 51.6 KB
Format: Unknown data format
Description: Code
License bundle
Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission