Scalable and Hybrid Ensemble-Based Causality Discovery

dc.contributor.author: Guo, Pei
dc.contributor.author: Ofonedu, Achuna
dc.contributor.author: Wang, Jianwu
dc.date.accessioned: 2021-11-03T17:28:01Z
dc.date.available: 2021-11-03T17:28:01Z
dc.date.issued: 2020-12-24
dc.description: 2020 IEEE International Conference on Smart Data Services (SMDS) [en]
dc.description.abstract: Causality discovery mines cause-effect relationships among different variables of a system and has been widely used in many disciplines including climatology and neuroscience. To discover causal relationships, many data-driven causality discovery methods, e.g., Granger causality, PCMCI and Dynamic Bayesian Network, have been proposed. Many of these causality discovery approaches mine time series data and generate a directed causality graph where each graph edge denotes a cause-effect relationship between the two connected graph nodes. Our benchmarking of different causality discovery approaches with real-world climate data shows these approaches often generate quite different causality results with the same input dataset due to their internal learning mechanism differences. Meanwhile, there are ever-increasing available data in virtually every discipline, which makes it more and more difficult to use existing causality discovery algorithms to produce causality results within reasonable time. To address these two challenges, this paper utilizes data partitioning and ensemble techniques, and proposes a two-phase hybrid causality ensemble framework. The framework first conducts phase 1 data ensemble for partitioned data and then conducts phase 2 algorithm ensemble from data ensemble results. To achieve scalability, we further parallelize the ensemble approaches via the Spark big data analytics engine. Our experiments show that our proposed approaches achieve good accuracy through ensemble and high scalability through data-parallelization in distributed computing environments. [en]
dc.description.sponsorship: This work is supported by grant CyberTraining: DSE: Cross-Training of Researchers in Computing, Applied Mathematics and Atmospheric Sciences using Advanced Cyberinfrastructure Resources (OAC–1730250), and grant CAREER: Big Data Climate Causality Analytics (OAC–1942714) from the National Science Foundation. The execution environment is provided through the High Performance Computing Facility at UMBC. [en]
dc.description.uri: https://ieeexplore.ieee.org/document/9288491 [en]
dc.format.extent: 9 pages [en]
dc.genre: computer code
dc.genre: conference papers and proceedings [en]
dc.genre: postprints [en]
dc.identifier: doi:10.13016/m2ujn3-qhxj
dc.identifier.citation: Guo, Pei; Ofonedu, Achuna; Wang, Jianwu; Scalable and Hybrid Ensemble-Based Causality Discovery; 2020 IEEE International Conference on Smart Data Services (SMDS), 24 December, 2020; https://doi.org/10.1109/SMDS49396.2020.00016 [en]
dc.identifier.uri: https://doi.org/10.1109/SMDS49396.2020.00016
dc.identifier.uri: http://hdl.handle.net/11603/23197
dc.language.iso: en [en]
dc.publisher: IEEE [en]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. [en]
dc.subject: UMBC High Performance Computing Facility (HPCF)
dc.title: Scalable and Hybrid Ensemble-Based Causality Discovery [en]
dc.type: Text [en]
dcterms.creator: https://orcid.org/0000-0002-9933-1170 [en]
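
Illustrative sketch of the two-phase ensemble described in the abstract. The abstract does not state how results are combined, so this sketch assumes simple threshold voting over directed edges: phase 1 votes across data partitions for each algorithm, and phase 2 votes across the per-algorithm results. The names `vote_edges` and `two_phase_ensemble`, the thresholds, and the stub algorithms are hypothetical and are not taken from the paper or from the deposited code; the Spark parallelization mentioned in the abstract is not shown.

```python
from collections import Counter
from typing import Callable, Sequence

Edge = tuple[str, str]  # (cause, effect); assumed edge representation


def vote_edges(edge_sets: Sequence[set[Edge]], threshold: float) -> set[Edge]:
    """Keep edges that appear in at least `threshold` fraction of the edge sets."""
    if not edge_sets:
        return set()
    counts = Counter(edge for edges in edge_sets for edge in edges)
    needed = threshold * len(edge_sets)
    return {edge for edge, n in counts.items() if n >= needed}


def two_phase_ensemble(
    partitions: Sequence[object],
    algorithms: Sequence[Callable[[object], set[Edge]]],
    data_threshold: float = 0.5,
    algo_threshold: float = 0.5,
) -> set[Edge]:
    """Phase 1: vote across partitions per algorithm. Phase 2: vote across algorithms."""
    per_algorithm = [
        vote_edges([algo(part) for part in partitions], data_threshold)
        for algo in algorithms
    ]
    return vote_edges(per_algorithm, algo_threshold)


# Stub "algorithms" standing in for real causality discovery methods.
# Each maps a data partition (here just an integer id) to a set of edges.
def algo_a(part: int) -> set[Edge]:
    return {("x", "y"), ("y", "z")}


def algo_b(part: int) -> set[Edge]:
    # Reports a spurious edge on odd-numbered partitions only.
    return {("x", "y")} if part % 2 == 0 else {("x", "y"), ("z", "x")}


def algo_c(part: int) -> set[Edge]:
    return {("y", "z"), ("w", "x")}


result = two_phase_ensemble(
    partitions=[0, 1, 2, 3],
    algorithms=[algo_a, algo_b, algo_c],
    data_threshold=0.75,  # edge must appear in at least 3 of 4 partitions
    algo_threshold=0.5,   # edge must be reported by at least 2 of 3 algorithms
)
print(sorted(result))
```

With these stubs, the edge `("z", "x")` is dropped in phase 1 because only half of the partitions report it, and `("w", "x")` is dropped in phase 2 because only one algorithm reports it.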

Files

Original bundle

Name: 10217090.pdf
Size: 385.66 KB
Format: Adobe Portable Document Format

Name: ensemble_causality_learning-master.zip
Size: 51.6 KB
Format: Unknown data format
Description: Code

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission