Large-Scale Causality Discovery Analytics as a Service
Date
2022-01-13
Citation of Original Publication
X. Wang, P. Guo and J. Wang, "Large-Scale Causality Discovery Analytics as a Service," 2021 IEEE International Conference on Big Data (Big Data), 2021, pp. 3130-3140, doi: 10.1109/BigData52589.2021.9671373.
Rights
© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract
Data-driven causality discovery is a common way to understand causal relationships among different components of a system. We study how to achieve scalable data-driven causality discovery on the Amazon Web Services (AWS) and Microsoft Azure clouds and propose a causality discovery as a service (CDaaS) framework. With this framework, users can easily rerun previous causality discovery experiments or run causality discovery with different setups (such as new datasets or different causality discovery parameters). Our CDaaS leverages the cloud Container Registry and Virtual Machine services to achieve scalable causality discovery with different discovery algorithms. We further conducted extensive experiments and benchmarking of our CDaaS to understand the effects of seven factors (big data engine parameter setting; virtual machine instance number, type, subtype, and size; cloud service; and cloud provider) and how best to provision cloud resources for our causality discovery service for specific goals, including execution time, budgetary cost, and cost-performance ratio. We report our findings from the benchmarking, which can help identify optimal configurations based on each application's characteristics. The findings show that proper configurations can lead to both faster execution and lower budgetary cost.
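The abstract names three provisioning goals (execution time, budgetary cost, and cost-performance ratio) without defining how they are computed. As an illustration only, the sketch below picks the best configuration under each goal. It is not the authors' method. It assumes budgetary cost = instances × hourly price × execution time, and cost-performance ratio = cost divided by performance with performance taken as 1/time (so cost × time, lower is better). The `Config` class, the `best` helper, and all configuration names and numbers are hypothetical, not benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One hypothetical cloud configuration and its measured run time."""
    name: str
    instances: int          # number of VM instances
    hourly_price: float     # assumed USD per instance-hour (hypothetical)
    exec_time_hours: float  # wall-clock execution time of one discovery run

    @property
    def cost(self) -> float:
        # Assumption: budgetary cost = instances * hourly price * execution time.
        return self.instances * self.hourly_price * self.exec_time_hours

    @property
    def cost_performance(self) -> float:
        # Assumption: performance = 1 / time, so cost / performance = cost * time.
        # Lower is better under this definition.
        return self.cost * self.exec_time_hours


def best(configs, goal):
    """Return the configuration that minimizes the metric for the given goal."""
    metrics = {
        "time": lambda c: c.exec_time_hours,
        "cost": lambda c: c.cost,
        "cost_performance": lambda c: c.cost_performance,
    }
    return min(configs, key=metrics[goal])


# Made-up numbers purely for illustration.
configs = [
    Config("small", instances=2, hourly_price=0.10, exec_time_hours=4.0),
    Config("medium", instances=4, hourly_price=0.10, exec_time_hours=1.5),
    Config("large", instances=8, hourly_price=0.20, exec_time_hours=0.5),
]

print(best(configs, "time").name)              # large
print(best(configs, "cost").name)              # medium
print(best(configs, "cost_performance").name)  # large
```

With these invented numbers, each goal can favor a different configuration, and "medium" is both faster and cheaper than "small". This mirrors, in miniature, the abstract's point that a well-chosen configuration can improve execution time and budgetary cost together.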