Wang, Xin; Guo, Pei; Wang, Jianwu
2022-09-26
2022-09-26
2022-01-13
X. Wang, P. Guo and J. Wang, "Large-Scale Causality Discovery Analytics as a Service," 2021 IEEE International Conference on Big Data (Big Data), 2021, pp. 3130-3140, doi: 10.1109/BigData52589.2021.9671373.
IEEE International Conference on Big Data (Big Data), 15-18 December 2021, Orlando, FL, USA
Data-driven causality discovery is a common way to understand causal relationships among different components of a system. We study how to achieve scalable data-driven causality discovery on the Amazon Web Services (AWS) and Microsoft Azure clouds and propose a causality discovery as a service (CDaaS) framework. With this framework, users can easily rerun previous causality discovery experiments or run causality discovery with different setups (such as new datasets or causality discovery parameters). Our CDaaS leverages the Cloud Container Registry service and the Virtual Machine service to achieve scalable causality discovery with different discovery algorithms. We further conducted extensive experiments and benchmarking of our CDaaS to understand the effects of seven factors (big data engine parameter setting, virtual machine instance number, type, subtype, size, cloud service, and cloud provider) and how best to provision cloud resources for our causality discovery service based on certain goals, including execution time, budgetary cost, and cost-performance ratio. We report our findings from the benchmarking, which can help obtain optimal configurations based on each application’s characteristics. The findings show that proper configurations can lead to both faster execution time and lower budgetary cost.
11 pages
en-US
© 2021 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
UMBC Big Data Analytics Lab
Large-Scale Causality Discovery Analytics as a Service
Text