Adaptive and Efficient Streaming Time Series Forecasting with Lambda Architecture and Spark
Date
2021-03-19
Citation of Original Publication
A. Pandya, O. Odunsi, C. Liu, A. Cuzzocrea and J. Wang, "Adaptive and Efficient Streaming Time Series Forecasting with Lambda Architecture and Spark," 2020 IEEE International Conference on Big Data (Big Data), 2020, pp. 5182-5190, doi: 10.1109/BigData50022.2020.9377947.
Rights
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract
The rise of Internet of Things (IoT) devices and streaming platforms has tremendously increased the volume of data in motion, or streaming data. Streaming data spans a wide variety of sources, for example social media posts, online gamers' in-game activities, mobile or web application logs, online e-commerce transactions, financial trading, and geospatial services. Accurate and efficient forecasting based on real-time data is a critical part of operations in areas such as energy and utility consumption, healthcare, industrial production, supply chain, weather forecasting, financial trading, and agriculture. Statistical time series forecasting methods such as Autoregression (AR), Autoregressive Integrated Moving Average (ARIMA), and Vector Autoregression (VAR) face the challenge of concept drift in streaming data, i.e., the properties of the stream may change over time. Another challenge is how efficiently the system can update the Machine Learning (ML) models based on these algorithms in order to handle concept drift. In this paper, we propose a novel framework to tackle both challenges. We address adaptability by applying the Lambda architecture to forecast the future state using three approaches simultaneously: batch (historic) data-based prediction, streaming (real-time) data-based prediction, and hybrid prediction that combines the first two. We address efficiency by implementing a distributed VAR algorithm on top of the Apache Spark big data platform. To evaluate our framework, we conducted streaming time series forecasting experiments on four types of data sets: data without drift (no drift), data with gradual drift, data with abrupt drift, and data with mixed drift. The experiments show how our three forecasting approaches differ in accuracy and adaptability.
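The three forecasting approaches named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' distributed VAR implementation on Spark: it assumes a univariate AR(1) model fitted by least squares in place of VAR, a fixed-size recent window as the streaming view, and a weighted average as the hybrid combination rule (the abstract does not state how the two predictions are combined). The names `fit_ar1`, `forecast_next`, `lambda_forecasts`, `window`, and `weight` are hypothetical.

```python
from statistics import mean


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    mp, mc = mean(prev), mean(curr)
    var = sum((p - mp) ** 2 for p in prev)
    if var == 0:
        # Constant history: fall back to predicting the mean.
        return mc, 0.0
    cov = sum((p - mp) * (q - mc) for p, q in zip(prev, curr))
    phi = cov / var
    return mc - phi * mp, phi


def forecast_next(series):
    """One-step-ahead forecast from an AR(1) model fitted on `series`."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


def lambda_forecasts(history, window=20, weight=0.5):
    """Return (batch, speed, hybrid) one-step forecasts.

    batch  : model fitted on all historic data
    speed  : model fitted on the most recent `window` points only
    hybrid : weighted average of the two (an assumed combination rule)
    """
    batch = forecast_next(history)
    speed = forecast_next(history[-window:])
    hybrid = weight * batch + (1 - weight) * speed
    return batch, speed, hybrid


# Synthetic series with an abrupt drift: the step size changes from +1 to +3.
series = [float(t) for t in range(50)]
for _ in range(20):
    series.append(series[-1] + 3.0)

batch, speed, hybrid = lambda_forecasts(series)
print(f"batch={batch:.2f} speed={speed:.2f} hybrid={hybrid:.2f}")
```

On this synthetic abrupt-drift series, the window-based forecast is fitted only on post-drift points and so follows the new trend, while the batch forecast still reflects the pre-drift regime; the hybrid value lies between the two. How that trade-off behaves on real data depends on the window size and weighting, which are illustrative choices here.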