Understanding Time series data clustering and correlation through visualization

Date

2018-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Rights

Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

Time series are an essential and ubiquitous form of data, with applications in stock markets, digital signal processing, weather forecasting, census analysis, and health monitoring. Such data is generated continuously and in massive amounts. To make sense of these deluges of overlapping temporal data, we employ clustering algorithms, which reduce clutter by aggregating similarly shaped or similarly behaving series into clusters. To judge whether the clustering is effective and the resulting clusters are tight, we need a visualization technique that displays time series clusters together with the underlying data they represent. Previous work has produced visualizations for time series data such as line charts, Gantt charts, stream charts, and heat maps. These techniques are useful for representing one or more clustered data points on a temporal scale, but none of them can represent the distribution of values within each cluster. This shortcoming calls for new visualization techniques that combine temporal representation with statistical representation. The visualization proposed here aims to help users see the overall clusters in time series data and identify interesting trends and patterns in them. In addition to showing the temporal characteristics of such clusters, it represents the distribution of data within a cluster by superimposing box plots, which summarize various statistical aspects, upon line charts. The proposed method also helps users understand whether a correlation exists between two different time series occurring in the same time domain. This feature can help users explore causality and periodicity relationships between the two time series.
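As an illustration of the statistics that a box plot superimposed on a line chart would encode, the following minimal Python sketch computes a five-number summary at each time step of one cluster. It is not the author's implementation: the function name `box_stats_per_step`, the choice of quartile method, and the sample values are all assumptions made for this example.

```python
from statistics import quantiles

def box_stats_per_step(cluster):
    """Five-number summary per time step for one cluster.

    `cluster` is a list of equal-length series (hypothetical layout:
    one inner list per cluster member, one value per time step).
    """
    stats = []
    for values in zip(*cluster):  # all member values at one time step
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
        stats.append({
            "min": min(values),
            "q1": q1,
            "median": med,
            "q3": q3,
            "max": max(values),
        })
    return stats

# Made-up cluster of five series observed at three time steps.
cluster = [
    [10.0, 12.0, 15.0],
    [11.0, 13.0, 14.0],
    [12.0, 14.0, 19.0],
    [13.0, 15.0, 16.0],
    [14.0, 16.0, 21.0],
]
summary = box_stats_per_step(cluster)
for step, s in enumerate(summary):
    print(step, s)
```

In such a sketch, a line drawn through the per-step medians would give the temporal shape of the cluster, while the interquartile range (`q3 - q1`) at each step would indicate how tight the cluster is at that point.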
In this research, we demonstrate the results of using this visualization method to find clustering and correlation within temperature and pressure time series data for 18 cities. We also discuss an application of this visualization to understanding the effect of help-seeking behavior on student grades. This application would allow users to correlate office hours attendance with a student's performance in a course. This research contributes towards a better understanding of the properties and quality of different time series clustering algorithms through a visual representation of a cluster's distribution. It also introduces a novel approach to visualizing the correlation between two simultaneously occurring time series.
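One simple numerical counterpart to inspecting the correlation between two simultaneous time series is a lagged Pearson correlation. The Python sketch below is an assumption-laden illustration, not the method used in the thesis: the helper names `pearson` and `lagged_correlation` and the temperature and pressure values are invented for the example, and a strong correlation at some lag only suggests a relationship worth exploring rather than establishing causality.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for lag = 0..max_lag."""
    return {
        lag: pearson(x[:len(x) - lag], y[lag:])
        for lag in range(max_lag + 1)
    }

# Made-up data: pressure one step later mirrors temperature inversely.
temperature = [20, 22, 25, 23, 21, 19, 22, 26]
pressure = [1012, 1010, 1008, 1005, 1007, 1009, 1011, 1008]

result = lagged_correlation(temperature, pressure, max_lag=2)
strongest_lag = max(result, key=lambda lag: abs(result[lag]))
print(result, strongest_lag)
```

Scanning several lags in this way is one possible aid for spotting delayed or periodic relationships between the two series; the sketch assumes both series are sampled at the same time steps and have equal length.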