Evaluation of Traditional and Deep Clustering Algorithms for Multivariate Spatio-Temporal Data

Citation of Original Publication

Nji, Ndikum Francis, Rohan Mandar Salvi, Sai Sri Ram Kuram Tirumala, Jianwu Wang, and Xue Zheng. “Evaluation of Traditional and Deep Clustering Algorithms for Multivariate Spatio-Temporal Data.” Lawrence Livermore National Laboratory, October 28, 2024.

Rights

This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
Public Domain

Abstract

Spatiotemporal data is common in many disciplines, such as atmospheric science, Earth science, and environmental science, where it is generated by monitoring an area over a period of time. Analyzing such high-dimensional data is critical for uncovering hidden patterns, and one important approach is to categorize it along the temporal dimension into smaller groups. While classical methods like K-means and Gaussian Mixture Models (GMM) are favored for their simplicity and interpretability, they struggle to model the complex, high-dimensional relationships inherent in nonlinear spatiotemporal data. In contrast, deep clustering algorithms, which combine neural networks with unsupervised learning objectives, learn latent representations that better capture nonlinear spatiotemporal dependencies. This study provides a rigorous evaluation of both traditional and deep clustering algorithms on high-dimensional multivariate spatiotemporal climate datasets. Our comparative study examines the performance of these techniques across synthetic and real-world datasets, assessing clustering accuracy and stability. We emphasize the advantages of deep clustering, particularly in applications such as climate data analysis and traffic flow prediction, where mining and understanding nonlinear high-dimensional correlations are critical. The results demonstrate that while traditional clustering algorithms are effective for basic tasks, deep learning-based approaches outperform them in handling the complex nonlinear patterns present in high-dimensional multivariate spatiotemporal data.
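As an illustration of what categorizing data along the temporal dimension can look like for a traditional method, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical array layout of (time, variable, latitude, longitude), flattens each time step into one feature vector, and groups the time steps with a plain NumPy version of K-means. The array shape, the two synthetic regimes, and the helper names flatten_time_steps, assign, and kmeans are all assumptions made for this example.

```python
import numpy as np

def flatten_time_steps(cube):
    """Turn a (time, variable, lat, lon) array into a (time, features) matrix."""
    return cube.reshape(cube.shape[0], -1)

def assign(X, centers):
    """Return the index of the nearest center (squared Euclidean) for each row."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns one label per row of X and the centers."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct rows of X.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

# Hypothetical synthetic data: 40 time steps, 3 variables, a 4 x 5 grid.
# The last 20 time steps are shifted to mimic a second temporal regime.
rng = np.random.default_rng(42)
cube = rng.normal(0.0, 0.1, size=(40, 3, 4, 5))
cube[20:] += 5.0

X = flatten_time_steps(cube)   # shape (40, 60): one row per time step
labels, _ = kmeans(X, k=2)     # one cluster label per time step
print(labels)
```

Flattening treats every grid cell and variable as an independent feature, so spatial structure is not used. This is one way to see why such a baseline can fall short on nonlinear spatiotemporal patterns, which is the limitation the abstract attributes to traditional methods.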