A Streaming Tensor Decomposition Analysis for Earth Science Informatics

Author/Creator

Author/Creator ORCID

Department

Computer Science and Electrical Engineering

Program

Computer Science

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Subjects

Abstract

Today, many application domains, including science, sports, social media, and health, give rise to streaming multi-way data that can be naturally represented and analyzed as tensors. Time sequences produced from measurements at different locations, made from a variety of platforms over a wide range of events, lend themselves to important tensor decomposition (TD) applications. A TD is any scheme for expressing a "data tensor" (an M-way array, or M-mode array) as a sequence of elementary outer product operations acting on other, often simpler, tensors. TDs have applications in data analysis, signal processing, machine learning, and data mining. We apply Shaden Smith's SPLATT algorithm to form a tensor decomposition that provides a leading coefficient for each outer product term. These leading coefficients are analogous to the singular values in the singular value decomposition of a matrix. We computed TDs for the FROSTT streaming data test modules, for streaming aerosol concentration profiles from a network of ceilometers, for Weather Research and Forecasting (WRF) model output, and for 40 years of hourly climate observations. We applied TD to a data set of aerosol concentration used to predict air quality. We implemented a multi-sensor ground-based observatory network consisting of three lidar x firing ceilometers distributed along a 650 km corridor on the east coast, which streamed high-resolution aerosol concentration profiles in near real time, from the ground up to 15 km, for a one-year period. Daily variations of the aerosol concentration are used to determine the planetary boundary layer height (PBLH), which we derived from our observation network both in near real time and over the full year. We applied TD to the WRF model simulated outputs over the entire continental US to study the time dependence of the dominant components of the PBLH. 
Results obtained by TD were compared with the ceilometer observations to assess the accuracy of the model-generated PBLH. In a second application, we applied TD to ERA5, a global reanalysis covering 40 years of atmospheric and model data, which enabled the study of the dominant components of the PBLH on a planetary scale. We further applied TD to global warming data over the entire 40 years. Finally, we examined the power spectral distribution of the leading coefficients associated with the maximum tensor rank of the PBLH and the surface wind speeds, looking for similarities to fluid turbulence power laws.
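
The idea of writing a data tensor as a sum of outer products, each scaled by a leading coefficient, can be illustrated with a small example. The sketch below is a plain, dense CP (CANDECOMP/PARAFAC) decomposition fitted by alternating least squares in NumPy. It is not SPLATT and not the streaming method used in this work; the function names (`khatri_rao`, `unfold`, `cp_als`, `reconstruct`), the synthetic data, and the tensor sizes are assumptions made purely for illustration.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def unfold(x, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest in C order."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def cp_als(x, rank, n_iter=200, seed=0):
    """Fit x ~ sum_r weights[r] * (outer product of the r-th factor columns)."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in x.shape]
    weights = np.ones(rank)
    for _ in range(n_iter):
        for mode in range(x.ndim):
            others = [factors[m] for m in range(x.ndim) if m != mode]
            # Khatri-Rao product of the other factors, ordered to match unfold()
            kr = others[0]
            for f in others[1:]:
                kr = khatri_rao(kr, f)
            # Gram matrix of kr, computed cheaply as a Hadamard product
            gram = np.ones((rank, rank))
            for f in others:
                gram *= f.T @ f
            updated = unfold(x, mode) @ kr @ np.linalg.pinv(gram)
            # Column norms become the leading coefficients; columns are unit length
            weights = np.linalg.norm(updated, axis=0)
            weights[weights == 0] = 1.0
            factors[mode] = updated / weights
    return weights, factors

def reconstruct(weights, factors):
    """Rebuild a 3-way tensor from its weighted rank-one terms."""
    a, b, c = factors
    return np.einsum("r,ir,jr,kr->ijk", weights, a, b, c)

# Hypothetical synthetic data: a rank-2 tensor of shape (4, 5, 6)
rng = np.random.default_rng(1)
true_factors = [rng.standard_normal((n, 2)) for n in (4, 5, 6)]
x = reconstruct(np.array([3.0, 1.0]), true_factors)

weights, factors = cp_als(x, rank=2)
rel_err = np.linalg.norm(x - reconstruct(weights, factors)) / np.linalg.norm(x)
print("leading coefficients:", np.sort(weights)[::-1])
print("relative reconstruction error:", rel_err)
```

Because every factor column is normalized to unit length, the magnitude of each rank-one term is carried entirely by its entry in `weights`, which is what makes those coefficients comparable to singular values. Unlike the singular value decomposition, the factor columns here are not required to be orthogonal.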