EFFICIENT METHODS FOR HIGHER-ORDER FACTORIZATIONS: ACCURACY, SCALABILITY, AND GENERALIZATION
Author/Creator
Author/Creator ORCID
Date
Type of Work
Department
Computer Science and Electrical Engineering
Program
Engineering, Electrical
Citation of Original Publication
Rights
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Abstract
Matrix and tensor factorizations decompose datasets into factors: unobserved, estimated variables useful for summarizing a dataset's latent structure. The simplest datasets are matrices, analyzed with methods such as Singular Value Decomposition (SVD) and Independent Component Analysis (ICA). When given sets of matrices or higher-order tensors, matrix-based methods generalize to "higher-order factorizations" that better leverage the data's multidimensional structure. These factorizations include Independent Vector Analysis (IVA), which extends ICA to multiple matrix datasets, and tensor methods such as the Higher-Order Singular Value Decomposition (HOSVD), which generalizes SVD to higher-order tensors. Despite the powerful capabilities of higher-order factorizations, their challenges are well summarized by the idiom "there is no free lunch": a factorization that gains an advantage in one desired aspect, e.g., estimation quality, necessarily does so at the expense of another, e.g., computational efficiency. Creating a useful factorization method is ultimately a balancing act: finding a good tradeoff between statistical efficiency (estimation quality), computational efficiency, and the method's generalization capabilities.
This dissertation studies the development and theory of efficient higher-order factorizations, notably methods that perform Joint Blind Source Separation (JBSS), such as IVA, but also tensor factorization methods such as the HOSVD. Many of the improvements we provide are computational, yielding efficient methods for large-scale datasets, but our methods also yield useful theoretical analysis and improvements in statistical efficiency. We begin by overviewing JBSS and introduce methods to perform it more effectively, including efficient analytic solutions and efficient methods that utilize subsets of the data. We provide theoretical analysis by deriving nonidentifiability conditions: conditions under which our JBSS methods fail (and, by implication, when they succeed). We then discuss tensor factorization methods and introduce an efficient coreset-based factorization that can both approximate the HOSVD and perform a form of tensorial feature selection. For all our methods, we validate their capabilities on various simulated datasets and on real-world data from functional magnetic resonance imaging (fMRI). We conclude the dissertation with possible future directions that could extend beyond our proposed methods.
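To make the abstract's central object concrete: the HOSVD mentioned above computes one factor matrix per tensor mode from the SVD of that mode's unfolding, then projects the tensor onto those bases to obtain a core tensor. The following is a minimal numpy sketch of the textbook (full) HOSVD, not the dissertation's own implementation; the function names (`unfold`, `mode_multiply`, `hosvd`) are illustrative choices, not names from the source.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the chosen mode to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    """Mode-n product T x_n M, where M has shape (J, I_n)."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(T):
    """Full HOSVD: factor matrices are the left singular vectors of each
    mode unfolding; the core is T projected onto those bases."""
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0]
         for n in range(T.ndim)]
    G = T
    for n, Un in enumerate(U):
        G = mode_multiply(G, Un.T, n)  # project mode n onto its basis
    return G, U

def reconstruct(G, U):
    """Invert the projection: multiply the core by each factor matrix."""
    T = G
    for n, Un in enumerate(U):
        T = mode_multiply(T, Un, n)
    return T
```

With all singular vectors retained, the reconstruction is exact; the truncated variant (keeping only leading singular vectors per mode) gives the low-multilinear-rank approximation that methods like the coreset-based factorization above aim to compute more cheaply.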
