Lifelong Multitask Learning Algorithms: Nonlinearity, Scalability and Applications

Date

2022-01-01

Department

Computer Science and Electrical Engineering

Program

Engineering, Electrical

Rights

Distribution Rights granted to UMBC by the author.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Abstract

With the deluge of data in the modern world, machine learning has become a mainstay of industry and science. Machine learning algorithms aim to uncover relationships hidden in a task that agree with prior knowledge and generalize well to unseen data, while making as few assumptions about the nature of the data as possible. In cases where several tasks are expected to share underlying structure, multitask learning aims to identify and exploit this shared structure to benefit all tasks. In functional magnetic resonance imaging (fMRI), for example, datasets collected from a group of subjects can be jointly processed to separate them into statistically independent sources, yielding improved performance over what can be achieved by considering each dataset alone. In other applications, such as recommender systems, tasks and their corresponding datasets may arrive sequentially over time. This introduces further challenges of online processing and of modeling the joint structure efficiently, key issues for lifelong learning. Furthermore, where linear models cannot adequately describe the underlying task relations, it is desirable to develop algorithms that capitalize on rich nonlinear function spaces.

In this work, multitask learning is developed and applied to a range of machine learning problems. In unsupervised statistical learning, an independent vector analysis algorithm is developed based on a flexible, yet simple, family of distributions termed the complex-valued multivariate generalized Gaussian distribution. In lifelong supervised and reinforcement learning, kernel dictionary learning is used to capture the joint structure of streaming tasks in rich reproducing kernel Hilbert spaces. A sparsification technique mitigates the growing computational and storage complexity typical of kernel methods without sacrificing convergence guarantees. This approach is further generalized to tasks whose data emanate from different sources, or views, yielding a kernel lifelong multitask multiview learning algorithm. To demonstrate the effectiveness of the developed algorithms, experiments are performed on synthetic as well as real-world datasets. The convergence of the developed supervised lifelong learning algorithms is also rigorously established.
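
As a rough illustration of the distribution family named in the abstract, the density below is one standard parameterization of the real-valued multivariate generalized Gaussian distribution; the dissertation's complex-valued parameterization is not given here, so this form is an assumption rather than its exact definition. For a d-dimensional vector x with scatter matrix \Sigma and shape parameter \beta > 0,

p(x) = \frac{\beta \, \Gamma(d/2)}{\pi^{d/2} \, \Gamma\!\left(\tfrac{d}{2\beta}\right) 2^{d/(2\beta)} \, |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} \left( x^{\top} \Sigma^{-1} x \right)^{\beta} \right).

Setting \beta = 1 recovers the multivariate Gaussian, while \beta < 1 yields heavier tails and \beta > 1 lighter ones; for circular complex-valued data, the quadratic form x^{\top} \Sigma^{-1} x would be replaced by the Hermitian form x^{H} \Sigma^{-1} x.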
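The kernel dictionary learning model for streaming tasks can likewise be sketched in a generic form: each task's function is assumed to be a sparse combination of shared atoms living in a reproducing kernel Hilbert space with kernel k. The notation below (atoms g_m, task coefficients s_{t,m}) is illustrative, in the spirit of ELLA-style lifelong learners, and not necessarily the dissertation's exact parameterization:

f_t(x) = \sum_{m=1}^{M} s_{t,m} \, g_m(x), \qquad g_m(x) = \sum_{i=1}^{N} a_{m,i} \, k(x_i, x),

so that all tasks share the dictionary \{g_m\} while the sparse coefficients s_t encode what is specific to task t.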
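Finally, a minimal sketch of the kind of sparsification the abstract refers to, assuming the approximate-linear-dependence (ALD) test known from kernel recursive least squares: a new sample is admitted to the dictionary only if its feature-space image lies far from the span of the current atoms. The class and parameter names (SparseKernelDictionary, nu) are illustrative, and the dissertation's actual criterion may differ.

import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    # Gaussian (RBF) kernel; gamma is an illustrative bandwidth parameter.
    return np.exp(-gamma * np.sum((x - y) ** 2))

class SparseKernelDictionary:
    # Online kernel dictionary with ALD sparsification: admit a sample
    # only if it cannot be approximated, within tolerance nu, in the
    # RKHS span of the current atoms. This bounds dictionary growth,
    # the usual remedy for the linear-in-data cost of kernel methods.
    def __init__(self, kernel=gaussian_kernel, nu=1e-2):
        self.kernel = kernel
        self.nu = nu      # admission threshold on the projection residual
        self.atoms = []   # retained samples (dictionary atoms)

    def maybe_add(self, x):
        if not self.atoms:
            self.atoms.append(x)
            return True
        # Kernel matrix of the atoms and cross-kernel vector to x.
        K = np.array([[self.kernel(a, b) for b in self.atoms]
                      for a in self.atoms])
        k_x = np.array([self.kernel(a, x) for a in self.atoms])
        # Squared RKHS distance from phi(x) to span{phi(atoms)}:
        # delta = k(x, x) - k_x^T K^{-1} k_x (solved, not inverted).
        coeffs = np.linalg.solve(K + 1e-10 * np.eye(len(self.atoms)), k_x)
        delta = self.kernel(x, x) - k_x @ coeffs
        if delta > self.nu:
            self.atoms.append(x)
            return True
        return False

rng = np.random.default_rng(0)
d = SparseKernelDictionary(nu=1e-3)
for _ in range(200):
    d.maybe_add(rng.normal(size=2))
print(len(d.atoms))  # typically far fewer than 200 atoms retained

For efficiency, practical implementations update the inverse kernel matrix incrementally rather than re-solving the linear system at every step; the point of the sketch is only the admission test that keeps the dictionary, and hence the per-sample cost, bounded.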