Lifelong Multitask Learning Algorithms: Nonlinearity, Scalability and Applications

Mowakeaa, RamiLifelong Multitask Learning Algorithms: Nonlinearity, Scalability and ApplicationsMy University2022Kernel methodsLifelong learningMachine learningMultitask learningOptimizationStatistical signal processingMy UniversityMy UniversityKim, Seung-JunAdali, Tulay2023-04-052023-04-052022-01-01Text12667http://hdl.handle.net/11603/27349application:pdfWith the deluge of data in the modern world, machine learning has become a mainstay of industry and science. Machine learning algorithms aim to find underlying relationships hidden in a task that agree with prior knowledge and best generalize to unseen data making as few assumptions about the nature of the data as possible. In cases where several tasks are expected to share underlying structure, multitask learning aims to identify and exploit this shared structure to benefit all tasks. In functional magnetic resonance imaging (fMRI), for example, datasets collected from a group of subjects can be separated into statistically independent sources by joint processing to yield improved performance over what can be achieved considering each dataset alone. In other applications, such as recommender systems, tasks and the corresponding datasets may arrive sequentially over time. This introduces further challenges associated with online processing and modeling of the joint structure in an efficient manner�key issues toward lifelong learning. Furthermore, where linear models are insufficient to adequately describe underlying task relations, it is desirable to capitalize on rich nonlinear function spaces to develop machine learning algorithms. In this work, multitask learning is developed and applied to various machine learning problems. In unsupervised statistical learning, an independent vector analysis algorithm is developed based on a flexible, yet simple, family of distributions termed the complex-valued multivariate generalized Gaussian distribution. In lifelong supervised and reinforcement learning, kernel dictionary learning is used to capture the joint structure of streaming tasks in rich reproducing kernel Hilbert spaces. A sparsification technique is utilized to mitigate the effects of growing computational and storage complexity typical of kernel methods without sacrificing convergence guarantees. This approach is further generalized to tasks where the data emanate from different sources, or views, yielding a kernel lifelong multitask multiview learning algorithm. To affirm the effectiveness of the algorithms developed, experiments based on synthetic as well as real-world datasets are performed. The convergence of the developed supervised lifelong learning algorithms is also rigorously established.