Sparsity and heterogeneous dropout for continual learning in the null space of neural activations
Date: 2022
Citation of Original Publication
Abbasi, Ali, Parsa Nooralinejad, Vladimir Braverman, Hamed Pirsiavash, and Soheil Kolouri. “Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations.” In Proceedings of The 1st Conference on Lifelong Learning Agents, 617–28. PMLR, 2022. https://proceedings.mlr.press/v199/abbasi22a.html.
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence.
Despite their phenomenal performance across a wide variety of applications, deep neural networks are
prone to forgetting previously learned information when learning new tasks. This phenomenon is
called “catastrophic forgetting” and is deeply rooted in the stability-plasticity dilemma. Overcoming
catastrophic forgetting in deep neural networks has become an active field of research in recent years. In
particular, gradient projection-based methods have recently shown exceptional performance at overcoming
catastrophic forgetting. This paper proposes two biologically-inspired mechanisms based on sparsity and
heterogeneous dropout that significantly increase a continual learner’s performance over a long sequence of
tasks. Our proposed approach builds on the Gradient Projection Memory (GPM) framework. We leverage
k-winner activations in each layer of a neural network to enforce layer-wise sparse activations for each task,
together with a between-task heterogeneous dropout that encourages the network to use non-overlapping
activation patterns between different tasks. In addition, we introduce two new benchmarks for continual
learning under distributional shift, namely Continual Swiss Roll and ImageNet SuperDog-40. Lastly, we
provide an in-depth analysis of our proposed method and demonstrate a significant performance boost on
various benchmark continual learning problems.
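The two mechanisms named in the abstract can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not the authors' implementation): a k-winner function that keeps only the k largest activations per layer, and a heterogeneous-dropout schedule that, under the assumption that units active on earlier tasks should be dropped more often on new tasks, derives per-unit drop probabilities from past activation counts. The function names and the exponential form of the schedule are illustrative choices, not taken from the paper.

```python
import numpy as np

def k_winner(activations, k):
    """Keep the k largest activations per sample and zero the rest
    (layer-wise sparse activations; illustrative sketch)."""
    a = np.asarray(activations, dtype=float)
    out = np.zeros_like(a)
    top_idx = np.argsort(a, axis=-1)[..., -k:]          # indices of the k winners
    np.put_along_axis(out, top_idx,
                      np.take_along_axis(a, top_idx, axis=-1), axis=-1)
    return out

def heterogeneous_dropout_probs(activation_counts, rho=1.0):
    """Assumed schedule: units that fired more on previous tasks get a
    higher drop probability, nudging new tasks toward unused units.
    `rho` controls how aggressively past usage is penalized."""
    counts = np.asarray(activation_counts, dtype=float)
    return 1.0 - np.exp(-rho * counts / (counts.max() + 1e-8))
```

For example, `k_winner([3, 1, 4, 1, 5], k=2)` keeps only the units with values 4 and 5, and a never-used unit (count 0) receives drop probability 0 from the schedule, so it is always available to new tasks.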