UMBC College of Engineering and Information Technology
Browsing UMBC College of Engineering and Information Technology by Author "Abbasi, Ali"
Item
PRANC: Pseudo RAndom Networks for Compacting deep models (2022-06-16)
Nooralinejad, Parsa; Abbasi, Ali; Kolouri, Soheil; Pirsiavash, Hamed

Communication becomes a bottleneck in various distributed machine learning settings. Here, we propose a novel training framework that leads to highly efficient communication of models between agents. In short, we train our network to be a linear combination of many pseudo-randomly generated frozen models. For communication, the source agent transmits only the ‘seed’ scalar used to generate the pseudo-random ‘basis’ networks, along with the learned linear mixture coefficients. Our method, denoted as PRANC, learns almost 100× fewer parameters than a deep model and still performs well on several datasets and architectures. PRANC enables 1) efficient communication of models between agents, 2) efficient model storage, and 3) accelerated inference by generating layer-wise weights on the fly. We test PRANC on CIFAR-10, CIFAR-100, tinyImageNet, and ImageNet-100 with various architectures such as AlexNet, LeNet, ResNet18, ResNet20, and ResNet56, and demonstrate a massive reduction in the number of parameters while providing satisfactory performance on these benchmark datasets. The code is available at https://github.com/UCDvision/PRANC

Item
Sparsity and heterogeneous dropout for continual learning in the null space of neural activations (ML Research Press, 2022)
Abbasi, Ali; Nooralinejad, Parsa; Braverman, Vladimir; Pirsiavash, Hamed; Kolouri, Soheil

Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence. Despite their phenomenal performance in a wide variety of applications, deep neural networks are prone to forgetting previously learned information upon learning new information. This phenomenon is called “catastrophic forgetting” and is deeply rooted in the stability-plasticity dilemma. Overcoming catastrophic forgetting in deep neural networks has become an active field of research in recent years. In particular, gradient projection-based methods have recently shown exceptional performance at overcoming catastrophic forgetting. This paper proposes two biologically-inspired mechanisms based on sparsity and heterogeneous dropout that significantly increase a continual learner’s performance over a long sequence of tasks. Our proposed approach builds on the Gradient Projection Memory (GPM) framework. We leverage k-winner activations in each layer of a neural network to enforce layer-wise sparse activations for each task, together with a between-task heterogeneous dropout that encourages the network to use non-overlapping activation patterns between different tasks. In addition, we introduce two new benchmarks for continual learning under distributional shift, namely Continual Swiss Roll and ImageNet SuperDog-40. Lastly, we provide an in-depth analysis of our proposed method and demonstrate a significant performance boost on various benchmark continual learning problems.
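To make the compression idea in the first item above concrete, here is a minimal sketch of representing a model's flattened weights as a linear mixture of frozen pseudo-random basis vectors, so that only a seed and the coefficients need to be transmitted. This is an illustrative assumption, not the authors' implementation; the function names, sizes, and use of NumPy are hypothetical.

```python
# Minimal sketch of the PRANC idea: weights = mixture of frozen pseudo-random bases.
# Only the RNG seed and the coefficient vector need to be stored or transmitted.
import numpy as np

def generate_basis(seed: int, num_basis: int, num_weights: int) -> np.ndarray:
    """Deterministically regenerate the frozen pseudo-random basis from a seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_basis, num_weights))

def reconstruct_weights(seed: int, alphas: np.ndarray, num_weights: int) -> np.ndarray:
    """Rebuild the dense weight vector as a linear mixture of the basis networks."""
    basis = generate_basis(seed, len(alphas), num_weights)
    return alphas @ basis  # (num_basis,) @ (num_basis, num_weights) -> (num_weights,)

# Example: a toy "model" with 10,000 weights compressed to 100 coefficients plus one seed.
seed, num_weights, num_basis = 42, 10_000, 100
alphas = np.random.randn(num_basis)   # in PRANC these coefficients are learned
weights = reconstruct_weights(seed, alphas, num_weights)
print(weights.shape)  # (10000,)
```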
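Similarly, the second item describes layer-wise k-winner sparsity and a between-task heterogeneous dropout. The sketch below illustrates one plausible reading of those two mechanisms; it is an assumption for illustration only, not the authors' code, and the particular retention-probability formula is hypothetical.

```python
# Rough PyTorch sketch of k-winner sparsity and a usage-dependent dropout:
# units that fired often on earlier tasks get a lower keep-probability,
# nudging later tasks toward non-overlapping activation patterns.
import torch

def k_winner(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest activations per sample and zero out the rest."""
    topk = torch.topk(x, k, dim=1)
    mask = torch.zeros_like(x).scatter_(1, topk.indices, 1.0)
    return x * mask

def heterogeneous_dropout(x: torch.Tensor, usage_counts: torch.Tensor,
                          strength: float = 1.0) -> torch.Tensor:
    """Drop units in proportion to how often they were active on previous tasks."""
    # Hypothetical form: high past usage -> low keep-probability.
    keep_prob = torch.exp(-strength * usage_counts / (usage_counts.max() + 1e-8))
    mask = torch.bernoulli(keep_prob).to(x.dtype)
    return x * mask

# Toy usage: 8 samples, 32 hidden units, keep the top 5 activations per sample.
h = torch.randn(8, 32)
h_sparse = k_winner(h, k=5)
counts = (h_sparse > 0).float().sum(dim=0)          # per-unit activation counts
h_next_task = heterogeneous_dropout(torch.randn(8, 32), counts)
```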