NOLA: Networks as Linear Combination of Low Rank Random Basis

dc.contributor.author: Koohpayegani, Soroush Abbasi
dc.contributor.author: Navaneet, KL
dc.contributor.author: Nooralinejad, Parsa
dc.contributor.author: Kolouri, Soheil
dc.contributor.author: Pirsiavash, Hamed
dc.date.accessioned: 2023-11-09T18:02:57Z
dc.date.available: 2023-11-09T18:02:57Z
dc.date.issued: 2023-10-04
dc.description.abstract: Large Language Models (LLMs) have recently gained popularity due to their impressive few-shot performance across various downstream tasks. However, fine-tuning all parameters and storing a unique model for each downstream task or domain becomes impractical because of the massive size of checkpoints (e.g., 350GB in GPT-3). Current literature, such as LoRA, showcases the potential of low-rank modifications to the original weights of an LLM, enabling efficient adaptation and storage for task-specific models. These methods can reduce the number of parameters needed to fine-tune an LLM by several orders of magnitude. Yet, these methods face two primary limitations: 1) the parameter reduction is lower-bounded by the rank-one decomposition, and 2) the extent of reduction is heavily influenced by both the model architecture and the chosen rank. For instance, in larger models, even a rank-one decomposition might exceed the number of parameters truly needed for adaptation. In this paper, we introduce NOLA, which overcomes the rank-one lower bound present in LoRA. It achieves this by re-parameterizing the low-rank matrices in LoRA using linear combinations of randomly generated matrices (basis) and optimizing only the linear mixture coefficients. This approach allows us to decouple the number of trainable parameters from both the choice of rank and the network architecture. We present adaptation results using GPT-2 and ViT on natural language and computer vision tasks. NOLA performs as well as, or better than, models with equivalent parameter counts. Furthermore, we demonstrate that we can halve the parameters in larger models compared to LoRA with rank one, without sacrificing performance.
dc.description.sponsorship: This work was partially supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR00112190135 and funding from NSF grant 1845216. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
dc.description.uri: https://arxiv.org/abs/2310.02556
dc.format.extent: 15 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2mc6e-xjgl
dc.identifier.uri: https://doi.org/10.48550/arXiv.2310.02556
dc.identifier.uri: http://hdl.handle.net/11603/30639
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.title: NOLA: Networks as Linear Combination of Low Rank Random Basis
dc.type: Text
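
The abstract describes NOLA's core mechanism: freeze the pretrained weights, generate fixed random low-rank basis matrices from a seed, and train only the scalar coefficients that mix them. The sketch below is a minimal, hypothetical PyTorch rendering of that idea for a single linear layer; it is not the authors' released code, and the class name NOLALinear, the arguments (rank, num_basis, seed), and the coefficient initialization are illustrative assumptions.

import torch
import torch.nn as nn

class NOLALinear(nn.Module):
    """Hypothetical NOLA-style adapter: a frozen base layer plus a low-rank
    update delta_W = (sum_i alpha_i * A_i) @ (sum_j beta_j * B_j), where the
    A_i and B_j are fixed random matrices and only alpha, beta are trained."""

    def __init__(self, base: nn.Linear, rank: int = 4, num_basis: int = 64, seed: int = 0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen

        out_f, in_f = base.weight.shape
        g = torch.Generator().manual_seed(seed)  # bases are reproducible from the seed alone
        # Fixed random bases; in principle only the seed needs to be stored.
        self.register_buffer("A", torch.randn(num_basis, out_f, rank, generator=g))
        self.register_buffer("B", torch.randn(num_basis, rank, in_f, generator=g))
        # The only trainable parameters: 2 * num_basis mixture coefficients,
        # independent of the layer size and of the chosen rank.
        self.alpha = nn.Parameter(torch.randn(num_basis) / num_basis)
        self.beta = nn.Parameter(torch.zeros(num_basis))  # update starts at zero, as in LoRA

    def forward(self, x):
        # Mix the random bases with the learned coefficients, then apply the update.
        A = torch.einsum("k,kor->or", self.alpha, self.A)  # (out_f, rank)
        B = torch.einsum("k,kri->ri", self.beta, self.B)   # (rank, in_f)
        return self.base(x) + x @ (A @ B).t()

For example, wrapping a 768 x 768 projection with num_basis = 64 trains 128 coefficients for that layer, versus 1,536 parameters for a rank-one LoRA update of the same layer; this is the decoupling of trainable parameters from rank and architecture that the abstract refers to.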

Files

Original bundle

Name: 2310.02556.pdf
Size: 589.88 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed to upon submission