NOLA: Networks as Linear Combination of Low Rank Random Basis

dc.contributor.author: Koohpayegani, Soroush Abbasi
dc.contributor.author: Navaneet, KL
dc.contributor.author: Nooralinejad, Parsa
dc.contributor.author: Kolouri, Soheil
dc.contributor.author: Pirsiavash, Hamed
dc.date.accessioned: 2023-11-09T18:02:57Z
dc.date.available: 2023-11-09T18:02:57Z
dc.date.issued: 2023-10-04
dc.description.abstract: Large Language Models (LLMs) have recently gained popularity due to their impressive few-shot performance across various downstream tasks. However, fine-tuning all parameters and storing a unique model for each downstream task or domain becomes impractical because of the massive size of checkpoints (e.g., 350GB in GPT-3). Current literature, such as LoRA, showcases the potential of low-rank modifications to the original weights of an LLM, enabling efficient adaptation and storage for task-specific models. These methods can reduce the number of parameters needed to fine-tune an LLM by several orders of magnitude. Yet, these methods face two primary limitations: 1) the parameter reduction is lower-bounded by the rank-one decomposition, and 2) the extent of reduction is heavily influenced by both the model architecture and the chosen rank. For instance, in larger models, even a rank-one decomposition might exceed the number of parameters truly needed for adaptation. In this paper, we introduce NOLA, which overcomes the rank-one lower bound present in LoRA. It achieves this by re-parameterizing the low-rank matrices in LoRA using linear combinations of randomly generated matrices (basis) and optimizing only the linear mixture coefficients. This approach allows us to decouple the number of trainable parameters from both the choice of rank and the network architecture. We present adaptation results using GPT-2 and ViT on natural language and computer vision tasks. NOLA performs as well as, or better than, models with equivalent parameter counts. Furthermore, we demonstrate that we can halve the parameters in larger models compared to LoRA with rank one, without sacrificing performance.
dc.description.sponsorship: This work was partially supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR00112190135 and funding from NSF grant 1845216. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
dc.description.uri: https://arxiv.org/abs/2310.02556
dc.format.extent: 15 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2mc6e-xjgl
dc.identifier.uri: https://doi.org/10.48550/arXiv.2310.02556
dc.identifier.uri: http://hdl.handle.net/11603/30639
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.title: NOLA: Networks as Linear Combination of Low Rank Random Basis
dc.type: Text
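
The abstract describes NOLA's core mechanism: freeze the pretrained weights, generate fixed random low-rank basis matrices from a seed, and train only the scalar coefficients that mix them. The sketch below is a minimal, hypothetical PyTorch rendering of that idea for a single linear layer; it is not the authors' released code, and the class name NOLALinear, the arguments (rank, num_basis, seed), and the coefficient initialization are illustrative assumptions.

import torch
import torch.nn as nn

class NOLALinear(nn.Module):
    """Hypothetical NOLA-style adapter: a frozen base layer plus a low-rank
    update delta_W = (sum_i alpha_i * A_i) @ (sum_j beta_j * B_j), where the
    A_i and B_j are fixed random matrices and only alpha, beta are trained."""

    def __init__(self, base: nn.Linear, rank: int = 4, num_basis: int = 64, seed: int = 0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen

        out_f, in_f = base.weight.shape
        g = torch.Generator().manual_seed(seed)  # bases are reproducible from the seed alone
        # Fixed random bases; in principle only the seed needs to be stored.
        self.register_buffer("A", torch.randn(num_basis, out_f, rank, generator=g))
        self.register_buffer("B", torch.randn(num_basis, rank, in_f, generator=g))
        # The only trainable parameters: 2 * num_basis mixture coefficients,
        # independent of the layer size and of the chosen rank.
        self.alpha = nn.Parameter(torch.randn(num_basis) / num_basis)
        self.beta = nn.Parameter(torch.zeros(num_basis))  # update starts at zero, as in LoRA

    def forward(self, x):
        # Mix the random bases with the learned coefficients, then apply the update.
        A = torch.einsum("k,kor->or", self.alpha, self.A)  # (out_f, rank)
        B = torch.einsum("k,kri->ri", self.beta, self.B)   # (rank, in_f)
        return self.base(x) + x @ (A @ B).t()

For example, wrapping a 768 x 768 projection with num_basis = 64 trains 128 coefficients for that layer, versus 1,536 parameters for a rank-one LoRA update of the same layer; this is the decoupling of trainable parameters from rank and architecture that the abstract refers to.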

Files

Original bundle

Name: 2310.02556.pdf
Size: 589.88 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed to upon submission