MaGrIP: Magnitude and Gradient-Informed Pruning for Task-Agnostic Large Language Models

dc.contributor.author: Kallakuri, Uttej
dc.contributor.author: Humes, Edward
dc.contributor.author: Rashid, Hasib-Al
dc.contributor.author: Mohsenin, Tinoosh
dc.date.accessioned: 2025-10-22T19:58:03Z
dc.date.issued: 2025-09-05
dc.description.abstract: Large Language Models (LLMs) have become foundational tools in natural language processing, achieving state-of-the-art performance across a variety of tasks. However, their immense size and computational requirements make them impractical for deployment in resource-constrained environments, such as edge devices and embedded systems. In this work, we introduce Magnitude and Gradient-Informed Pruning (MaGrIP), a novel framework for task-agnostic pruning and compression of LLMs. MaGrIP employs a dual-threshold strategy combining magnitude- and gradient-based saliency measures to efficiently prune redundant neurons while retaining task performance. Our results demonstrate the effectiveness of MaGrIP in compressing state-of-the-art models. The compression reduced the total computational complexity of the FFN layers from O(d · h) to O((d − q) · h). In terms of model size, our pruning approach significantly reduces both model parameters and storage requirements while maintaining competitive perplexity scores evaluated on WikiText-2. For the Gemma 7B model, our method reduces the total size from 28 GB to 5 GB, while for Gemma 2B, MaGrIP achieves a size reduction from 8 GB to 1.5 GB. MaGrIP furthermore exhibits robust performance across multiple benchmarks, such as BoolQ, ARC-E, and CSQA. Specifically, the pruned Gemma 7B model at 50% pruning achieved 59.26% accuracy on ARC-E compared to 81.06% for the baseline, and 64.74% accuracy on BoolQ compared to 59.98% for the baseline. Similarly, the pruned Llama 3 8B at 50% pruning achieved 46.76% accuracy on ARC-E compared to 77.57% for the baseline, reflecting the trade-off between compression and accuracy. LLMs compressed using MaGrIP, when deployed on the Nvidia Jetson Orin Nano, achieved a 2.16× improvement in throughput and a 2.3× improvement in performance compared to baseline LLMs.
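The dual-threshold idea described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical interpretation in PyTorch, not the paper's implementation: it assumes per-neuron saliency is computed from mean weight magnitude and a first-order weight-times-gradient term, and that an FFN hidden neuron is removed only when both measures fall below their thresholds. The function and parameter names (prune_ffn_neurons, tau_mag, tau_grad) are placeholders.

```python
# Hypothetical sketch of dual-threshold magnitude/gradient pruning for one FFN block.
# Not the paper's implementation: MaGrIP's exact saliency measures and thresholds are
# defined in the article; this only illustrates the idea described in the abstract.
import torch
import torch.nn as nn


def neuron_saliency(up_proj: nn.Linear):
    """Per-hidden-neuron saliency: mean |weight| and a first-order |weight * grad| term."""
    w = up_proj.weight                                   # shape (d, h): one row per FFN hidden neuron
    g = w.grad if w.grad is not None else torch.zeros_like(w)
    mag = w.abs().mean(dim=1)                            # magnitude saliency, length d
    grad = (w * g).abs().mean(dim=1)                     # gradient-informed saliency, length d
    return mag, grad


def prune_ffn_neurons(up_proj: nn.Linear, down_proj: nn.Linear,
                      tau_mag: float, tau_grad: float):
    """Remove hidden neurons whose magnitude AND gradient saliency both fall below threshold."""
    mag, grad = neuron_saliency(up_proj)
    keep = (mag >= tau_mag) | (grad >= tau_grad)         # prune only when both measures agree
    idx = torch.where(keep)[0]

    new_up = nn.Linear(up_proj.in_features, idx.numel(), bias=up_proj.bias is not None)
    new_down = nn.Linear(idx.numel(), down_proj.out_features, bias=down_proj.bias is not None)
    with torch.no_grad():
        new_up.weight.copy_(up_proj.weight[idx])         # keep surviving rows of the up projection
        if up_proj.bias is not None:
            new_up.bias.copy_(up_proj.bias[idx])
        new_down.weight.copy_(down_proj.weight[:, idx])  # keep matching columns of the down projection
        if down_proj.bias is not None:
            new_down.bias.copy_(down_proj.bias)
    return new_up, new_down


# Toy usage: h = model width, d = FFN hidden width; pruning q of the d hidden neurons
# shrinks the per-layer cost from O(d * h) to O((d - q) * h), as stated in the abstract.
h, d = 16, 64
up, down = nn.Linear(h, d), nn.Linear(d, h)
x = torch.randn(8, h)
down(torch.relu(up(x))).pow(2).mean().backward()         # dummy calibration pass to populate grads
up_p, down_p = prune_ffn_neurons(up, down, tau_mag=0.05, tau_grad=1e-4)
print(up_p.weight.shape, down_p.weight.shape)
```

In this reading, gradients come from a small task-agnostic calibration pass before pruning, and the down projection is sliced to match the surviving hidden neurons so the block's input/output dimensions are unchanged; the actual MaGrIP procedure is detailed in the article.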
dc.description.sponsorship: Research was partly sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-24-2-0080. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
dc.description.uri: https://dl.acm.org/doi/10.1145/3766068
dc.format.extent: 32 pages
dc.genre: journal articles
dc.identifier: doi:10.13016/m2ptek-yfpt
dc.identifier.citation: Kallakuri, Uttej, Edward Humes, Hasib-Al Rashid, and Tinoosh Mohsenin. “MaGrIP: Magnitude and Gradient-Informed Pruning for Task-Agnostic Large Language Models.” ACM Trans. Embed. Comput. Syst., September 5, 2025. https://doi.org/10.1145/3766068.
dc.identifier.uri: https://doi.org/10.1145/3766068
dc.identifier.uri: http://hdl.handle.net/11603/40538
dc.language.iso: en
dc.publisher: ACM
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Student Collection
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: MaGrIP: Magnitude and Gradient-Informed Pruning for Task-Agnostic Large Language Models
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-9983-6929

Files

Original bundle

Name: MaGrIP.pdf
Size: 4.57 MB
Format: Adobe Portable Document Format