Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance

dc.contributor.author: Chen, Hao
dc.contributor.author: Wu, Yusen
dc.contributor.author: Nguyen, Phuong
dc.contributor.author: Liu, Chao
dc.contributor.author: Yesha, Yelena
dc.date.accessioned: 2023-10-13T13:53:10Z
dc.date.available: 2023-10-13T13:53:10Z
dc.date.issued: 2023-09-21
dc.description.abstract: Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, typically converges only to local optima because the training objective is non-convex. Leveraging these local optima to improve model performance remains challenging. Given the inherent complexity of neural networks, simply taking the arithmetic average of the obtained local-optimum models yields undesirable results. This paper proposes a soft merging method that merges multiple models rapidly, simplifies merging specific parts of neural networks, and improves robustness against malicious models with extreme values. This is achieved by learning gate parameters through a surrogate of the 𝑙₀ norm based on the hard concrete distribution, without modifying the weights of the given local-optimum models. The merging process not only improves model performance by converging to a better local optimum but also keeps computational costs low, offering an efficient and explicit learning process integrated with stochastic gradient descent. Thorough experiments demonstrate the effectiveness and superior performance of the merged neural networks.
dc.description.uri: https://arxiv.org/abs/2309.12259
dc.format.extent: 5 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2gsrq-myma
dc.identifier.uri: https://doi.org/10.48550/arXiv.2309.12259
dc.identifier.uri: http://hdl.handle.net/11603/30141
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: CC BY-NC-SA 4.0 Deed Attribution-NonCommercial-ShareAlike 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.title: Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
dc.type: Text
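
The abstract describes merging frozen local-optimum models through learned gates drawn from the hard concrete distribution. What follows is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The hyperparameters β = 2/3, γ = −0.1 and ζ = 1.1 are the common hard concrete defaults from Louizos et al. (2018). The per-layer, normalized, gate-weighted average is an assumed merging rule. The helper names (`sample_gate`, `deterministic_gate`, `expected_l0`, `soft_merge`, `arithmetic_average`) are hypothetical. The gate parameters `log_alpha` are set by hand here instead of being learned with SGD.

```python
import math
import random

# Assumed hard concrete hyperparameters (common defaults from Louizos et al., 2018).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample_gate(log_alpha: float, rng: random.Random) -> float:
    """Training-time gate: one stochastic draw from the hard concrete distribution."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # keep both logs finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    s_bar = s * (ZETA - GAMMA) + GAMMA  # stretch (0, 1) to (GAMMA, ZETA)
    return min(1.0, max(0.0, s_bar))  # hard clip, so exact 0 and 1 occur


def deterministic_gate(log_alpha: float) -> float:
    """Test-time gate: clipped, stretched sigmoid of log_alpha (no noise)."""
    s_bar = sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, s_bar))


def expected_l0(log_alpha: float) -> float:
    """P(gate != 0): the differentiable surrogate for the l0 norm of one gate."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def soft_merge(models, log_alphas, eps=1e-8):
    """Hypothetical merge rule: per-layer, gate-weighted average of frozen models.

    models: list of {layer_name: list of floats}; these weights are only read.
    log_alphas: {layer_name: one gate parameter per model}. Only the layers
    listed here are merged and returned, so a subset of the network can be merged.
    """
    merged = {}
    for layer, alphas in log_alphas.items():
        gates = [deterministic_gate(a) for a in alphas]
        total = sum(gates) + eps  # eps avoids division by zero if all gates close
        size = len(models[0][layer])
        merged[layer] = [
            sum(g * m[layer][i] for g, m in zip(gates, models)) / total
            for i in range(size)
        ]
    return merged


def arithmetic_average(models):
    """Baseline: plain element-wise mean of every layer across all models."""
    return {
        layer: [sum(m[layer][i] for m in models) / len(models) for i in range(len(w))]
        for layer, w in models[0].items()
    }


# Two reasonable local optima plus one "malicious" model with extreme values.
models = [
    {"fc": [1.9, -0.5]},
    {"fc": [2.1, -0.7]},
    {"fc": [100.0, 100.0]},
]
# Hand-set for illustration; the abstract describes learning these with SGD.
log_alphas = {"fc": [3.0, 3.0, -5.0]}

merged = soft_merge(models, log_alphas)
baseline = arithmetic_average(models)
print("soft merge:", merged["fc"])  # close to [2.0, -0.6]
print("arithmetic average:", baseline["fc"])  # pulled toward 100
```

The extreme model's gate evaluates to exactly 0, so it drops out of the soft merge entirely. The arithmetic average, by contrast, is dragged toward its values.

In a full training setup, `sample_gate` would replace `deterministic_gate` in the forward pass, and `expected_l0` would be added to the task loss as the sparsity penalty. Gradient descent on `log_alpha` could then drive a gate to exactly zero while the model weights stay frozen. That step needs an autodiff framework and is omitted here.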

Files

Original bundle

Name: 2309.12259.pdf
Size: 543.02 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed to upon submission