Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
| dc.contributor.author | Chen, Hao | |
| dc.contributor.author | Wu, Yusen | |
| dc.contributor.author | Nguyen, Phuong | |
| dc.contributor.author | Liu, Chao | |
| dc.contributor.author | Yesha, Yelena | |
| dc.date.accessioned | 2023-10-13T13:53:10Z | |
| dc.date.available | 2023-10-13T13:53:10Z | |
| dc.date.issued | 2023-09-21 | |
| dc.description.abstract | Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance remains a challenging task. Given the inherent complexity of neural networks, simple arithmetic averaging of the obtained local optima models yields undesirable results. This paper proposes a soft merging method that facilitates rapid merging of multiple models, simplifies the merging of specific parts of neural networks, and enhances robustness against malicious models with extreme values. This is achieved by learning gate parameters through a surrogate of the 𝑙₀ norm using the hard concrete distribution, without modifying the model weights of the given local optima models. This merging process not only enhances model performance by converging to a better local optimum, but also minimizes computational costs, offering an efficient and explicit learning process integrated with stochastic gradient descent. Thorough experiments underscore the effectiveness and superior performance of the merged neural networks. | en_US |
| dc.description.uri | https://arxiv.org/abs/2309.12259 | en_US |
| dc.format.extent | 5 pages | en_US |
| dc.genre | journal articles | en_US |
| dc.genre | preprints | en_US |
| dc.identifier | doi:10.13016/m2gsrq-myma | |
| dc.identifier.uri | https://doi.org/10.48550/arXiv.2309.12259 | |
| dc.identifier.uri | http://hdl.handle.net/11603/30141 | |
| dc.language.iso | en_US | en_US |
| dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
| dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department Collection | |
| dc.relation.ispartof | UMBC Student Collection | |
| dc.rights | This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. | en_US |
| dc.rights | CC BY-NC-SA 4.0 Deed Attribution-NonCommercial-ShareAlike 4.0 International | * |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ | * |
| dc.title | Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance | en_US |
| dc.type | Text | en_US |
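
The abstract describes learning gate parameters through a hard concrete surrogate of the 𝑙₀ norm while leaving the given model weights untouched. The following is a minimal sketch of that general idea, not the paper's implementation. The stretch limits `GAMMA` and `ZETA`, the temperature `BETA`, the helper names, and the per-weight convex combination in `merge` are all assumptions made for illustration, and the gate parameters are supplied by hand rather than learned with SGD.

```python
import math
import random

# Assumed hyperparameters: stretch interval (GAMMA, ZETA) and temperature BETA.
# These are commonly used values for hard concrete gates, not taken from the record.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_gate(log_alpha, rng):
    """Draw one stochastic gate in [0, 1] from the hard concrete distribution."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # keep log() finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    stretched = s * (ZETA - GAMMA) + GAMMA        # stretch to (GAMMA, ZETA)
    return min(1.0, max(0.0, stretched))          # clip back to [0, 1]


def deterministic_gate(log_alpha):
    """Noise-free gate value, typically used after training."""
    return min(1.0, max(0.0, sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA))


def expected_l0(log_alpha):
    """Probability that the gate is non-zero: the differentiable l0 surrogate."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def merge(weights_a, weights_b, log_alphas, gate_fn=deterministic_gate):
    """Hypothetical merge rule: gate g selects model A, (1 - g) selects model B.

    The input weight lists are read only; a new list is returned.
    """
    merged = []
    for wa, wb, la in zip(weights_a, weights_b, log_alphas):
        g = gate_fn(la)
        merged.append(g * wa + (1.0 - g) * wb)
    return merged


if __name__ == "__main__":
    rng = random.Random(0)
    a, b = [1.0, 2.0], [3.0, 4.0]
    # A large positive log_alpha opens the gate (pick A); a large negative one closes it (pick B).
    print(merge(a, b, [10.0, -10.0]))
    print([sample_gate(0.0, rng) for _ in range(3)])
```

In a trainable version, only the `log_alpha` values would receive gradients, with the sum of `expected_l0` terms added to the loss as the sparsity penalty; the model weights themselves would stay frozen.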
