OPTIMIZATION ALGORITHMS FOR TRAINING DEEP NEURAL NETWORKS
Date
2021-01-01
Department
Mathematics and Statistics
Program
Mathematics, Applied
Rights
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. The item may be obtained via Interlibrary Loan through a local library, pending the author/copyright holder's permission.
Abstract
A formal representation of a deep neural network is constructed, and it is demonstrated that networks satisfying the representation can be trained efficiently via feed-forward back propagation. Analysis of the formal representation proves that optimization algorithms cannot have a computational complexity of less than O(|E|) due to the dependence on back propagation. To ground the work in practice, a comparison is made of the popular optimization algorithms in use for training deep neural networks. The commonalities of the current algorithms provide a list of features to use and avoid when developing new deep learning optimization algorithms. Finally, two new optimization algorithms are developed. The first is linearized stochastic gradient descent (LSGD), which is a predictor-corrector method. Testing shows that LSGD achieves comparable or superior quality of fit to SGD, but with quicker and more stable initial training. The second is approximate stabilized Hessian gradient descent (ASHgrad), which is a quasi-Newton method. ASHgrad finds high-quality critical points and trains rapidly, but is slow to compute due to limitations in current machine learning frameworks.
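Illustrative note: the abstract does not spell out the LSGD update rule, so the following is only a minimal sketch of a generic predictor-corrector gradient step (a Heun-style averaged-gradient update) to illustrate the class of method the abstract places LSGD in; the function names, learning rate, and toy linear-regression loss are assumptions for illustration, not the thesis's algorithm.

    # Hypothetical sketch of a generic predictor-corrector gradient step
    # (illustrative only; not the LSGD method defined in the thesis).
    import numpy as np

    def loss_grad(w, X, y):
        # Gradient of mean-squared-error loss for a linear model (toy example).
        residual = X @ w - y
        return 2.0 * X.T @ residual / len(y)

    def predictor_corrector_step(w, X, y, lr=0.05):
        # Predictor: plain gradient step to a trial point.
        # Corrector: re-evaluate the gradient at the trial point and
        # average it with the original gradient before updating.
        g0 = loss_grad(w, X, y)
        w_trial = w - lr * g0
        g1 = loss_grad(w_trial, X, y)
        return w - lr * 0.5 * (g0 + g1)

    # Toy usage on synthetic linear-regression data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    true_w = rng.normal(size=5)
    y = X @ true_w + 0.1 * rng.normal(size=100)

    w = np.zeros(5)
    for _ in range(200):
        w = predictor_corrector_step(w, X, y)
    print("recovered weights close to truth:", np.allclose(w, true_w, atol=0.1))

The corrector's averaged gradient typically damps the oscillation of early iterates, which is the kind of "quicker and more stable initial training" behavior the abstract attributes to LSGD relative to plain SGD.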