Convexification and Deconvexification for Training Artificial Neural Networks

Date

2016-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Abstract

The purpose of this dissertation's research is to overcome a fundamental problem in the theory and application of artificial neural networks (ANNs): the local minimum problem in training ANNs, which has plagued the ANN community since the mid-1980s. ANNs trained with backpropagation have been used extensively for decades to solve tasks across artificial intelligence. Their computing power derives from their distributed structure together with their ability to learn and to generalize. However, the application and further development of ANNs have long been impeded by the local minimum problem, which has consequently attracted much attention. A primary difficulty in solving it lies in the intrinsic non-convexity of ANN training criteria, which usually contain a large number of non-global local minima in their weight spaces. Although an enormous number of methods have been developed to optimize the free parameters of the objective function in pursuit of consistently better optima, these methods cannot fundamentally solve the local minimum problem in the presence of a non-convex error function.

To alleviate this fundamental difficulty, this dissertation proposes a series of methodologies that apply convexification and deconvexification to avoid non-global local minima and to achieve global or near-global minima with satisfactory optimization and generalization performance. These methodologies are built on a normalized risk-averting error (NRAE) criterion. This criterion removes the practical difficulties of computational overflow and ill-initialization that exist in the risk-averting error (RAE) criterion, its predecessor, while retaining the RAE criterion's ability to handle non-global local minima by convexifying the non-convex error space. With a proper convexification and deconvexification strategy, the NRAE criterion is also shown to be advantageous for the high-dimensional non-convex optimization of deep neural networks, which typically suffers from difficulties such as local minima, saddle points, and large flat regions in the non-convex error space.

In this dissertation, the effectiveness of the proposed NRAE-based methods is first evaluated in training multilayer perceptrons (MLPs) for function approximation tasks, demonstrating an optimization advantage in avoiding or alleviating the local minimum problem compared with training under the standard mean squared error criterion. Moreover, NRAE-based training methods applied to convolutional neural networks and deep MLPs for recognizing handwritten digits in the MNIST dataset produce better optimization and generalization results than many benchmark performances achieved by integrating different non-convex training criteria and deep learning approaches. Finally, to further enhance generalization, a statistical pruning method that removes redundant connections is implemented and evaluated on ANNs trained with the NRAE criterion.
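For readers unfamiliar with the NRAE criterion, the following Python sketch illustrates the general idea of convexification and deconvexification summarized above. The specific formula (a log-mean-exponential of per-example squared errors controlled by a convexification parameter lambda) and the lambda schedule are assumptions for illustration, drawn from the risk-averting error literature rather than quoted from the dissertation text.

import numpy as np

def nrae_loss(squared_errors, lam):
    """Normalized risk-averting error (NRAE) of a batch of per-example
    squared errors, assuming the form
        NRAE(w) = (1/lam) * ln( (1/K) * sum_k exp(lam * e_k(w)) ),
    where e_k(w) = ||y_k - f(x_k, w)||^2.
    As lam -> 0 this approaches the mean squared error; large lam
    convexifies the error surface.
    """
    z = lam * np.asarray(squared_errors, dtype=float)
    z_max = z.max()
    # Log-mean-exp with the usual max shift for numerical stability;
    # this normalization avoids the overflow of the raw RAE sum.
    return (z_max + np.log(np.mean(np.exp(z - z_max)))) / lam

# Hypothetical convexification/deconvexification schedule: start with a
# large lam (strongly convexified criterion), then gradually reduce it
# toward 0, where the criterion reverts to the mean squared error.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    squared_errors = rng.random(100) ** 2  # stand-in for ||y_k - f(x_k, w)||^2
    for lam in [1e4, 1e3, 1e2, 1e1, 1.0, 1e-3]:
        print(f"lam={lam:>8.3g}  NRAE={nrae_loss(squared_errors, lam):.6f}")

In an actual training loop, nrae_loss would replace the mean squared error as the objective, and lam would be decreased over the course of training according to a deconvexification schedule such as the one sketched above.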