A Tutorial on Neural Networks and Gradient-free Training





Citation of Original Publication


This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)



This paper presents a compact, matrix-based representation of neural networks in a self-contained tutorial fashion. Although neural networks are well-understood pictorially in terms of interconnected neurons, neural networks are mathematical nonlinear functions constructed by composing several vector-valued functions. Using basic results from linear algebra, we represent a neural network as an alternating sequence of linear maps and scalar nonlinear functions, also known as activation functions. The training of neural networks requires the minimization of a cost function, which in turn requires the computation of a gradient. Using basic multivariable calculus results, the cost gradient is also shown to be a function composed of a sequence of linear maps and nonlinear functions. In addition to the analytical gradient computation, we consider two gradient-free training methods and compare the three training methods in terms of convergence rate and prediction accuracy