A Tutorial on Neural Networks and Gradient-free Training
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
This paper presents a compact, matrix-based representation of neural networks in a self-contained tutorial fashion. Although neural networks are well understood pictorially in terms of interconnected neurons, they are, mathematically, nonlinear functions constructed by composing several vector-valued functions. Using basic results from linear algebra, we represent a neural network as an alternating sequence of linear maps and scalar nonlinear functions, also known as activation functions. Training a neural network requires minimizing a cost function, which in turn requires computing its gradient. Using basic results from multivariable calculus, the cost gradient is also shown to be a function composed of a sequence of linear maps and nonlinear functions. In addition to the analytical gradient computation, we consider two gradient-free training methods and compare the three training methods in terms of convergence rate and prediction accuracy.
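As a concrete illustration of the two ideas in the abstract, the minimal sketch below builds a small network as an alternating sequence of linear maps and a scalar activation, and trains it with a simple random-search loop. The network size, the tanh activation, the mean-squared-error cost, and the random-search scheme are illustrative assumptions only; they are not the specific gradient-free methods compared in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(params, x):
    # Alternating sequence of linear maps and a scalar activation:
    # y = W2 * tanh(W1 * x + b1) + b2 (tanh chosen here as an example).
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ x + b1) + b2

def cost(params, X, Y):
    # Mean squared error over the training set (columns of X and Y).
    return np.mean((forward(params, X) - Y) ** 2)

# Toy regression data: fit y = sin(x) on [-pi, pi] (hypothetical example).
X = np.linspace(-np.pi, np.pi, 64).reshape(1, -1)
Y = np.sin(X)

# Small network with 8 hidden units, randomly initialized.
params = [rng.normal(scale=0.5, size=s) for s in [(8, 1), (8, 1), (1, 8), (1, 1)]]

# Generic gradient-free training loop (random search): perturb the weights
# and keep the perturbation only if it lowers the cost.
best = cost(params, X, Y)
for step in range(5000):
    trial = [p + 0.05 * rng.normal(size=p.shape) for p in params]
    c = cost(trial, X, Y)
    if c < best:
        params, best = trial, c

print(f"final cost: {best:.4f}")
```

Because the loop only evaluates the cost function, it never needs the analytical gradient; a gradient-based trainer would instead update the weights along the cost gradient computed by backpropagation.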
