A Tutorial on Neural Networks and Gradient-free Training
Date
2022-11-26
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless the item is under a Creative Commons license, contact the copyright holder or the author for uses protected by Copyright Law.
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Abstract
This paper presents a compact, matrix-based representation of neural networks in a self-contained tutorial fashion. Although neural networks are well understood pictorially in terms of interconnected neurons, mathematically they are nonlinear functions constructed by composing several vector-valued functions. Using basic results from linear algebra, we represent a neural network as an alternating sequence of linear maps and scalar nonlinear functions, also known as activation functions. Training a neural network requires minimizing a cost function, which in turn requires computing its gradient. Using basic results from multivariable calculus, the cost gradient is also shown to be a function composed of a sequence of linear maps and nonlinear functions. In addition to the analytical gradient computation, we consider two gradient-free training methods and compare the three training methods in terms of convergence rate and prediction accuracy.
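The full derivations are in the paper itself; as a rough, non-authoritative sketch of the two ideas the abstract mentions, the Python/NumPy snippet below builds a forward pass as an alternating sequence of linear maps and tanh activations and trains it with a simple random-search update, one generic gradient-free method. The layer sizes, activation, toy data, and random-search rule here are illustrative assumptions, not necessarily the choices made in the paper.

import numpy as np

rng = np.random.default_rng(0)

# Layer widths chosen arbitrarily for illustration: 2 inputs -> 8 hidden -> 1 output.
sizes = [2, 8, 1]
weights = [rng.normal(scale=0.5, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((m, 1)) for m in sizes[1:]]

def forward(x, weights, biases):
    """Forward pass: alternate a linear map (W a + b) with a scalar activation (tanh here)."""
    a = x
    for W, b in zip(weights, biases):
        a = np.tanh(W @ a + b)
    return a

def cost(weights, biases, X, Y):
    """Mean squared error over a batch of column vectors."""
    preds = forward(X, weights, biases)
    return float(np.mean((preds - Y) ** 2))

# Toy data (illustrative only): learn y = sin(x1 + x2) on random inputs.
X = rng.uniform(-1, 1, size=(2, 64))
Y = np.sin(X.sum(axis=0, keepdims=True))

# A minimal gradient-free update: random search ("hill climbing") on the parameters.
# Perturb all weights and biases; keep the perturbation only if the cost decreases.
best = cost(weights, biases, X, Y)
step = 0.05
for it in range(2000):
    trial_w = [W + step * rng.normal(size=W.shape) for W in weights]
    trial_b = [b + step * rng.normal(size=b.shape) for b in biases]
    c = cost(trial_w, trial_b, X, Y)
    if c < best:
        weights, biases, best = trial_w, trial_b, c

print(f"final training cost: {best:.4f}")

Swapping the random-search loop for a step along the analytically computed cost gradient would recover the gradient-based training that the abstract compares the gradient-free methods against.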