Convergence of the mini-batch SIHT algorithm
Author/Creator ORCID
Date
2022-09-29
Type of Work
Text, 8 pages
journal articles
preprints
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Attribution 4.0 International (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/
Abstract
The Iterative Hard Thresholding (IHT) algorithm has been studied extensively as an effective deterministic algorithm for solving sparse optimization problems. The IHT algorithm benefits from the batch (full) gradient at each iterate, and this information is crucial to the convergence analysis of the generated sequence. However, this strength becomes a weakness in machine learning and high-dimensional statistical applications, where calculating the batch gradient at each iteration is computationally expensive or impractical.
Fortunately, in these applications the objective function has a summation structure that can be exploited to approximate the batch gradient by a stochastic mini-batch gradient. In this paper, we study the mini-batch Stochastic IHT (SIHT) algorithm for solving sparse optimization problems. As opposed to previous works, where an increasing and variable mini-batch size is necessary for the derivation, we fix the mini-batch size for all iterations according to a lower bound that we derive.
To prove stochastic convergence of the objective function values, we first establish a critical sparse stochastic gradient descent property. Using this property, we show that the sequence generated by the mini-batch SIHT algorithm is a supermartingale and converges with probability one. Unlike previous work, we do not assume the function to be restricted strongly convex. To the best of our knowledge, in the regime of sparse optimization, this is the first time in the literature that the sequence of stochastic function values has been shown to converge with probability one while the mini-batch size is kept fixed for all steps.
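To make the setting concrete, the following is a minimal sketch of a mini-batch SIHT iteration for a sparse least-squares objective: a gradient step computed on a fixed-size random mini-batch, followed by hard thresholding onto the set of s-sparse vectors. The least-squares objective, step size, and synthetic data are illustrative assumptions for this sketch, not the paper's analysis or experimental setup.

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x; zero out the rest."""
    z = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    z[keep] = x[keep]
    return z

def minibatch_siht(A, b, s, batch_size, step, n_iters, seed=0):
    """Mini-batch SIHT sketch for sparse least squares:
    min (1/n) * ||A x - b||^2  subject to  ||x||_0 <= s.
    The mini-batch size is held fixed for all iterations."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_iters):
        batch = rng.choice(n, size=batch_size, replace=False)
        # Stochastic mini-batch gradient approximating the batch gradient
        grad = (2.0 / batch_size) * A[batch].T @ (A[batch] @ x - b[batch])
        # Gradient step followed by projection onto the sparse constraint
        x = hard_threshold(x - step * grad, s)
    return x

# Demo on synthetic noiseless data with a 3-sparse ground truth
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_true = np.zeros(50)
x_true[[3, 17, 42]] = [3.0, -2.0, 1.5]
b = A @ x_true
x_hat = minibatch_siht(A, b, s=3, batch_size=64, step=0.05, n_iters=500)
```

In this noiseless interpolation regime every mini-batch gradient vanishes at the true solution, so the iterates can converge to it exactly even though the batch size never grows.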