MPT: Multiple Parallel Tempering for High-Throughput MCMC Samplers

Author/Creator ORCID

Date

Department

Program

Citation of Original Publication

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

This paper proposes “Multiple Parallel Tempering” (MPT) as a class of Markov Chain Monte Carlo (MCMC) algorithm for high-throughput hardware implementations. MCMC algorithms are used to generate samples from target probability densities and are commonly employed in stochastic processing techniques such as Bayesian inference, and maximum likelihood estimation, in which computing large amount of data in real-time with high-throughput samplers is critical. For high-dimensional and multi-modal probability densities, Parallel Tempering (PT) MCMC has proven to have superior mixing and higher convergence to the target distribution as compared to other popular MCMC algorithms such as Metropolis-Hastings (MH). MPT algorithm, proposed in this paper, imposes a new integer parameter, D, to the original algorithm of PT. Such modification changes one MCMC sampler into multiple independent kernels that alternatively generate their set of samples one after another. Our experimental results on Gaussian mixture models show that for large values of D, the auto-correlation function of the proposed MPT falls comparably to that of a PT sampler. A fully configurable and pipelined hardware accelerator for the proposed MPT, as well as PT are designed in Verilog HDL and implemented on FPGA. The two algorithms are also written in C language and evaluated on Multi-core CPU from the TX2 SoC. Our implementation results indicate that by selecting an appropriate value for D in our case study the sampling throughput of the MPT can raise from 4.5 Msps in PT to 135 Msps on average, an amount near maximum achievable frequency of the target FPGA, which is about 1470x higher than when implementing on fully exploited Multi-core CPU.