Long-time Simulations with Complex Code Using Multiple Nodes of Intel Xeon Phi Knights Landing

Date

2018-08-01

Citation of Original Publication

Jonathan S. Graf, Matthias K. Gobbert, Samuel Khuvis, Long-time simulations with complex code using multiple nodes of Intel Xeon Phi Knights Landing, Journal of Computational and Applied Mathematics, Volume 337, pp. 18-36, 2018, https://doi.org/10.1016/j.cam.2017.12.050

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please contact the author.
Attribution-NonCommercial-NoDerivs 3.0 United States

Abstract

Modern partial differential equation (PDE) models across scientific disciplines require sophisticated numerical methods, resulting in complex codes, as well as large numbers of simulations for analyses such as parameter studies and uncertainty quantification. Evaluating the behavior of the model over sufficiently long times, for instance to compare with laboratory time scales, often requires long-time simulations with small time steps and high mesh resolutions. This motivates the need for very efficient numerical methods and for parallel computing on the most recent architectures. We use complex code resulting from a PDE model of calcium dynamics in a heart cell to analyze the performance of the recently released Intel Xeon Phi Knights Landing (KNL). The KNL is a second-generation many-integrated-core (MIC) processor released in 2016 with a theoretical peak performance of over 3 TFLOP/s in double precision; because each KNL core is x86-compatible, complex codes can be ported to it easily. We demonstrate the benefit of hybrid MPI+OpenMP code, when implemented effectively and run efficiently, on the KNL, including on multiple KNL nodes. For multi-KNL runs of our sample code, we show that it is optimal to use all cores of each KNL, one MPI process on every other tile, and only two of the maximum of four threads per core.
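The arithmetic behind the abstract's two numerical claims, the peak of over 3 TFLOP/s and the recommended hybrid layout, can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a 68-core KNL such as the Xeon Phi 7250 (1.4 GHz, two cores per tile, two AVX-512 vector units per core); the abstract does not name the exact KNL model. The function names `peak_gflops` and `hybrid_layout` are hypothetical.

```python
# Hypothetical hardware parameters, ASSUMED to match a 68-core
# Intel Xeon Phi 7250; the abstract does not state the KNL model.
CORES_PER_NODE = 68
CORES_PER_TILE = 2          # a KNL tile holds two cores
MAX_THREADS_PER_CORE = 4    # four hardware threads per core
CLOCK_GHZ = 1.4
VPUS_PER_CORE = 2           # two AVX-512 vector units per core
DOUBLES_PER_VECTOR = 8      # 512 bits / 64 bits per double
FLOPS_PER_FMA = 2           # a fused multiply-add counts as two operations


def peak_gflops(cores=CORES_PER_NODE, clock_ghz=CLOCK_GHZ):
    """Theoretical double-precision peak of one KNL node in GFLOP/s."""
    return cores * clock_ghz * VPUS_PER_CORE * DOUBLES_PER_VECTOR * FLOPS_PER_FMA


def hybrid_layout(nodes, tiles_per_process=2, threads_per_core=2):
    """Hybrid MPI+OpenMP layout using all cores of each node.

    The defaults encode the abstract's finding: one MPI process on every
    other tile (i.e. one process per two tiles) and two of the four
    possible threads per core.
    """
    if threads_per_core > MAX_THREADS_PER_CORE:
        raise ValueError("KNL supports at most 4 threads per core")
    tiles = CORES_PER_NODE // CORES_PER_TILE               # 34 tiles
    if tiles % tiles_per_process != 0:
        raise ValueError("tiles must divide evenly among processes")
    procs_per_node = tiles // tiles_per_process            # 17 processes
    cores_per_proc = tiles_per_process * CORES_PER_TILE    # 4 cores each
    threads_per_proc = cores_per_proc * threads_per_core   # 8 OpenMP threads
    return {
        "mpi_processes": nodes * procs_per_node,
        "processes_per_node": procs_per_node,
        "omp_threads_per_process": threads_per_proc,
        "threads_per_node": procs_per_node * threads_per_proc,
    }


if __name__ == "__main__":
    print(f"peak per node: {peak_gflops():.1f} GFLOP/s")
    print(hybrid_layout(nodes=4))
```

Under these assumptions, the peak works out to 68 × 1.4 × 2 × 8 × 2 = 3046.4 GFLOP/s. The recommended layout gives 17 MPI processes per node with 8 OpenMP threads each, that is, 136 threads per node, or half of the 272 hardware threads. Changing `tiles_per_process` or `threads_per_core` lets one enumerate the alternative configurations that such a study compares.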