Porting and Tuning Numerical Kernels in Real-World Applications to Many-Core Intel Xeon Phi Accelerators

Khuvis, Samuel

dc.contributor.advisor	Gobbert, Matthias K
dc.contributor.author	Khuvis, Samuel
dc.contributor.department	Mathematics and Statistics
dc.contributor.program	Mathematics, Applied
dc.date.accessioned	2019-10-11T14:01:51Z
dc.date.available	2019-10-11T14:01:51Z
dc.date.issued	2016-01-01
dc.description.abstract	Modern architectures with multiple memory hierarchies in multi-core CPUs and coprocessors such as the massively parallel GPGPU and many-core Intel Xeon Phi offer opportunities to drastically speed up numerical kernels. Coprocessors, which supplement the work of the CPUs, generally have significantly more cores and threads than a multi-core CPU and use power more efficiently. The Intel Xeon Phi is relatively new hardware, released to the public only in 2013. Each Intel Phi has between 57 and 61 cores, each capable of running up to four threads. Each core is x86 compatible and is capable of running its own instruction stream, allowing programmers to use familiar CPU frameworks such as MPI and OpenMP. Three modes of execution are available on the Intel Phi: (i) offloading, where the program is run on the CPU and segments of the code are moved to the Intel Phi, similar to GPGPU programming; (ii) native, where the program is run directly on the Intel Phi; and (iii) symmetric, where the program is run on the CPU and Phi jointly. We report the performance of three test problems whose structure is representative of kernels of real-world application codes. The first problem is the classical elliptic test problem of a Poisson equation with homogeneous Dirichlet boundary conditions in two and three dimensions. The second problem is a model of calcium-induced calcium release in a heart cell. In this model, calcium activates calcium release from the sarcoplasmic reticulum in the cytosol, an essential part of the excitation-contraction coupling in the cardiac muscle. This process is modeled by a system of coupled, nonlinear, time-dependent advection-diffusion-reaction equations solved by a method of lines approach. The third problem is a model of pancreatic beta cells in a computational islet. Results are presented for a model without coupling between cells and a model with electrical coupling between cells.
Code can easily be ported to the Intel Phi with the use of a compiler flag; however, real-world applications may require significant modifications to existing CPU code in order to perform well on the Intel Phi. Code with a high degree of parallelism is required to take advantage of the many cores of the Phi. Offload mode performs poorly for real-world problems due to the cost of communication between the CPU and Phi and the restriction to using only OpenMP on the Phi. For good performance in native mode, a combination of MPI and OpenMP is required to take advantage of the complex memory hierarchy of the Intel Phi. Manual loop unrolling to fully utilize the vector registers may significantly improve performance on the Intel Phi. Symmetric mode requires MPI for communication between Phis and multi-core CPUs as well as OpenMP for good performance on the Phis. Using all available resources of the hybrid node in symmetric mode results in the best performance. Studies on multiple hybrid nodes connected by a high-performance interconnect exhibit excellent strong and weak scalability.
dc.genre	dissertations
dc.identifier	doi:10.13016/m2p4ho-ygls
dc.identifier.other	11417
dc.identifier.uri	http://hdl.handle.net/11603/15677
dc.language	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Mathematics and Statistics Department Collection
dc.relation.ispartof	UMBC Theses and Dissertations Collection
dc.relation.ispartof	UMBC Graduate School Collection
dc.relation.ispartof	UMBC Student Collection
dc.rights	This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
dc.source	Original File Name: Khuvis_umbc_0434D_11417.pdf
dc.subject	Intel Xeon Phi
dc.subject	numerical methods
dc.subject	partial differential equations
dc.title	Porting and Tuning Numerical Kernels in Real-World Applications to Many-Core Intel Xeon Phi Accelerators
dc.type	Text
dcterms.accessRights	Distribution Rights granted to UMBC by the author.

Files

Original bundle

Name: Khuvis_umbc_0434D_11417.pdf
Size: 4.53 MB
Format: Adobe Portable Document Format

License bundle

Name: Khuvis_Porting_Open.pdf
Size: 42.58 KB
Format: Adobe Portable Document Format

Collections

UMBC Theses and Dissertations