A Comparative Study of the Parallel Performance of the Blocking and Non-Blocking MPI Communication Commands on an Elliptic Test Problem on the Cluster tara

Date

2016

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

In this report we study the parallel solution of the elliptic test problem of the Poisson equation with homogeneous Dirichlet boundary conditions on a two-dimensional domain. We use the finite difference method to approximate the governing equation by a system of N^2 linear equations, with N the number of interior grid points in each spatial direction. To parallelize the computation, we distribute blocks of the rows of the interior mesh points among the parallel processes. We then use the iterative conjugate gradient method with a matrix-free implementation of the matrix-vector product to solve the system of linear equations, with each process operating on its local block of unknowns. The conjugate gradient method starts from the zero vector as initial guess and updates the iterates until the Euclidean norm of the global residual, relative to the Euclidean norm of the initial residual, falls below a predefined tolerance. Since the matrix-vector product in each iteration requires communication between neighboring processes, i.e., the processes that own the mesh rows adjacent to a block interface, two modes of MPI communication, namely blocking and non-blocking send and receive, are employed for this data exchange. The results show excellent performance on the cluster tara with up to 512 parallel processes on 64 compute nodes, especially when non-blocking MPI commands are used.

The cluster tara is an IBM System x iDataPlex purchased in 2009 by the UMBC High Performance Computing Facility (www.umbc.edu/hpcf). It is an 86-node distributed-memory cluster comprising 82 compute nodes, 2 development nodes, 1 user node, and 1 management node. Each node features two quad-core Intel Nehalem X5550 processors (2.66 GHz, 8 MB cache), 24 GB of memory, and a 120 GB local hard drive. All nodes and the 160 TB central storage are connected by a quad-data-rate (QDR) InfiniBand interconnect network.
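To illustrate the matrix-free implementation mentioned in the abstract, the following is a minimal serial C sketch: the coefficient matrix of the 5-point finite difference discretization is never assembled, and the matrix-vector product needed by the conjugate gradient method is computed directly from the stencil. The function and variable names and the grid size are illustrative assumptions and are not taken from the report's code.

    /* Minimal sketch of the matrix-free matrix-vector product w = A*v for the
       5-point finite difference Laplacian on an N-by-N grid of interior points
       with zero Dirichlet boundary values; A (scaled by h^2) is never stored. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 8   /* interior grid points per direction (small value for the demo) */

    static void apply_stencil(const double *v, double *w)
    {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                double s = 4.0 * v[i*N + j];
                if (i > 0)     s -= v[(i-1)*N + j];   /* neighbor above */
                if (i < N - 1) s -= v[(i+1)*N + j];   /* neighbor below */
                if (j > 0)     s -= v[i*N + j - 1];   /* neighbor to the left */
                if (j < N - 1) s -= v[i*N + j + 1];   /* neighbor to the right */
                w[i*N + j] = s;
            }
        }
    }

    int main(void)
    {
        double *v = malloc(N * N * sizeof *v);
        double *w = malloc(N * N * sizeof *w);
        for (int k = 0; k < N * N; k++) v[k] = 1.0;   /* constant test vector */
        apply_stencil(v, w);
        printf("w at the center of the grid = %g\n", w[(N/2)*N + N/2]);  /* 0 away from the boundary */
        free(v);
        free(w);
        return 0;
    }

In the parallel code described in the abstract, each process applies this stencil only to its own block of rows, which is why the mesh rows on either side of a block interface must be exchanged with the neighboring processes.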
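The blocking versus non-blocking comparison in the title concerns exactly that exchange of interface (ghost) rows. The sketch below is a hypothetical C/MPI example, not taken from the report: the blocking variant is written here with MPI_Sendrecv to avoid deadlock (the report's blocking variant may pair MPI_Send and MPI_Recv directly), while the non-blocking variant posts MPI_Irecv/MPI_Isend and completes them with MPI_Waitall, which allows local computation to overlap with the communication.

    /* Sketch of the ghost-row exchange in a row-block distribution of an
       N-by-N grid; assumes the number of processes divides N.
       Compile with mpicc and run, e.g., with mpirun -np 4 ./a.out */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 16   /* interior grid points per direction (demo value) */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int l_rows = N / size;                                  /* rows owned by this process */
        double *local     = calloc((size_t)l_rows * N, sizeof *local);
        double *ghost_top = calloc(N, sizeof *ghost_top);       /* last row of rank-1 */
        double *ghost_bot = calloc(N, sizeof *ghost_bot);       /* first row of rank+1 */

        int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Blocking exchange: paired send/receive in one call avoids deadlock. */
        MPI_Sendrecv(&local[0],              N, MPI_DOUBLE, up,   0,
                     ghost_bot,              N, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&local[(l_rows-1)*N],   N, MPI_DOUBLE, down, 1,
                     ghost_top,              N, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Non-blocking exchange: post all transfers first, then wait. */
        MPI_Request req[4];
        MPI_Irecv(ghost_top,              N, MPI_DOUBLE, up,   2, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(ghost_bot,              N, MPI_DOUBLE, down, 3, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&local[0],              N, MPI_DOUBLE, up,   3, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&local[(l_rows-1)*N],   N, MPI_DOUBLE, down, 2, MPI_COMM_WORLD, &req[3]);
        /* work on rows away from the block interfaces could be done here */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

        if (rank == 0) printf("ghost-row exchange completed on %d processes\n", size);

        free(local);
        free(ghost_top);
        free(ghost_bot);
        MPI_Finalize();
        return 0;
    }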