Parallel Performance Studies for a Parabolic Test Problem on the Cluster tara

Date

2010

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

The performance of parallel computer code depends on the intricate interplay of the processors, the architecture of the compute nodes, their interconnect network, the numerical algorithm, and its implementation. The solution of large, sparse, highly structured systems of linear equations by an iterative linear solver that requires communication between the parallel processes at every iteration is an instructive test of this interplay. This note considers a parabolic test problem given by a time-dependent, scalar, linear reaction-diffusion equation in three dimensions, whose time-stepping requires the solution of such a system of linear equations at every time step. The results presented here show excellent performance on the cluster tara with up to 512 parallel processes when using 64 compute nodes. The results support the scheduling policy implemented, since they confirm that it is beneficial to use all eight cores of the two quad-core processors on each node simultaneously, giving us, in effect, a computer that can run jobs efficiently with up to 656 parallel processes when using all 82 compute nodes. The cluster tara is an IBM server x iDataPlex purchased in 2009 by the UMBC High Performance Computing Facility (www.umbc.edu/hpcf). It is an 86-node distributed-memory cluster comprising 82 compute nodes, 2 develop nodes, 1 user node, and 1 management node. Each node features two quad-core Intel Nehalem X5550 processors (2.66 GHz, 8 MB cache), 24 GB memory, and a 120 GB local hard drive. All nodes and the 160 TB central storage are connected by an InfiniBand (QDR) interconnect network.
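
The process counts quoted in the abstract follow directly from the node layout. The following minimal Python sketch reproduces that arithmetic; the names `NODE_COUNTS`, `cores_per_node`, and `max_processes` are illustrative and not taken from the original report, and the sketch assumes one parallel process per core.

```python
# Hardware figures as stated in the abstract.
PROCESSORS_PER_NODE = 2   # two Intel Nehalem X5550 processors per node
CORES_PER_PROCESSOR = 4   # each processor is quad-core

# Node roles and counts as stated in the abstract.
NODE_COUNTS = {"compute": 82, "develop": 2, "user": 1, "management": 1}


def cores_per_node():
    """Number of cores available on a single node."""
    return PROCESSORS_PER_NODE * CORES_PER_PROCESSOR


def max_processes(compute_nodes):
    """Parallel processes available when every core of each node runs one process.

    Assumption (illustrative): exactly one process per core.
    """
    return compute_nodes * cores_per_node()


total_nodes = sum(NODE_COUNTS.values())

if __name__ == "__main__":
    print("cores per node:", cores_per_node())
    print("total nodes:", total_nodes)
    print("processes on 64 compute nodes:", max_processes(64))
    print("processes on all compute nodes:", max_processes(NODE_COUNTS["compute"]))
```

With eight processes per node, 64 compute nodes give the 512 processes used in the reported studies, and all 82 compute nodes give the 656 processes mentioned as the effective maximum.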