Browsing by Subject "Parallel performance study"
Now showing 1 - 3 of 3
Item
Numerical Methods for Parallel Simulation of Diffusive Pollutant Transport from a Point Source
Sienkiewicz, Noah; Pandya, Arjun; Brown, Tim; Barajas, Carlos; Gobbert, Matthias K.
In an interdisciplinary project combining Atmospheric Physics, High Performance Computing, and Big Data, we explore a numerical method for solving a physical system modeled by a partial differential equation. The application problem models the spread of pollution by a reaction-diffusion equation solved by the finite volume method. The numerical method is derived and tested on a known test problem in Matlab and then parallelized with MPI in C. We explore both closed and open systems of pollution and show that the finite volume method is mass conservative and able to handle a point source modeled by the Dirac delta distribution. A parallel performance study confirms the scalability of the implementation to several compute nodes.

Item
Parallel Performance Studies for a Three-Species Application Problem on the Cluster tara (2010)
Trott, David W.; Gobbert, Matthias K.
High performance parallel computing depends on the interaction of a number of factors, including the processors, the architecture of the compute nodes, their interconnect network, and the numerical code. In this note, we present performance and scalability studies on the cluster tara using a well-established parallelized code for a three-species application problem. This application problem requires long-time simulations on a fine mesh and thus poses a very computationally intensive problem. The speedup of run times afforded by parallel computing makes the difference between unacceptably long runs to obtain the results (e.g., several days or weeks) and practically feasible studies (e.g., overnight runs). The results also support the scheduling policy implemented, since they confirm that it is beneficial to use all eight cores of the two quad-core processors on each node simultaneously, giving us in effect a computer that can run jobs efficiently with up to 656 parallel processes when using all 82 compute nodes. The cluster tara is an IBM System x iDataPlex purchased in 2009 by the UMBC High Performance Computing Facility (www.umbc.edu/hpcf). It is an 86-node distributed-memory cluster comprising 82 compute nodes, 2 development nodes, 1 user node, and 1 management node. Each node features two quad-core Intel Nehalem X5550 processors (2.66 GHz, 8 MB cache), 24 GB of memory, and a 120 GB local hard drive. All nodes and the 160 TB central storage are connected by a quad-data-rate (QDR) InfiniBand interconnect network.

Item
Parallel Performance Studies for an Elliptic Test Problem on the Cluster tara (2010)
Raim, Andrew M.; Gobbert, Matthias K.
The performance of parallel computer code depends on an intricate interplay of the processors, the architecture of the compute nodes, their interconnect network, the numerical algorithm, and its implementation. The solution of large, sparse, highly structured systems of linear equations by an iterative linear solver that requires communication between the parallel processes at every iteration is an instructive test of this interplay. This note considers the classical elliptic test problem of a Poisson equation with Dirichlet boundary conditions in two spatial dimensions, whose approximation by the finite difference method results in a linear system of this type. Our existing implementation of the conjugate gradient method for the iterative solution of this system is known to have the potential to perform well up to large numbers of parallel processes, provided the interconnect network has low latency. Since the algorithm is memory-bound, it is also vital for good performance that the architecture of the nodes, in conjunction with the scheduling policy, does not create a bottleneck. The results presented here show excellent performance on the cluster tara with up to 512 parallel processes when using 64 compute nodes. The results support the scheduling policy implemented, since they confirm that it is beneficial to use all eight cores of the two quad-core processors on each node simultaneously, giving us in effect a computer that can run jobs efficiently with up to 656 parallel processes when using all 82 compute nodes. The cluster tara is an IBM System x iDataPlex purchased in 2009 by the UMBC High Performance Computing Facility (www.umbc.edu/hpcf). It is an 86-node distributed-memory cluster comprising 82 compute nodes, 2 development nodes, 1 user node, and 1 management node. Each node features two quad-core Intel Nehalem X5550 processors (2.66 GHz, 8 MB cache), 24 GB of memory, and a 120 GB local hard drive. All nodes and the 160 TB central storage are connected by a quad-data-rate (QDR) InfiniBand interconnect network.