Graph 500 Performance on a Distributed-Memory Cluster

Author/Creator ORCID

Date

2012

Department

Program

Citation of Original Publication

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

Social network and medical informatics analysis are examples of modern computing problems that involve large data sets. These data can be represented using graphs, which are sets of vertices that are connected by edges. While traditional performance benchmarks for high-performance computers measure the speed of arithmetic operations, memory access time is a more useful performance standard for large graph problems. The Graph 500 benchmark is intended to rank high-performance computers based on speed of memory retrieval. The implementation of this benchmark considers a large, randomly generated graph. It then executes a breadth-first search of the graph starting at one vertex and visiting all other vertices that are connected to the source. The search is executed 64 times using a different vertex as the root each time. Each search records billions of traversed edges per second (GTEPS), and the harmonic mean of these measurements establishes the ranking on the Graph 500 list. We implement the Graph 500 benchmark on the distributed-memory cluster tara in the UMBC High Performance Computing Facility (www.umbc.edu/hpcf). The cluster tara has 82 compute nodes, each with two quad-core Intel Nehalem X5550 CPUs and 24 GB of memory, connected by a quad-data rate InfiniBand interconnect. Our best run to date using 64 nodes achieved a GTEPS rate that would put tara at rank 58 on the June 2012 Graph 500 list. We intend to submit an official benchmark run for the next publication of the Graph 500 list in November 2012.