The Graph 500 Benchmark on a Medium-Size Distributed-Memory Cluster with High-Performance Interconnect

Date

2012-12-17

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

While traditional performance benchmarks for high-performance computers measure the speed of arithmetic operations, memory access time is a more useful performance gauge for many large problems today. The Graph 500 benchmark has been developed to measure a computer’s performance in memory retrieval. The benchmark operates on large, randomly generated graphs, which may be spread across many nodes of a distributed-memory cluster. It conducts breadth-first searches on these graphs and measures performance in billions of traversed edges per second (GTEPS). We present our experience implementing and running the Graph 500 benchmark on the medium-size distributed-memory cluster tara in the UMBC High Performance Computing Facility (www.umbc.edu/hpcf). The cluster tara has 82 compute nodes, each with two quad-core Intel Nehalem X5550 CPUs and 24 GB of memory, connected by a high-performance quad data rate (QDR) InfiniBand interconnect. We explain the results in detail in terms of the machine architecture; this analysis demonstrates that the Graph 500 benchmark indeed measures memory access, the chief bottleneck for many applications. Our best run to date, at scale 31 on 64 nodes, achieved a GTEPS rate that placed tara at rank 98 on the November 2012 Graph 500 list.
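The measurement described in the abstract can be illustrated with a minimal single-process sketch. It is not the implementation run on tara: it assumes a uniform random edge list in place of the benchmark's own graph generator, runs serially in memory with no distribution across nodes, and uses a simplified edge count. The function names (random_edges, build_adjacency, bfs_parents, traversed_edges) and the default edge factor of 16 are assumptions made for illustration.

```python
import random
import time
from collections import deque


def random_edges(scale, edgefactor=16, seed=0):
    """Return (n, edges) with n = 2**scale vertices and edgefactor * n edges.

    Assumption: endpoints are drawn uniformly at random; this only
    stands in for the benchmark's own graph generator.
    """
    rng = random.Random(seed)
    n = 1 << scale
    m = edgefactor * n
    return n, [(rng.randrange(n), rng.randrange(n)) for _ in range(m)]


def build_adjacency(n, edges):
    """Build undirected adjacency lists, skipping self-loops."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].append(v)
            adj[v].append(u)
    return adj


def bfs_parents(adj, root):
    """Breadth-first search from root; parent[v] == -1 marks unreached vertices."""
    parent = [-1] * len(adj)
    parent[root] = root
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if parent[v] == -1:
                parent[v] = u
                queue.append(v)
    return parent


def traversed_edges(edges, parent):
    """Count input edges that lie in the component reached by the search.

    Simplification: in an undirected graph, an edge belongs to the
    reached component exactly when its first endpoint was reached.
    """
    return sum(1 for u, _ in edges if parent[u] != -1)


if __name__ == "__main__":
    n, edges = random_edges(scale=12)
    adj = build_adjacency(n, edges)

    start = time.perf_counter()
    parent = bfs_parents(adj, root=0)
    elapsed = time.perf_counter() - start

    m = traversed_edges(edges, parent)
    print(f"{m} edges in {elapsed:.4f} s: {m / elapsed / 1e9:.6f} GTEPS")
```

Only the search itself is timed; graph construction is excluded. The inner loop reads parent[v] for neighbors scattered throughout the vertex range, so nearly every step is an irregular memory access with almost no arithmetic, which is why the traversal rate reflects memory access speed rather than floating-point speed.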