Throughput studies on an InfiniBand interconnect via all-to-all communications
Links to Files: https://dl.acm.org/citation.cfm?id=2872611
Type of Work: 7 pages, conference paper pre-print
Citation of Original Publication: Nil Mistry, Jordan Ramsey, Benjamin Wiley, Jackie Yanchuck, Xuan Huang, Matthias K. Gobbert, Throughput studies on an InfiniBand interconnect via all-to-all communications, HPC '15: Proceedings of the Symposium on High Performance Computing, pages 93-99.
Rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please contact the author.
High Performance Computing Facility (HPCF)
Distributed-memory clusters are the most important type of parallel computer today, and they dominate the TOP500 list. InfiniBand is the most popular interconnect for distributed-memory compute clusters. Contention among communications across the switched network that connects the compute nodes of such a cluster can seriously degrade the performance of parallel code. This contention is greatest when large blocks of data are communicated among all parallel processes simultaneously, a communication pattern that arises in many important algorithms such as parallel sorting. The cluster tara in the UMBC High Performance Computing Facility (HPCF), which has a quad-data rate (QDR) InfiniBand interconnect, provides an opportunity to test whether the capacity of a switched network can become a limiting factor in algorithmic performance. We find that we can design a test case of a problem with increasing memory usage that no longer scales on the InfiniBand interconnect, so that the interconnect becomes a limiting factor for parallel scalability. However, when the memory usage of the problem stays fixed, the InfiniBand communications get faster and do not inhibit parallel scalability. The tests in this paper use only basic MPI commands to allow wide reproducibility, and the paper explains in detail the motivation behind the memory usage designed for the tests.
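To make the all-to-all pattern and the two memory regimes concrete, here is a minimal sketch in plain Python (no MPI required). It is not the paper's test code. It assumes double-precision (8-byte) data and models the two cases with hypothetical helpers: `weak_volume` keeps the block each process sends to each destination fixed, so memory grows with the number of processes p, and `strong_volume` keeps the total problem size fixed, so it is split into p*p shrinking blocks. The helper `simulate_alltoall` only mimics the data movement semantics of `MPI_Alltoall`, where process i's j-th block ends up as process j's i-th block.

```python
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8  # assumption: 8-byte double-precision values


@dataclass(frozen=True)
class AlltoallVolume:
    block_bytes: int        # bytes one process sends to one destination
    per_process_bytes: int  # size of one process's send buffer
    exchanged_bytes: int    # bytes moved between distinct processes (self-blocks excluded)


def _volume(p: int, block_bytes: int) -> AlltoallVolume:
    # Each of the p processes sends one block to each of the other p - 1 processes.
    # Note: pairs on the same node do not cross the InfiniBand network either.
    return AlltoallVolume(block_bytes, p * block_bytes, p * (p - 1) * block_bytes)


def weak_volume(p: int, block_doubles: int) -> AlltoallVolume:
    """Hypothetical 'increasing memory' case: block size per pair is fixed."""
    return _volume(p, block_doubles * BYTES_PER_DOUBLE)


def strong_volume(p: int, total_doubles: int) -> AlltoallVolume:
    """Hypothetical 'stable memory' case: total size fixed, split into p*p blocks."""
    if total_doubles % (p * p):
        raise ValueError("total_doubles must be divisible by p*p")
    return _volume(p, (total_doubles // (p * p)) * BYTES_PER_DOUBLE)


def simulate_alltoall(send):
    """send[i][j] is the block process i sends to process j.

    Returns recv, where recv[j][i] is the block process j received from process i,
    matching the data layout produced by MPI_Alltoall.
    """
    p = len(send)
    return [[send[i][j] for i in range(p)] for j in range(p)]


if __name__ == "__main__":
    for p in (2, 4, 8):
        w = weak_volume(p, block_doubles=1024)
        s = strong_volume(p, total_doubles=2**20)
        print(p, w.per_process_bytes, w.exchanged_bytes,
              s.per_process_bytes, s.exchanged_bytes)
```

In the fixed-block case both the per-process buffer and the traffic through the switch grow with p, which is the kind of load under which a switched network can saturate; in the fixed-total case each process holds less data as p grows while the exchanged volume approaches the total problem size.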