Orchestrating Non-Blocking Asynchronous Framework for HPC Systems and Applications

dc.contributor.advisor: Halem, Milton
dc.contributor.author: Velusamy, Kaushik
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2022-09-29T15:37:54Z
dc.date.available: 2022-09-29T15:37:54Z
dc.date.issued: 2021-01-01
dc.description.abstract: In modern supercomputing applications, communication dominates computation. This is reflected in applications on the largest supercomputers achieving only roughly 9% of peak performance. In addition, large-scale graph data analytics challenge systems with a traditional memory hierarchy because of their unstructured data sources and irregular memory access patterns. Standard benchmarks such as LINPACK, which focus on floating-point operations per second (FLOPS), give no weight to communication, even though many of today's scientific applications depend heavily on it. In data analytics, once an application's data grows well beyond the available DRAM, it becomes difficult to keep processors busy, which in turn creates a need for faster memory and higher bandwidth. Microprocessor speed has improved far faster than DRAM speed. To overcome this mismatch between memory bandwidth and processor speed, asynchronous programming offers a way to avoid blocking waits by executing events independently of the main program flow. The performance of many HPC applications depends critically on how well they hide the long latency of data movement by overlapping communication with ongoing computation, thereby minimizing the time spent waiting on data transfers. In this thesis, we designed and developed a multi-step Non-Blocking Asynchronous Framework (N-BAF) that enables users to efficiently improve application performance on high-performance computing systems. The first step of N-BAF addresses data movement and memory bandwidth problems with an analytical performance model that automatically extracts an execution flow graph to identify an application's communication hotspots. The next step uses the flow graph to optimize out-of-core I/O requests through prefetching, operation reordering, in-memory shuffling, and a mailbox abstraction.
We provide tools that decompose blocking communications into non-blocking operations, alleviating the long latency of irregular, data-movement-intensive applications. We also evaluated and addressed data movement problems with N-BAF methodologies on the novel Parallel Migratory Thread Architecture, the Coherent Accelerator Processor Interface (CAPI), and Persistent Memory Allocators. To illustrate the performance improvement gained from this framework, we implemented three applications that exhibit irregular behavior: (1) an N-body simulation using the Barnes-Hut algorithm, from the molecular dynamics domain; (2) the Navier-Stokes equations, from the computational fluid dynamics domain; and (3) the Breadth-First Search algorithm. These implementations obtained a 20-45% improvement over the base case.
dc.format: application:pdf
dc.genre: dissertations
dc.identifier: doi:10.13016/m2injs-hqfy
dc.identifier.other: 12465
dc.identifier.uri: http://hdl.handle.net/11603/25980
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
dc.source: Original File Name: Velusamy_umbc_0434D_12465.pdf
dc.title: Orchestrating Non-Blocking Asynchronous Framework for HPC Systems and Applications
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
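The central idea of the abstract, hiding data-movement latency by replacing a blocking wait with a non-blocking request that overlaps with computation (here also acting as a simple prefetch), can be illustrated with a minimal Python sketch. It is not N-BAF's implementation. `fetch_chunk` and `compute` are hypothetical stand-ins that model a transfer and a unit of work with fixed sleeps. A single-worker thread pool substitutes for non-blocking communication primitives such as MPI's `MPI_Isend`/`MPI_Irecv` paired with `MPI_Wait`.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.02  # assumed cost of one data transfer, in seconds (illustrative)
WORK = 0.02     # assumed cost of computing on one chunk, in seconds (illustrative)


def fetch_chunk(i):
    """Hypothetical remote or out-of-core read: chunk i holds values 4i..4i+3."""
    time.sleep(LATENCY)  # models waiting on data movement
    return list(range(i * 4, i * 4 + 4))


def compute(chunk):
    """Hypothetical local computation: sum of squares of the chunk."""
    time.sleep(WORK)  # models computation time
    return sum(x * x for x in chunk)


def blocking(n_chunks):
    """Fetch, then compute: the processor idles during every transfer."""
    total = 0
    for i in range(n_chunks):
        chunk = fetch_chunk(i)  # blocking wait
        total += compute(chunk)
    return total


def overlapped(n_chunks):
    """Post the next fetch before computing, so transfer and work overlap."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_chunk, 0)  # non-blocking request for chunk 0
        for i in range(n_chunks):
            chunk = pending.result()  # wait only for the data needed now
            if i + 1 < n_chunks:
                pending = pool.submit(fetch_chunk, i + 1)  # prefetch next chunk
            total += compute(chunk)  # runs while the next fetch is in flight
    return total
```

Both versions compute the same result. With equal transfer and compute costs, the blocking loop takes about `n * (LATENCY + WORK)` seconds, while the overlapped loop takes about `LATENCY + n * WORK`, because only the first transfer is fully exposed.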

Files

Original bundle
Name: Velusamy_umbc_0434D_12465.pdf
Size: 4.41 MB
Format: Adobe Portable Document Format
License bundle
Name: Velusamy-Kaushik_Open.pdf
Size: 455.59 KB
Format: Adobe Portable Document Format