Velusamy, KaushikRolinger, Thomas B.McMahon, JaniceSimon, Tyler A.2024-02-292024-02-292018-11-29K. Velusamy, T. B. Rolinger, J. McMahon and T. A. Simon, "Exploring Parallel Bitonic Sort on a Migratory Thread Architecture," 2018 IEEE High Performance extreme Computing Conference (HPEC), Waltham, MA, USA, 2018, pp. 1-7, doi: 10.1109/HPEC.2018.8547568.https://doi.org/10.1109/HPEC.2018.8547568http://hdl.handle.net/11603/317452018 IEEE High Performance extreme Computing Conference (HPEC), Waltham, MA, USA 25-27 September 2018Large scale, data-intensive applications pose challenges to systems with a traditional memory hierarchy due to their unstructured data sources and irregular memory access patterns. In response, systems that employ migratory threads have been proposed to mitigate memory access bottlenecks as well as reduce energy consumption. One such system is the Emu Chick, which migrates a small program context to the data being referenced in a memory access. Sorting an unordered list of elements is a critical kernel for countless applications, such as graph processing and tensor decomposition. As such applications can be considered highly suitable for a migratory thread architecture, it is imperative to understand the performance of sorting algorithms on these systems. In this paper, we implement parallel bitonic sort and target the Emu Chick system. We investigate the performance of an explicit comparison-based approach as well as a sorting network implementation. Furthermore, we explore two different data layouts for the parallel bitonic sorting network, namely cyclic and blocked. From the results of our performance study, we find that while thread migrations can dictate the overall performance of an application, the cost of thread creation and management can out-grow the cost of thread migration.7 pages© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Exploring Parallel Bitonic Sort on a Migratory Thread ArchitectureText