Parallel Sorting Of Biological Sequences Using The Intel® Concurrent Collections

Date

2012

Department

Computer Science and Bioinformatics Program

Program

Master of Science

Rights

This item is made available by Morgan State University for personal, educational, and research purposes in accordance with Title 17 of the U.S. Copyright Law. Other uses may require permission from the copyright owner.

Abstract

Analyzing and computing with biological sequence data, such as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), requires a lot of processing time and memory when sequential algorithms are used. Programmers and scientists have developed and tested a few models for parallelizing and optimizing algorithms to improve results in bioinformatics. However, some of these approaches do not make efficient use of multi-core systems or computers with many processors. The Intel® Concurrent Collections is a software tool and library for transforming serial programs into semantically equivalent parallel programs. Its approach is a new and unique technique for designing parallel programs: it overcomes the over-constrained nature of serial languages by providing a conclusive programming concept, and it allows programs to run efficiently on multi-core systems and computers with many processors. The main goals of this research are: to design a serial C/C++ program for sorting biological sequences based on the Divide and Conquer methodology; to transform the serial C/C++ program into a semantically equivalent parallel C/C++ program using the Intel® Concurrent Collections; to compare and analyze the execution times of the serial and parallel programs and draw appropriate conclusions on the suitability of the Divide and Conquer methodology for parallelization; to provide suggestions on the suitability of the Intel® Concurrent Collections technology for parallelizing serial algorithms; and to show the importance of parallelizing bioinformatics algorithms.
The main results and achievements of this thesis research are: the successful parallel sorting of biological sequences using the merge sort Divide and Conquer algorithm; experiments successfully conducted on the Intel® Many-core Testing Lab, which runs the Red Hat Enterprise Linux operating system and comprises 32 processors and 265 GB of RAM; a demonstration that the Intel® Concurrent Collections programming model can, through parallelization, improve the efficiency and speed of algorithms used in bioinformatics and computational biology; and the conclusion that the prerelease version of the platform has some limitations.
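
As a rough illustration of the Divide and Conquer idea named in the abstract, the following minimal C++ sketch sorts sequences lexicographically with a top-down merge sort and runs the two halves concurrently. It is not the thesis code and does not use the Intel® Concurrent Collections API; it assumes standard C++ futures (`std::async`) as a stand-in, and the names `merge_sort`, `sort_sequences`, and the `depth` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Sort seqs[lo, hi) lexicographically with a top-down merge sort.
// `depth` (hypothetical tuning parameter) is the number of recursion
// levels that are allowed to spawn an extra task.
void merge_sort(std::vector<std::string>& seqs, std::size_t lo,
                std::size_t hi, int depth) {
    assert(lo <= hi && hi <= seqs.size());
    if (hi - lo < 2) return;  // zero or one sequence: already sorted
    const std::size_t mid = lo + (hi - lo) / 2;

    if (depth > 0) {
        // Divide: sort the left half in a separate task while the
        // current thread sorts the right half. The two halves touch
        // disjoint elements, so no locking is needed.
        auto left = std::async(std::launch::async, [&seqs, lo, mid, depth] {
            merge_sort(seqs, lo, mid, depth - 1);
        });
        merge_sort(seqs, mid, hi, depth - 1);
        left.get();  // wait for the left half before merging
    } else {
        // Below the task limit, fall back to plain serial recursion.
        merge_sort(seqs, lo, mid, 0);
        merge_sort(seqs, mid, hi, 0);
    }

    // Conquer: merge the two sorted halves.
    const auto first = seqs.begin();
    std::inplace_merge(first + static_cast<std::ptrdiff_t>(lo),
                       first + static_cast<std::ptrdiff_t>(mid),
                       first + static_cast<std::ptrdiff_t>(hi));
}

// Convenience wrapper: sort the whole collection.
// depth = 2 allows up to four halves to be sorted concurrently.
void sort_sequences(std::vector<std::string>& seqs, int depth = 2) {
    merge_sort(seqs, 0, seqs.size(), depth);
}
```

Calling `sort_sequences` on a `std::vector<std::string>` of DNA or RNA strings sorts it in place. Setting `depth` to 0 gives the serial version of the same algorithm, which is one simple way to compare serial and parallel execution times on a given machine.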