    Biological Sequence Analysis Using Hadoop/Mapreduce As A Distributed Computing Model

    Files
    Paudel_morgan_0755M_10242.pdf (2.638Mb)
    Permanent Link
    http://hdl.handle.net/11603/10500
    Collections
    • MSU Student Collection
    Author/Creator
    Paudel, Roshan
    Date
    2012
    Type of Work
    Text
    theses
    Department
    Computer Science and Bioinformatics Program
    Program
    Master of Science
    Rights
    This item is made available by Morgan State University for personal, educational, and research purposes in accordance with Title 17 of the U.S. Copyright Law. Other uses may require permission from the copyright owner.
    Subjects
    Bioinformatics
    Computer science
    Electronic data processing--Distributed processing
    Abstract
    Most biological (DNA, RNA, or protein) sequence analysis algorithms are complex and require extensive execution time and memory. Serial biological sequence processing algorithms do not use the computing power of present-day computers efficiently. Researchers have developed and tested many programming models for parallelizing and optimizing algorithms to reduce execution time and memory use.

    MapReduce is a programming model based on functional programming in which users implement an interface of two functions, map and reduce. In general, map applies a function to each input record, and reduce aggregates the results of those applications. The MapReduce programming model is patented by Google. This research used the Hadoop implementation of MapReduce. Hadoop and the Hadoop Distributed File System are open-source counterparts of MapReduce and the Google File System. The Hadoop framework automatically transforms map and reduce applications into map and reduce tasks.

    All known biological sequences and their functional annotations are stored in biological databases. A newly determined biological sequence should be compared with every known corresponding biological sequence to detect potential structural or evolutionary relationships. From a computational point of view, a major challenge is to align the query sequence against a very large collection of biological sequences and sort them by their alignment score with the query. The solution has to be fast and scalable.

    The main goals of this thesis research are:
    * To build a fully distributed Ubuntu Hadoop cluster of four nodes.
    * To configure and test the Hadoop cluster on the LittleFe cluster computer.
    * To determine and measure the efficiency of the program in terms of execution time and memory use.

    The main achievements/results of this thesis research are:
    * Conversion of the LittleFe cluster computer from the BCCD operating system to the Ubuntu operating system.
    * Modification of two Hadoop examples, RandomTextWriter.java and SecondarySort.java, into MRGenerateDNA.java, a Hadoop program that generates a large file of random DNA sequences, and MRSortDNA.java, a Hadoop program that sorts DNA sequences, respectively.
    * Demonstration that Hadoop is an efficient programming model for developing new parallel algorithms for biological sequence processing based on the MapReduce programming model.
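
    The map and reduce roles described in the abstract can be illustrated with a small plain-Python sketch. It is not the Hadoop API and not the thesis code: the function names (score, map_phase, shuffle, reduce_phase, rank) are hypothetical, and the toy score (count of matching positions) is only a stand-in for a real alignment score. The sketch ranks database sequences against a query, with a sort-and-group step between map and reduce that imitates what a MapReduce framework does between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def score(query, candidate):
    """Toy similarity: number of positions where the two sequences agree.

    Assumption: stands in for a real alignment score.
    """
    return sum(a == b for a, b in zip(query, candidate))


def map_phase(query, sequences):
    """Map: emit one (score, sequence) pair per database sequence."""
    for seq in sequences:
        yield score(query, seq), seq


def shuffle(pairs):
    """Sort pairs by key and group equal keys together.

    Keys are ordered highest score first here; that ordering is a choice
    made for this sketch.
    """
    ordered = sorted(pairs, key=itemgetter(0), reverse=True)
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [seq for _, seq in group]


def reduce_phase(grouped):
    """Reduce: aggregate each group of sequences into one output record."""
    for key, seqs in grouped:
        yield key, sorted(seqs)


def rank(query, sequences):
    """Run map, shuffle, and reduce in order and collect the results."""
    return list(reduce_phase(shuffle(map_phase(query, sequences))))
```

    For example, rank("ACGT", ["ACGA", "TTTT", "ACGT", "ACCT"]) returns [(4, ["ACGT"]), (3, ["ACCT", "ACGA"]), (1, ["TTTT"])]. In a real cluster the map calls would run in parallel on different nodes, which is where the reduction in execution time would come from.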


    Growing the Future, Leading the World!


    If you wish to submit a copyright complaint or withdrawal request, please email mdsoar-help@umd.edu.

     

     
