The Application Of Information Retrieval Techniques To The Mining Of Bioinformatics Data

Date

2011

Department

Computer Science and Bioinformatics Program

Program

Master of Science

Rights

This item is made available by Morgan State University for personal, educational, and research purposes in accordance with Title 17 of the U.S. Copyright Law. Other uses may require permission from the copyright owner.

Abstract

This thesis explores the application of information retrieval and text mining techniques to bioinformatics data. Information retrieval can be defined as a set of processes for querying a collection of objects in order to extract relevant information. The goal of this work is to impose a mathematical structure on bioinformatics database objects that facilitates the use of the vector space techniques encountered in text mining and information retrieval systems. The approach is general and applies to various categories of bioinformatics data, such as text, sequence, or structural objects. The main contribution of the thesis is to demonstrate how vector space techniques typically used in text information retrieval can be applied to bioinformatics data. Much of the work is devoted to the numerical encoding of bioinformatics sequence data such that relevant biological and chemical characteristics are preserved; the Blocks Database serves as the template for testing the applied techniques. The thesis establishes that the vector space technique is consistent with pattern classification methodologies commonly applied in the bioinformatics literature. Numerous subspace decomposition techniques are also presented and applied to classify patterns.
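
To make the vector space idea concrete, the following is a minimal sketch and not the encoding developed in the thesis. It assumes a simple k-mer count representation, in which overlapping substrings of a sequence play the role that terms play in a text document, and it ranks a small collection against a query by cosine similarity. The function names, the choice of k, and the example sequences are all hypothetical and chosen only for illustration. Plain k-mer counts do not preserve the biological and chemical characteristics the abstract refers to.

```python
import math
from collections import Counter


def kmer_vector(sequence, k=2):
    """Encode a sequence as sparse counts of its overlapping k-mers.

    Assumption: k-mers stand in for the 'terms' of a text document.
    """
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank(query, collection, k=2):
    """Return (name, score) pairs sorted from most to least similar."""
    q = kmer_vector(query, k)
    scored = [
        (name, cosine_similarity(q, kmer_vector(seq, k)))
        for name, seq in collection.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up sequences, used only to exercise the functions.
    collection = {"seq_a": "ACDEFGHIK", "seq_b": "ACDEFWWWW", "seq_c": "YYYYYYYYY"}
    for name, score in rank("ACDEFGH", collection):
        print(name, round(score, 3))
```

Once sequences are vectors, the same ranking machinery used for text queries applies unchanged. A richer encoding would replace only `kmer_vector`.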