The Application Of Information Retrieval Techniques To The Mining Of Bioinformatics Data
Date
2011
Department
Computer Science and Bioinformatics Program
Program
Master of Science
Rights
This item is made available by Morgan State University for personal, educational, and research purposes in accordance with Title 17 of the U.S. Copyright Law. Other uses may require permission from the copyright owner.
Abstract
This thesis explores the application of information retrieval and text mining techniques to the mining of bioinformatics data. Information retrieval can be defined as a set of processes for querying a collection of objects in order to extract relevant information from the data. The goal of this work is to impose a mathematical structure on bioinformatics database objects that facilitates the use of the vector space techniques found in text mining and information retrieval systems. The approach presented is general and applies to various categories of bioinformatics data, such as text, sequence, or structural objects. The main contribution of this thesis is to demonstrate how vector space techniques typically encountered in text information retrieval can be applied to bioinformatics data. Much of the work is devoted to the numerical encoding of bioinformatics sequence data in a way that preserves relevant biological and chemical characteristics; the Blocks Database serves as the test bed for the resulting techniques. The thesis establishes that the vector space technique is consistent with pattern classification methodologies commonly applied in the bioinformatics literature. In addition, numerous subspace decomposition techniques are presented and applied to classify patterns.
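To make the text-retrieval analogy concrete, here is a minimal sketch of vector space retrieval over protein sequences. It assumes a plain overlapping k-mer count encoding (the sequence analogue of a term-frequency vector) and cosine similarity as the relevance score. This is not the thesis's own encoding, which aims to preserve biological and chemical characteristics; the function names `kmer_vector`, `cosine`, and `rank` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product
import math

# The 20 standard amino acids; k-mers containing other letters (e.g. X, B)
# are not in the vocabulary and are silently ignored by this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_vector(seq, k=2):
    """Hypothetical encoding: map a protein sequence to a vector of
    overlapping k-mer counts, analogous to a term-frequency vector."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get(term, 0) for term in vocab]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, database, k=2):
    """Score every database sequence against the query and return
    (name, score) pairs sorted from most to least similar."""
    q = kmer_vector(query, k)
    scored = [(name, cosine(q, kmer_vector(s, k))) for name, s in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy sequences, made up for illustration; not drawn from the Blocks Database.
    db = {"a": "MKTAYIAKQR", "b": "GGGGGGGG", "c": "MKTAYIAKQL"}
    for name, score in rank("MKTAYIAKQR", db):
        print(name, round(score, 3))
```

Because every object lives in the same vector space, the same `rank` function doubles as a nearest-neighbour classifier: assigning a query the label of its top-ranked entry is one simple way a retrieval score connects to pattern classification.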