The Application Of Information Retrieval Techniques To The Mining Of Bioinformatics Data
Type of Work: Text
Department: Computer Science and Bioinformatics Program
Rights: This item is made available by Morgan State University for personal, educational, and research purposes in accordance with Title 17 of the U.S. Copyright Law. Other uses may require permission from the copyright owner.
This thesis explores the application of information retrieval and text mining techniques to the mining of bioinformatics data. Information retrieval can be defined as a set of processes that query a collection of objects in order to extract relevant information from the data. The goal of this work is to impose a mathematical structure on bioinformatics database objects that facilitates the use of vector space techniques encountered in text mining and information retrieval systems. The approach presented is quite general and applicable to various categories of bioinformatics data, such as text, sequence, or structural objects. The main contribution of this thesis is to demonstrate how vector space techniques typically encountered in the field of text information retrieval can be applied to bioinformatics data. Much of the work is devoted to the numerical encoding of bioinformatics sequence data such that relevant biological and chemical characteristics are preserved; the Blocks Database serves as the template for testing the applied techniques. It is established that the vector space technique is consistent with pattern classification methodologies commonly applied within the bioinformatics literature. In addition, numerous subspace decomposition techniques are presented and applied to classify patterns.
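As an illustration of the general idea only, the sketch below treats sequences as "documents" in a vector space: each sequence is encoded as a vector of overlapping k-mer counts, and a query is ranked against a collection by cosine similarity. The abstract does not specify the thesis's actual encoding, so the k-mer scheme, the function names (`kmer_vector`, `cosine_similarity`, `rank`), and the toy sequences are all assumptions made for this example. Plain k-mer counts do not by themselves preserve the biological and chemical characteristics the thesis aims to retain.

```python
from collections import Counter
from math import sqrt


def kmer_vector(sequence, k=2):
    """Encode a sequence as a sparse vector of overlapping k-mer counts.

    Assumption: k-mer counting is a stand-in encoding chosen for this
    sketch; it is not taken from the thesis.
    """
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def rank(query, collection, k=2):
    """Return (score, sequence) pairs sorted from most to least similar."""
    query_vector = kmer_vector(query, k)
    scored = [
        (cosine_similarity(query_vector, kmer_vector(seq, k)), seq)
        for seq in collection
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy sequences invented for the example, not drawn from any database.
    collection = ["ACDEFG", "ACDEWW", "WWWWWW"]
    for score, seq in rank("ACDEFG", collection):
        print(f"{score:.3f}  {seq}")
```

Once objects are in this form, the same vectors could in principle be stacked into a matrix and passed to a subspace decomposition, which is the kind of step the abstract refers to; that step is not shown here.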