The Application Of Information Retrieval Techniques To The Mining Of Bioinformatics Data

This item is made available by Morgan State University for personal, educational, and research purposes in accordance with Title 17 of the U.S. Copyright Law. Other uses may require permission from the copyright owner.

Subjects

Amino acids
Computer science
Bioinformatics

Abstract

This thesis explores the application of information retrieval and text mining techniques to the mining of bioinformatics data. Information retrieval can be defined as a set of processes for querying a collection of objects in order to extract relevant information from the data. The goal of this work is to impose a mathematical structure on bioinformatics database objects that facilitates the use of the vector space techniques encountered in text mining and information retrieval systems. The approach presented is quite general and applies to various categories of bioinformatics data, such as text, sequence, or structural objects. The main contribution of this thesis is to demonstrate how vector space techniques typically encountered in text information retrieval can be applied to bioinformatics data. Much of the work is devoted to the numerical encoding of bioinformatics sequence data such that relevant biological and chemical characteristics are preserved; the Blocks Database is used as the template for testing the applied techniques. It is established that the vector space technique is consistent with pattern classification methodologies commonly applied in the bioinformatics literature. Numerous subspace decomposition techniques are also presented and applied to classify patterns.
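The vector space idea summarized in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the encoding (counts of overlapping k-mers), the function names kmer_vector, cosine_similarity, and rank, and the toy sequences are hypothetical and are not taken from the thesis. In particular, plain k-mer counts do not preserve the biological and chemical characteristics that the thesis's own encoding is designed to keep.

```python
from collections import Counter
from math import sqrt


def kmer_vector(sequence, k=2):
    """Encode a sequence as a sparse vector of overlapping k-mer counts.

    Assumption: k-mer counting stands in for the thesis's encoding,
    which is not described in the abstract.
    """
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    # Counter returns 0 for missing keys, so absent k-mers contribute nothing.
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, collection, k=2):
    """Rank named sequences by similarity to the query, best match first."""
    q = kmer_vector(query, k)
    scored = [
        (cosine_similarity(q, kmer_vector(seq, k)), name)
        for name, seq in collection.items()
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    collection = {"seq_a": "MKVLAAGIV", "seq_b": "GGSGGSGGS"}
    for score, name in rank("MKVLAGGIV", collection):
        print(f"{name}: {score:.3f}")
```

As in text retrieval, the query and each database object are mapped into the same vector space, and relevance is scored by the angle between vectors rather than by direct string comparison.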
