Text Mining for Hypotheses and Results in Translational Medicine Studies
Citation of Original Publication
Tsai, Terry H., Niels Kasch, Craig Pfeifer, and Tim Oates. “Text Mining for Hypotheses and Results in Translational Medicine Studies.” 2014 IEEE International Conference on Data Mining Workshop, December 2014, 127–32. https://doi.org/10.1109/ICDMW.2014.39.
Rights
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subjects
Genomics
Natural language processing
Diabetes
Medical diagnostic imaging
Translational informatics
Diseases
Biomedical informatics
Text mining
UMBC Accelerated Cognitive Cybersecurity Laboratory
UMBC Ebiquity Research Group
Bioinformatics
Gene-environment interaction studies
UMBC Cognition, Robotics, and Learning (CoRaL) Lab
Abstract
Most common and complex diseases, such as diabetes and cancer, are influenced at some level by variation in the genome. To truly address the goal of translational research, genetic variation must be taken into consideration. Research in public health genetics, specifically on single nucleotide polymorphisms (SNPs), is the first step toward understanding human genetic variation. In addition, novel methods are needed to represent and to conduct text mining over textual genotypic data sources. In this paper, we describe the development and evaluation, in the context of a genetic study, of a translational-informatics method that supports both machine-learning text mining (e.g., conditional random fields) and automated inference for identifying key concepts (e.g., hypotheses and results). After scaling for inter-annotator agreement, our adjusted overall precision was 64%, with a range of 48% to 80%. While other biological text mining systems have focused on named-entity recognition, tools for genetic studies that focus on hypotheses and results have been relatively rare.
