An Automated Method for Analyzing Unstructured Health Data

Author/Creator

Author/Creator ORCID

Date

2016-01-01

Type of Work

Department

Information Systems

Program

Information Systems

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Abstract

The past decade has seen a surge in the implementation of electronic health care records. These patient records contain valuable medical information, including patient demographic data, diagnoses, therapeutic approaches, and patient outcomes. In this dissertation, a method is presented for identifying themes and patterns within patient data. The methodology extracts the main themes or patterns in the data and links those themes back to the corpus from which they were generated. Graphs built from terms gathered from electronic medical records, case studies, and behavioral health forums were partitioned into communities. The electronic medical records, case studies, and online health forum data were modeled as networks of interacting terms, where the interactions were captured by term co-occurrences in the documents. A greedy algorithm was used to find communities with high modularity. We compared our method with a number of other methods on the i2b2 2014 challenge data set for health-risk prediction tasks. A Monte Carlo simulation was run to show that our graphs were significantly different from random graphs of the same sparsity. Finally, we compared our method with probabilistic topic modeling algorithms and evaluated its efficacy using recall and precision measures.
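
The pipeline described in the abstract (terms linked by document co-occurrence, then grouped greedily into high-modularity communities) can be illustrated with a minimal Python sketch. The abstract does not specify the tokenization, the edge-weight definition, or which greedy algorithm was used, so everything below is an assumption made for illustration: whitespace tokenization, edge weight equal to the number of documents containing both terms, and a simple agglomerative merge that repeatedly joins the pair of communities giving the largest gain in Newman modularity. The function names and the toy corpus are hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(documents):
    """Build a weighted term graph (assumed weighting scheme).

    Edge weight = number of documents in which both terms appear.
    """
    weights = defaultdict(int)
    for doc in documents:
        terms = sorted(set(doc.lower().split()))  # naive whitespace tokenizer
        for a, b in combinations(terms, 2):
            weights[(a, b)] += 1
    return dict(weights)


def modularity(weights, community_of):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ]."""
    total = sum(weights.values())      # m: total edge weight
    internal = defaultdict(float)      # L_c: edge weight inside community c
    degree = defaultdict(float)        # d_c: summed weighted degree of c
    for (a, b), w in weights.items():
        degree[community_of[a]] += w
        degree[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2
               for c in degree)


def greedy_communities(weights):
    """Start from singletons; apply the best modularity-raising merge until none is left."""
    nodes = sorted({n for edge in weights for n in edge})
    community_of = {n: i for i, n in enumerate(nodes)}
    best_q = modularity(weights, community_of)
    while True:
        best_merge = None
        # Only communities joined by at least one edge can gain from merging.
        pairs = {tuple(sorted((community_of[a], community_of[b])))
                 for a, b in weights if community_of[a] != community_of[b]}
        for c1, c2 in sorted(pairs):
            trial = {n: (c1 if c == c2 else c) for n, c in community_of.items()}
            q = modularity(weights, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:
            break
        community_of = best_merge
    groups = defaultdict(set)
    for n, c in community_of.items():
        groups[c].add(n)
    return sorted(groups.values(), key=lambda g: sorted(g)), best_q


# Invented toy corpus: two themes joined by one bridging document.
docs = [
    "insulin glucose diabetes",
    "insulin glucose diabetes",
    "anxiety sleep insomnia",
    "anxiety sleep insomnia",
    "glucose sleep",
]
graph = cooccurrence_graph(docs)
communities, q = greedy_communities(graph)
for community in communities:
    print(sorted(community))
print(round(q, 4))
```

This version recomputes modularity from scratch for every candidate merge, which is easy to follow but slow on a large vocabulary; a practical implementation would update the modularity gain incrementally. Each resulting community can be read as a candidate theme, and its terms can be traced back to the documents that contain them.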