Knowledge Discovery Through Linking Multiple Heterogeneous, Unstructured Data Streams: A Case Of Clinical Notes Mining

Alodadi, Mohammad Saad

dc.contributor.advisor	Janeja, Vandana P.
dc.contributor.author	Alodadi, Mohammad Saad
dc.contributor.department	Information Systems
dc.contributor.program	Information Systems
dc.date.accessioned	2021-09-01T13:55:38Z
dc.date.available	2021-09-01T13:55:38Z
dc.date.issued	2020-01-01
dc.description.abstract	The focus of this dissertation is to discover patterns of significant relationships across unstructured, heterogeneous data streams. One example domain is Electronic Health Records (EHR), specifically text about treatments, tests, and diagnoses in clinical notes. While large-scale automated analysis of unstructured data can be much more complicated than analysis of structured data, it can potentially provide better support and information for decision making. From the health care providers' perspective, the large number of medical tests per patient has further increased the complexity of determining an accurate diagnosis for each patient. In addition, the EHR should be able to support Clinical Decision Support Systems (CDS) in providing health professionals with the most recent and relevant biomedical literature to improve the decision-making process concerning a given patient's record. However, current EHR systems lack this ability. This creates a gap between advances in the biomedical domain and day-to-day practices within EHR systems. In this dissertation, we propose several methods to overcome these challenges. For extracting knowledge from heterogeneous unstructured data, we propose a weighted association rule mining method that extracts significant entities from the unstructured data and generates weighted association rules among them. We also expand our data by utilizing ontology-based expansion. Our discovered rules reveal non-trivial interdependencies that can help support practitioners' decisions, such as during clinical interventions. Our frequency-based methods generate rules that receive higher interestingness and relatedness ratings from a health provider.
Furthermore, in a temporal use case in EHR, our method shows an increase in rules in the later days of a hospital admission, which mirrors the secondary diagnoses phenomenon: conditions that coexist at the time of admission or that develop subsequently to the principal diagnosis. Preliminary results with the proposed weighted transactional item representation are promising for identifying strongly related entities, for example among medical entities (diagnoses, tests, treatments). To improve literature search for professionals, we propose and evaluate multiple query expansion and re-ranking methods. The query expansion methods rely on vocabularies and different embedding models. The re-ranking method relies on dual embedding retrieval indexes that focus on latent features rather than standard explicit terms. We experiment with these methods on publicly available ad-hoc information retrieval tasks in biomedical literature. The results show that combining latent features and explicit features increases the precision of the results. We performed an extrinsic evaluation on multiple datasets (a biomedical literature database and health forum data). We found that our embedding model can be generalized to offer data representation of medical textual data (such as biomedical literature, clinical notes, or medical social media data).
dc.format	application:pdf
dc.genre	dissertations
dc.identifier	doi:10.13016/m2yfei-osxk
dc.identifier.other	12306
dc.identifier.uri	http://hdl.handle.net/11603/22878
dc.language	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Information Systems Department Collection
dc.relation.ispartof	UMBC Theses and Dissertations Collection
dc.relation.ispartof	UMBC Graduate School Collection
dc.relation.ispartof	UMBC Student Collection
dc.source	Original File Name: Alodadi_umbc_0434D_12306.pdf
dc.subject	Clinical Data Mining
dc.subject	Data Mining
dc.subject	Data Science
dc.subject	Information retrieval
dc.subject	Natural Language Processing
dc.subject	Text Mining
dc.title	Knowledge Discovery Through Linking Multiple Heterogeneous, Unstructured Data Streams: A Case Of Clinical Notes Mining
dc.type	Text
dcterms.accessRights	Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRights	This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle

Name: Alodadi_umbc_0434D_12306.pdf
Size: 4.2 MB
Format: Adobe Portable Document Format

Collections

UMBC Theses and Dissertations
UMBC Graduate School
UMBC Information Systems Department
UMBC Student Collection