Knowledge Discovery Through Linking Multiple Heterogeneous, Unstructured Data Streams: A Case Of Clinical Notes Mining

dc.contributor.advisor: Janeja, Vandana P.
dc.contributor.author: Alodadi, Mohammad Saad
dc.contributor.department: Information Systems
dc.contributor.program: Information Systems
dc.date.accessioned: 2021-09-01T13:55:38Z
dc.date.available: 2021-09-01T13:55:38Z
dc.date.issued: 2020-01-01
dc.description.abstract: The focus of this dissertation is to discover patterns of significant relationships across unstructured, heterogeneous data streams. One example domain is Electronic Health Records (EHR), which contain text describing treatments, tests, and diagnoses, particularly in clinical notes. Large-scale automated analysis of unstructured data can be much more complicated than analysis of structured data, but it can potentially provide better support and information for decision making. From the health care providers' perspective, the large number of medical tests per patient further increases the complexity of identifying an accurate diagnosis for each patient. In addition, the EHR should support Clinical Decision Support Systems (CDS) by providing health professionals with the most recent and relevant biomedical literature, improving decision making about a given patient's record. However, current EHR systems lack this ability, which creates a gap between advances in the biomedical domain and day-to-day practice within EHR systems. In this dissertation, we propose several methods to overcome these challenges. To extract knowledge from heterogeneous unstructured data, we propose a weighted association rule mining method that extracts significant entities from the unstructured data and generates weighted association rules among them. We also expand our data using ontology-based expansion. The discovered rules reveal non-trivial interdependencies that can help support practitioners' decisions, such as during clinical interventions. Our frequency-based methods generate rules with higher interestingness and relatedness ratings, as assessed by a health care provider.
Furthermore, in a temporal EHR use case, our method shows an increase in rules in the later days of hospital admission. This mirrors the phenomenon of secondary diagnoses: conditions that coexist at the time of admission or that develop subsequent to the principal diagnosis. Preliminary results with the proposed weighted transactional item representation are promising for identifying strongly related entities, for example medical entities (diagnoses, tests, and treatments). To improve literature search for professionals, we propose and evaluate multiple query expansion and re-ranking methods. The query expansion methods rely on vocabularies and different embedding models. The re-ranking method relies on dual embedding retrieval indexes that focus on latent features rather than standard explicit terms. We experiment with these methods on publicly available ad hoc information retrieval tasks in biomedical literature. The results show that combining latent and explicit features increases the precision of the results. We also performed an extrinsic evaluation on multiple datasets (a biomedical literature database and health forum data). It showed that our embedding model can be generalized to represent medical textual data such as biomedical literature, clinical notes, or medical social media data.
dc.format: application:pdf
dc.genre: dissertations
dc.identifier: doi:10.13016/m2yfei-osxk
dc.identifier.other: 12306
dc.identifier.uri: http://hdl.handle.net/11603/22878
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Alodadi_umbc_0434D_12306.pdf
dc.subject: Clinical Data Mining
dc.subject: Data Mining
dc.subject: Data Science
dc.subject: Information retrieval
dc.subject: Natural Language Processing
dc.subject: Text Mining
dc.title: Knowledge Discovery Through Linking Multiple Heterogeneous, Unstructured Data Streams: A Case Of Clinical Notes Mining
dc.type: Text
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle
Name: Alodadi_umbc_0434D_12306.pdf
Size: 4.2 MB
Format: Adobe Portable Document Format