Browsing by Subject "data integration"
Item Data Integration in HCBS Program Development (The Hilltop Institute, 2009-09-23)
Stockwell, Ian
Hilltop Senior Research Analyst Ian Stockwell, M.A., gave a presentation entitled Data Integration in HCBS Program Development at the National Association of State Units on Aging 45th annual membership meeting in Denver, Colorado on September 23, 2009. The presentation discussed the potential that state policymakers now have to pull individual-level information together to form a complete picture of a program population through data integration. This method can provide a wealth of demographic and health status information for program building, as well as help predict service use, set appropriate individual budgets, and determine potential cost savings. Stockwell provided a brief overview of data sources currently available in most states; discussed the potential of new web-based information systems and provided a case study of a system currently in use; and discussed possible metrics and benchmarks, as well as some “best practices” on how to combine disparate datasets.

Item Discovering Scientific Influence using Cross-Domain Dynamic Topic Modeling (IEEE, 2017-01-15)
Sleeman, Jennifer; Finin, Tim; Cane, Mark; Halem, Milton
We describe an approach using dynamic topic modeling to model influence and predict future trends in a scientific discipline. Our study focuses on climate change and uses assessment reports of the Intergovernmental Panel on Climate Change (IPCC) and the papers they cite. Since 1990, an IPCC report has been published every five years that includes four separate volumes, each of which has many chapters. Each report cites tens of thousands of research papers, which comprise a correlated dataset of temporally grounded documents. We use a custom dynamic topic modeling algorithm to generate topics for both datasets and apply cross-domain analytics to identify the correlations between the IPCC chapters and their cited documents. The approach reveals both the influence of the cited research on the reports and how previous research citations have evolved over time. For the IPCC use case, the report topic model used 410 documents and a vocabulary of 5911 terms, while the citations topic model was based on 200K research papers and a vocabulary of more than 25K terms. We show that our approach can predict the importance of its extracted topics on future IPCC assessments through the use of cross-domain correlations, Jensen-Shannon divergences and cluster analytics.

Item Dynamic Topic Modeling to Infer the Influence of Research Citations on IPCC Assessment Reports (IEEE, 2016-12-05)
Sleeman, Jennifer; Halem, Milton; Finin, Tim; Cane, Mark
A common Big Data problem is the need to integrate large temporal data sets from various data sources into one comprehensive structure. Having the ability to correlate evolving facts between data sources can be especially useful in supporting a number of desired application functions such as inference and influence identification. As a real-world application, we use climate change publications based on the Intergovernmental Panel on Climate Change, which publishes climate change assessment reports every five years, with currently over 25 years of published content. Often these reports reference thousands of research papers. We use dynamic topic modeling as a basis for combining report and citation domains into one structure. We are able to correlate documents between the two domains to understand how the research has influenced the reports and how this influence has changed over time. In this use case, the report topic model used 410 documents and a vocabulary of 5911 terms, while the citation topic model used a vocabulary of 25,154 terms and close to 200,000 research papers.

Item Interpreting medical tables as linked data for generating meta-analysis reports (IEEE, 2014-08-15)
Mulwad, Varish; Finin, Tim; Joshi, Anupam
Evidence-based medicine is the application of current medical evidence to patient care and typically uses quantitative data from research studies. It is increasingly driven by data on the efficacy of drug dosages and the correlations between various medical factors that are assembled and integrated through meta-analyses (i.e., systematic reviews) of data in tables from publications and clinical trial studies. We describe an important component of a system to automatically produce evidence reports that performs two key functions: (i) understanding the meaning of data in medical tables and (ii) identifying and retrieving relevant tables given an input query. We present modifications to our existing framework for inferring the semantics of tables and an ontology developed to model and represent medical tables in RDF. Representing medical tables as RDF facilitates the automatic extraction, integration and reuse of data from multiple studies, which is essential for generating meta-analysis reports. We show how relevant tables can be identified by querying over their RDF representations and describe two evaluation experiments: one on mapping medical tables to linked data and another on identifying tables relevant to a retrieval query.

Item Multimodal Data Fusion: An Overview of Methods, Challenges and Prospects (IEEE, 2015)
Lahat, Dana; Adali, Tulay; Jutten, Christian
In various disciplines, information about the same phenomenon can be acquired from different types of detectors, under different conditions, in multiple experiments or subjects, among others. We use the term “modality” for each such acquisition framework. Due to the rich characteristics of natural phenomena, it is rare that a single modality provides complete knowledge of the phenomenon of interest. The increasing availability of several modalities reporting on the same system introduces new degrees of freedom, which raise questions beyond those related to exploiting each modality separately. As we argue, many of these questions, or “challenges,” are common to multiple domains. This paper deals with two key issues: “why we need data fusion” and “how we perform it.” The first issue is motivated by numerous examples in science and technology, followed by a mathematical framework that showcases some of the benefits that data fusion provides. In order to address the second issue, “diversity” is introduced as a key concept, and a number of data-driven solutions based on matrix and tensor decompositions are discussed, emphasizing how they account for diversity across the data sets. The aim of this paper is to provide the reader, regardless of his or her community of origin, with a taste of the vastness of the field, the prospects, and the opportunities that it holds.

Item Ontology Pattern Modeling for Cross-Repository Data Integration in the Ocean Sciences: The Oceanographic Cruise Example (IOS Press, 2015-06-01)
Krisnadhi, Adila A.; Arko, Robert; Carbotte, Suzanne; Chandler, Cynthia; Cheatham, Michelle; Finin, Tim; Hitzler, Pascal; Janowicz, Krzysztof; Narock, Thomas; Raymond, Lisa; Wiebe, Peter
EarthCube is a major effort of the National Science Foundation to establish a next-generation knowledge architecture for the broader geosciences. Data storage, retrieval, access, and reuse are central parts of this new effort. Currently, EarthCube is organized around several building blocks and research coordination networks. OceanLink is a semantics-enabled building block that aims at improving data retrieval and reuse via ontologies, Semantic Web technologies, and Linked Data for the ocean sciences. Cruises, in the sense of research expeditions, are central events for ocean scientists. Consequently, information about these cruises and the involved vessels is of primary interest for oceanographers and thus needs to be shared and made retrievable. In this paper, we report the use of a design pattern-centric strategy to model Cruise for OceanLink data integration. We provide a formal axiomatization of the introduced pattern using the Web Ontology Language, explain design choices and discuss the planned deployment and application scenarios of our model.