Cognitive Intelligence in Relational Databases
Date
2017-01-01
Department
Computer Science and Electrical Engineering
Program
Computer Science
Rights
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Abstract
We evaluate the applicability of distributed language embedding techniques from natural language processing to relational data, which is typically stored in SQL databases. We apply modern distributed representations of words (Tomas Mikolov 2013c) and paragraphs (Quoc V. Le 2014) to this structured data in an attempt to unlock the potential of cognitive querying. The aim is to support queries that are non-trivial to express in SQL alone. We tokenize the IMDB 5000 movie dataset and generate embeddings using word2vec and a modified version of doc2vec that we term row2vec. We discuss the effects of various hyperparameter choices and tokenization techniques. We visualise these embeddings using PCA and present the results for certain queries.
Keywords: word embedding, databases, word2vec, cognitive querying.