Oates, Tim
Athley, Sushant
2019-10-11
2019-10-11
2017-01-01
11692
http://hdl.handle.net/11603/15494

We evaluate the applicability of distributed language embedding techniques from natural language processing to relational data, which is typically stored in SQL databases. We apply modern distributed representations of words (Tomas Mikolov 2013c) and paragraphs (Quoc V. Le 2014) to this structured data in an attempt to unlock the potential of enhanced cognitive querying. The research aim is to support queries that are non-trivial to express in SQL alone. We tokenize the IMDB 5000 movie dataset and generate embeddings using word2vec and a modified version of doc2vec that we term row2vec. We discuss the effects of various hyperparameter choices and tokenization techniques. We visualise these embeddings using PCA and present the results for certain queries.

Keywords: Word embedding, databases, word2vec, cognitive querying.

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

cognitive querying; databases; word2vec; Word embedding
Cognitive Intelligence in Relational Databases
Text
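
The abstract mentions tokenizing relational rows before training embeddings but does not spell out the scheme. The following is a minimal sketch of one plausible approach, not the author's method: each cell becomes a single token prefixed with its column name, and numeric columns are optionally bucketed. The function name `tokenize_row`, the `numeric_bins` parameter, the `column__value` token format, and the sample row are all hypothetical and chosen only for illustration.

```python
import re


def tokenize_row(row, numeric_bins=None):
    """Turn one relational row (dict of column -> value) into a token list.

    Hypothetical scheme: one token per non-empty cell, formatted as
    "<column>__<normalised value>". Columns listed in numeric_bins are
    rounded down to a multiple of the given bin width.
    """
    tokens = []
    for column, value in row.items():
        if value is None or value == "":
            continue  # skip missing cells
        if numeric_bins and column in numeric_bins and isinstance(value, (int, float)):
            width = numeric_bins[column]
            value = f"{int(value // width) * width}s"  # e.g. 1994 -> "1990s"
        # Lowercase and collapse non-alphanumeric runs into underscores.
        text = re.sub(r"[^0-9a-z]+", "_", str(value).lower()).strip("_")
        tokens.append(f"{column}__{text}")
    return tokens


# Made-up sample row; the values are placeholders, not taken from the dataset.
sample = {"title": "Example Movie", "director": "Jane Doe", "year": 1994, "genre": None}
print(tokenize_row(sample, numeric_bins={"year": 10}))
# -> ['title__example_movie', 'director__jane_doe', 'year__1990s']
```

Under this assumption, each token list plays the role of a "sentence" that a word-embedding trainer could consume, and prefixing tokens with the column name keeps identical values in different columns distinct.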