A study on language modeling based on deep neural networks
Department
Towson University. Department of Computer and Information Sciences
Rights
There are no restrictions on access to this document. An internet release form signed by the author to display this document online is on file with Towson University Special Collections and Archives.
Abstract
With the rapid integration of artificial intelligence (AI) technology into natural language processing (NLP), demand for advances in natural language understanding and natural language generation has grown rapidly as well. Both techniques analyze language as it is naturally spoken or written by users and must contend with a degree of ambiguity not present in formal languages. For this reason, language modeling, a statistical approach, has played a key role in this area. Recently, deep learning, which applies deep neural networks to many NLP tasks such as speech recognition, spelling correction, and machine translation, has also been applied to language modeling. These neural network-based language models (neural language models) have been studied extensively and have achieved better results than traditional language models. In this dissertation, we propose three dedicated approaches to improve the performance of neural language models.

The first approach addresses a useful implicit feature of textual data. Although many neural language models achieve remarkable performance, they rely only on the words that occur in sentences. Every sentence also carries useful morphological information, such as part-of-speech (POS) tags, which is essential to the structure of a sentence and can serve as an additional feature. We therefore propose a neural language model for multi-dimensional textual data based on a convolutional neural network (CNN) and long short-term memory (LSTM).

Second, although the LSTM achieves reasonable results by memorizing preceding information, it cannot retain all of the information about preceding words because it stores the history in a single-dimensional state. Consequently, the history gradually vanishes as the number of time steps grows, a vanishing-memory problem that can interfere with learning long-term dependencies when related words are far apart. To compensate, we propose a method for sharing cell-state values among LSTM cells through cell-state stacking.

Finally, we customize and apply the attention mechanism to our language model. Attention has achieved state-of-the-art performance on many NLP tasks, but it computes a context vector simply as a weighted sum of the network outputs, which may cause information loss. In addition, many attention-based models require very large numbers of parameters to reach state-of-the-art performance. We therefore propose a neural language model with an extended attention mechanism that achieves performance comparable to recent state-of-the-art models with far fewer parameters.
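As a rough illustration of the first approach, the sketch below shows how word and POS-tag inputs can be embedded, fused with a convolution, and fed to an LSTM language model. It is a minimal sketch only: the PyTorch framing, the class name, and all layer sizes are assumptions, not the dissertation's exact architecture.

```python
import torch
import torch.nn as nn

class WordPosCnnLstmLM(nn.Module):
    """Illustrative language model over two input dimensions: word IDs and POS-tag IDs."""
    def __init__(self, vocab_size, pos_size, emb_dim=128, pos_dim=32,
                 conv_channels=128, hidden_dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(pos_size, pos_dim)
        # A convolution over the time axis fuses word and POS features at each position.
        self.conv = nn.Conv1d(emb_dim + pos_dim, conv_channels, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(conv_channels, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, pos_ids):
        # word_ids, pos_ids: (batch, seq_len)
        x = torch.cat([self.word_emb(word_ids), self.pos_emb(pos_ids)], dim=-1)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.lstm(x)              # (batch, seq_len, hidden_dim)
        return self.out(h)               # next-word logits at each position

# Usage on a toy batch with assumed vocabulary and tag-set sizes.
model = WordPosCnnLstmLM(vocab_size=10000, pos_size=45)
words = torch.randint(0, 10000, (2, 20))
tags = torch.randint(0, 45, (2, 20))
logits = model(words, tags)              # (2, 20, 10000)
```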
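For the second approach, the following sketch of a standard LSTM step shows the single cell-state vector into which all history is compressed, which is the limitation the cell-state stacking method targets; the stacking itself is not reproduced here, and the explicit weight layout is an assumption for illustration.

```python
import torch

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM step (single example, no batching)."""
    gates = x @ W + h_prev @ U + b                  # (4 * hidden,)
    i, f, g, o = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c = f * c_prev + i * g          # all preceding history is compressed into this one vector
    h = o * torch.tanh(c)
    return h, c

# Toy usage with assumed sizes: input dim 5, hidden dim 8.
hidden = 8
W = torch.randn(5, 4 * hidden)
U = torch.randn(hidden, 4 * hidden)
b = torch.zeros(4 * hidden)
h = c = torch.zeros(hidden)
h, c = lstm_step(torch.randn(5), h, c, W, U, b)
```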
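For the third approach, this sketch shows the conventional attention step that the dissertation extends: the context vector is computed as a plain weighted sum of the network outputs. Dot-product scoring is assumed here for brevity; the extended mechanism itself is not shown.

```python
import torch

def attention_context(hidden_states, query):
    """Standard attention: context = weighted sum of network outputs.

    hidden_states: (seq_len, dim) outputs from the network
    query:         (dim,)        current state used to score each output
    """
    scores = hidden_states @ query              # (seq_len,) alignment scores
    weights = torch.softmax(scores, dim=0)      # attention distribution over positions
    context = weights @ hidden_states           # (dim,) weighted sum of outputs
    return context, weights
```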