Stratified Neural Models for Document Classification

Date

2020-01-20

Department

Computer Science and Electrical Engineering

Program

Computer Science

Rights

Distribution Rights granted to UMBC by the author.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Abstract

Document classification is an abstract task in natural language processing and information retrieval. Compared with traditional methods for this task, our method improves performance, convergence, and information enrichment. We propose a hybrid neural language modelling architecture that constructs hierarchical feature representations, and we evaluate it on document classification. Our first model begins with a character-level convolutional neural network (CNN) layer that produces word-level representations. A recurrent neural network (RNN) with attention-based feature merging then produces sentence-level representations, and a second RNN with an attention layer produces the document-level representation. Finally, a stack of interconnected dense layers with softmax activation classifies the document. We extend this model to the word level and summarize the overall results and comparisons with baseline models. We present evidence for our hypotheses on multiple datasets, using the IMDB and Yelp review datasets. We report results on all datasets in terms of F1 score, accuracy, precision, and recall. We also compare the convergence time and the rate of convergence of our approach. Moreover, we show visual evidence that our approach leads to better feature construction and is able to construct features for 99% of the effective word vocabulary from the characters in the documents.
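
The hierarchy described in the abstract (characters to words, words to sentences, sentences to document, then a softmax classifier) can be illustrated with a minimal sketch in plain Python. This is not the thesis implementation. The abstract gives no layer sizes, filter widths, or attention formula, so every name and value here (DIM, WIDTH, FILTERS, WORD_CTX, SENT_CTX, DENSE, the dot-product attention with a context vector) is an assumption chosen for illustration. The weights are random and untrained, and the RNN layers are omitted to keep the example short, so the sketch shows only how the feature-merging stages fit together.

```python
import math
import random

DIM, WIDTH, N_CLASSES = 4, 3, 2   # assumed sizes, not taken from the thesis
rng = random.Random(0)            # fixed seed so the sketch is repeatable


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Untrained, randomly initialised parameters (all hypothetical).
CHAR_EMB = {c: rand_vec(DIM) for c in "abcdefghijklmnopqrstuvwxyz"}
FILTERS = [rand_vec(WIDTH * DIM) for _ in range(DIM)]  # each spans WIDTH chars
WORD_CTX = rand_vec(DIM)   # context vector for word-level attention
SENT_CTX = rand_vec(DIM)   # context vector for sentence-level attention
DENSE = [rand_vec(DIM) for _ in range(N_CLASSES)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)                              # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def word_vector(word):
    """Character-level 1D convolution with max-over-time pooling."""
    embs = [CHAR_EMB.get(c, [0.0] * DIM) for c in word.lower()]
    while len(embs) < WIDTH:                 # zero-pad short words
        embs.append([0.0] * DIM)
    # Flatten each window of WIDTH character embeddings into one vector.
    windows = [sum(embs[i:i + WIDTH], []) for i in range(len(embs) - WIDTH + 1)]
    return [max(dot(f, w) for w in windows) for f in FILTERS]


def attention_pool(vectors, context):
    """Merge a sequence of vectors into one using softmax attention weights."""
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def classify(document):
    """document: list of sentences, each a list of word strings."""
    sentence_vecs = []
    for sentence in document:
        word_vecs = [word_vector(w) for w in sentence]
        pooled, _ = attention_pool(word_vecs, WORD_CTX)   # words -> sentence
        sentence_vecs.append(pooled)
    doc_vec, sent_weights = attention_pool(sentence_vecs, SENT_CTX)  # -> document
    probs = softmax([dot(row, doc_vec) for row in DENSE])  # dense + softmax
    return probs, sent_weights


doc = [["great", "movie"], ["the", "plot", "was", "thin"]]
probs, sent_weights = classify(doc)
```

Here `probs` holds one probability per class and `sent_weights` holds the attention weight assigned to each sentence. Because word vectors are built from characters, a word that never appeared in training still receives a representation, which is the property the abstract's vocabulary-coverage claim relies on.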