Stratified Neural Models for Document Classification

dc.contributor.advisor: Ferraro, Francis
dc.contributor.author: Mehta, Sarthak Mayur
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2021-09-01T13:55:13Z
dc.date.available: 2021-09-01T13:55:13Z
dc.date.issued: 2020-01-20
dc.description.abstract: Document classification is an abstract task in the domain of natural language processing and information retrieval. Compared with the traditional methods for this task, our method improves performance, convergence, and the richness of the information captured. We propose a hybrid neural language modelling architecture that constructs hierarchical feature representations, and we examine it through document classification. In our first model, a character-level convolutional neural network (CNN) layer produces word-level representations; a recurrent neural network (RNN) layer with attention-based feature merging then produces sentence-level representations; a second RNN layer with attention produces the document-level representation; and finally a stack of interconnected dense layers with softmax activation classifies the document. We extend this model to the word level and summarize the overall results and comparisons with baseline models. We show evidence for our hypotheses on multiple datasets, using the IMDB and Yelp review datasets. We report results on all datasets in terms of F1 score, accuracy, precision, and recall. We also compare the convergence time and the rate of convergence of our approach. Moreover, we show visual evidence that our approach leads to better feature construction and is able to construct features for 99% of the effective word vocabulary from the characters in the documents.
dc.format: application:pdf
dc.genre: theses
dc.identifier: doi:10.13016/m2crpv-5qxz
dc.identifier.other: 12140
dc.identifier.uri: http://hdl.handle.net/11603/22806
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Mehta_umbc_0434M_12140.pdf
dc.subject: Machine Learning
dc.subject: Natural Language Processing
dc.title: Stratified Neural Models for Document Classification
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
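
Illustrative note on the abstract: the attention-based feature merging it describes, where word-level features are merged into sentence-level features and sentence-level features into a document-level feature, can be sketched roughly as below. This is a minimal sketch and not the thesis implementation. It assumes dot-product scoring against a fixed context vector, and it omits the character-level CNN, the RNN layers, and the dense classifier entirely. The names softmax, attention_pool, and document_vector are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Merge a sequence of equal-length vectors into one vector.

    Assumption: each vector is scored by its dot product with `context`.
    The abstract does not state which scoring function the thesis uses.
    """
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def document_vector(document, word_context, sentence_context):
    """Two-level merge: words -> sentence vectors -> one document vector.

    `document` is a list of sentences; each sentence is a list of word vectors.
    In the architecture the abstract describes, RNN layers would transform the
    vectors before each merge; that step is left out of this sketch.
    """
    sentence_vectors = [
        attention_pool(sentence, word_context)[0] for sentence in document
    ]
    return attention_pool(sentence_vectors, sentence_context)[0]


# Toy input: two sentences, each with two 2-dimensional word vectors.
toy_document = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
]
toy_vector = document_vector(toy_document, [1.0, 0.0], [0.0, 1.0])
```

In a trained model the context vectors would be learned parameters; here they are fixed toy values chosen only to make the sketch runnable.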

Files

Original bundle
Name: Mehta_umbc_0434M_12140.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format
License bundle
Name: Mehta-Sarthak_Open.pdf
Size: 237.8 KB
Format: Adobe Portable Document Format