CLASSIFICATION AND PREDICTION OF NEWSPAPER ARTICLES ON THE BASIS OF AUTHOR GENDER

dc.contributor.advisorNicholas, Charles
dc.contributor.authorSingh, Devisha
dc.contributor.departmentComputer Science and Electrical Engineering
dc.contributor.programComputer Science
dc.date.accessioned2021-01-29T18:12:48Z
dc.date.available2021-01-29T18:12:48Z
dc.date.issued2018-01-01
dc.description.abstractCategorizing text on the basis of author gender is a long-standing problem in machine learning, and gender has been used as a basis for classification across different types of text. In this thesis we focus on categorizing newspaper articles by author gender, applying traditional machine learning techniques for text classification and identifying male and female writing styles. We use the New York Times Annotated Corpus, licensed by the Linguistic Data Consortium, which contains approximately 1.8 million articles. The article text is first sorted: articles with definite male or female author bylines and labels are considered initially for classification and prediction. Each article contains the name of its author, which is matched against a list of names labelled male or female to determine the author's gender. Using the model we build, we then try to predict the author gender of authorless articles (including articles written by collective boards, such as editorials). We also conduct a comparative study of different machine learning techniques, such as logistic regression, decision tree classifiers, support vector machines and a few more, to determine which learning method performs best on the corpus.
dc.formatapplication:pdf
dc.genretheses
dc.identifierdoi:10.13016/m20opv-8ynh
dc.identifier.other11824
dc.identifier.urihttp://hdl.handle.net/11603/20755
dc.languageen
dc.relation.isAvailableAtThe University of Maryland, Baltimore County (UMBC)
dc.relation.ispartofUMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartofUMBC Theses and Dissertations Collection
dc.relation.ispartofUMBC Graduate School Collection
dc.relation.ispartofUMBC Student Collection
dc.sourceOriginal File Name: Singh_umbc_0434M_11824.pdf
dc.titleCLASSIFICATION AND PREDICTION OF NEWSPAPER ARTICLES ON THE BASIS OF AUTHOR GENDER
dc.typeText
dcterms.accessRightsDistribution Rights granted to UMBC by the author.
dcterms.accessRightsAccess limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRightsThis item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Files

Original bundle

Name: Singh_umbc_0434M_11824.pdf
Size: 891.79 KB
Format: Adobe Portable Document Format

License bundle

Name: SinghDClassification_Open.pdf
Size: 47.8 KB
Format: Adobe Portable Document Format