CLASSIFICATION AND PREDICTION OF NEWSPAPER ARTICLES ON THE BASIS OF AUTHOR GENDER

Nicholas, Charles; Singh, Devisha
2021-01-29; 2021-01-29; 2018-01-01
11824
http://hdl.handle.net/11603/20755

Categorizing text on the basis of author gender is a long-standing problem in machine learning, and gender has been used as a basis for classification across many types of text. This thesis focuses on categorizing newspaper articles by author gender using traditional machine learning techniques for text classification, and identifies male and female writing styles. We use the New York Times Annotated Corpus, licensed by the Linguistic Data Consortium, which contains approximately 1.8 million articles. The article text is first sorted: only articles with a definite male or female author byline and label are considered initially for classification and prediction. The text contains the name of the author, which is matched against a list of names labelled male or female to determine the gender of the author. Using the model we built, we then try to predict the author gender of the authorless articles (including articles written by collective boards, such as editorials). We also conduct a comparative study of different machine learning techniques, including Logistic Regression, Decision Tree Classifier, Support Vector Machines and a few more, to determine which learning method performs best on the corpus.
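The byline-matching step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the name list NAME_GENDER, the helper names, the "By FIRST LAST" byline format, and the choice to look only at the first listed author are all assumptions made for illustration.

```python
import re

# Hypothetical labelled name list. The abstract says author names are matched
# against a male/female labelled list but does not give its source or format.
NAME_GENDER = {
    "john": "male",
    "david": "male",
    "mary": "female",
    "susan": "female",
}


def byline_first_name(byline):
    """Return the lower-cased first given name from a byline, or None.

    Assumes bylines look like 'By JANE DOE'; only the first listed
    author is considered, which is a simplification.
    """
    # Drop a leading "By " if present (case-insensitive).
    cleaned = re.sub(r"^\s*by\s+", "", byline, flags=re.IGNORECASE)
    tokens = re.findall(r"[A-Za-z]+", cleaned)
    return tokens[0].lower() if tokens else None


def label_article(byline):
    """Map a byline to 'male', 'female', or 'unlabelled'.

    Articles with no byline, or with a first name missing from the list,
    stay unlabelled so they can be held out for later prediction.
    """
    if not byline:
        return "unlabelled"
    name = byline_first_name(byline)
    return NAME_GENDER.get(name, "unlabelled")
```

Under these assumptions, label_article("By MARY JONES") yields a "female" training label, while an article with no byline or a collective byline is left unlabelled and forms part of the set whose author gender the trained model would later predict.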