Data Science Employment Classification using Machine Learning

Chandrashekar, Tejus

dc.contributor.advisor	Nicholas, Charles
dc.contributor.author	Chandrashekar, Tejus
dc.contributor.department	Computer Science and Electrical Engineering
dc.contributor.program	Computer Science
dc.date.accessioned	2021-01-29T18:13:34Z
dc.date.available	2021-01-29T18:13:34Z
dc.date.issued	2019-01-01
dc.description.abstract	Following the gold rush in artificial intelligence, a new career track, the "data scientist", has taken the world by storm. Combining business intuition with technical skill, data science is often described as the most sought-after job of the 21st century. Yet it is not always clear whether a given job posting is actually a data science job. This thesis aims to classify whether a job posting belongs to the data science field using machine learning models. Based on the results, an extensive analysis is carried out to identify patterns and to determine whether data science is actually as in demand as one might think. The models used to classify the job advertisements are a Support Vector Machine (SVM) and a neural network built with TensorFlow. These two models were chosen for the following reasons. An SVM has a regularization parameter, which prompts the user to think about avoiding overfitting. It uses the kernel trick, so expert knowledge about the problem can be built in by engineering the kernel. It is defined by a convex optimization problem (no local minima) for which there are efficient methods (e.g., sequential minimal optimization). Lastly, it approximates a bound on the test error rate, and there is a substantial body of theory behind it suggesting that it should work well. Neural networks, in turn, have a relatively simple learning algorithm (stochastic gradient descent with backpropagation) compared with some Bayesian models. They also scale well to larger datasets with the general-purpose GPU hardware and CUDA software that are now readily available. Finally, they can significantly outperform other models given suitable conditions, well-chosen parameters, and high-quality labeled data.
The dataset is obtained by scraping the Glassdoor website and is then subjected to pre-processing and feature extraction. This data is used to train the above-mentioned models on a training set of around 8000 job advertisements, with a test sample of 2000 job advertisements. The results are tabulated as confusion matrices, and the accuracies of the two models are compared.
dc.format	application:pdf
dc.genre	theses
dc.identifier	doi:10.13016/m2xxmy-qamh
dc.identifier.other	12019
dc.identifier.uri	http://hdl.handle.net/11603/20872
dc.language	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof	UMBC Theses and Dissertations Collection
dc.relation.ispartof	UMBC Graduate School Collection
dc.relation.ispartof	UMBC Student Collection
dc.source	Original File Name: Chandrashekar_umbc_0434M_12019.pdf
dc.title	Data Science Employment Classification using Machine Learning
dc.type	Text
dcterms.accessRights	Distribution Rights granted to UMBC by the author.
dcterms.accessRights	Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRights	This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
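
The abstract states that results are tabulated as a confusion matrix and that the accuracies of the two models are compared. A minimal sketch of that tabulation follows, in plain Python. The labels and predictions are made-up toy values, and the names `confusion_matrix`, `accuracy`, `svm_pred`, and `nn_pred` are hypothetical; none of this is taken from the thesis's data or code.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = data science posting."""
    counts = Counter(zip(y_true, y_pred))  # keys are (true label, predicted label)
    tp = counts[(1, 1)]  # data science postings predicted as data science
    fp = counts[(0, 1)]  # other postings predicted as data science
    fn = counts[(1, 0)]  # data science postings predicted as other
    tn = counts[(0, 0)]  # other postings predicted as other
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of postings classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Toy, made-up labels (assumption): 1 = data science posting, 0 = other posting.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
svm_pred = [1, 0, 1, 0, 0, 0, 1, 1]  # hypothetical SVM predictions
nn_pred = [1, 0, 1, 1, 0, 1, 1, 0]   # hypothetical neural network predictions

svm_cm = confusion_matrix(y_true, svm_pred)
nn_cm = confusion_matrix(y_true, nn_pred)

for name, cm in (("SVM", svm_cm), ("Neural network", nn_cm)):
    print(name, "tp, fp, fn, tn =", cm, "accuracy =", accuracy(*cm))
```

With a held-out test sample such as the 2000 advertisements mentioned in the abstract, the same two functions would be applied to each model's predictions on that sample.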

Files

Original bundle

Name: Chandrashekar_umbc_0434M_12019.pdf
Size: 1.01 MB
Format: Adobe Portable Document Format

License bundle

Name: ChandrashekarTData_Open.pdf
Size: 44.84 KB
Format: Adobe Portable Document Format

Collections

UMBC Theses and Dissertations
UMBC Computer Science and Electrical Engineering Department
UMBC Graduate School
UMBC Student Collection