Evaluating Machine Learning based Malware Classifiers

Author/Creator

Author/Creator ORCID

Date

2020-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Citation of Original Publication

Rights

Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Abstract

In recent years, there has been significant growth in the number of new malware specimens. This growth has prompted the development of novel malware classifiers to help identify them. Many of these classifiers claim to use some form of machine learning to identify malware. Our task is to evaluate the extent to which these claims about machine learning based malware classifiers (MLMCs) hold, and to determine whether MLMCs are good at identifying new malware that has never been seen before. We explored the idea of introducing diversity into malware specimens that are generated or compiled from source code collected from resources such as theZoo and the malsource dataset. Different transformations are performed on the source code. We ran experiments on one or two compiled malware specimens, applying transformations at the source-code level and examining VirusTotal's response to these transformations.