Evaluating Malware Classifiers on Unknown Malware Families

Date

2022-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Rights

Distribution Rights granted to UMBC by the author.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Abstract

Because of the number of daily malware attacks, we have come to rely on machine learning to detect them. Many vendors sell systems that claim to do this; we refer to such systems as malware classifiers. Evaluating malware classifiers can be tricky. There are many types of malware classifiers, each with its own purpose: a classifier may decide whether a given specimen is malicious or benign, assign malware to its family, or perform some other task. For any of these purposes, however, it has been noted that malware classifiers are typically evaluated on data similar to the data on which they were trained. By similar data, we mean that the training and testing data of the malware classifier included malware samples from similar families. After some false starts, we built a benchmark that can be used to evaluate malware classifiers even when they are confronted with malware they have not seen before.
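
To make the notion of testing on unseen families concrete, the following is a minimal sketch of a family-disjoint train/test split. It is an illustration only, not the benchmark described in this work: the function name `family_disjoint_split`, its parameters, and the sample identifiers and family labels in the usage lines are all assumptions introduced here.

```python
import random
from collections import defaultdict


def family_disjoint_split(samples, test_fraction=0.3, seed=0):
    """Split (sample_id, family) pairs so that no family is in both sets.

    Hypothetical helper for illustration; not the benchmark itself.
    """
    # Group sample IDs by their family label.
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)

    if len(by_family) < 2:
        raise ValueError("need at least two families to hold one out")

    # Shuffle the family names reproducibly, then hold out whole families.
    families = sorted(by_family)
    random.Random(seed).shuffle(families)

    # Hold out at least one family, but never all of them.
    n_test = max(1, round(len(families) * test_fraction))
    n_test = min(n_test, len(families) - 1)
    test_families = set(families[:n_test])

    train = [(s, f) for s, f in samples if f not in test_families]
    test = [(s, f) for s, f in samples if f in test_families]
    return train, test


# Usage with made-up sample IDs and illustrative family labels.
samples = [
    ("a1", "zeus"), ("a2", "zeus"),
    ("b1", "emotet"), ("c1", "mirai"), ("d1", "conficker"),
]
train, test = family_disjoint_split(samples, test_fraction=0.5, seed=1)
shared = {f for _, f in train} & {f for _, f in test}
print("families shared by train and test:", len(shared))
```

Splitting by family rather than by individual sample means a classifier scored on the test set is judged only on families it never saw during training, which is the situation the abstract is concerned with. A random per-sample split would usually place samples of the same family on both sides.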