Evaluating Automatic Malware Classifiers in the Absence of Reference Labels

Author/Creator

Author/Creator ORCID

Date

2020-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Citation of Original Publication

Rights

Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Abstract

The malware analysis community lacks a diverse, up-to-date reference dataset with ground truth labels. Consequently, automatic malware classifiers are typically evaluated using custom datasets with near-ground-truth labels. However, classifier evaluation using near-ground-truth labels can yield erroneous or biased results. We propose an alternative classifier evaluation framework that does not require reference labels. We introduce the concept of a ground truth refinement and propose potential methods for constructing an approximation of one from a malware dataset. We prove that, using a ground truth refinement, it is possible to compute lower bounds on precision and error rate, as well as upper bounds on recall and accuracy, without requiring ground truth reference labels. We perform a case study on the popular AVClass malware labeler using our proposed evaluation framework.