Uncertainty for Malware Detection and Cyber Defense

Date

2021-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Abstract

As organizations in government and industry increasingly rely on digitized data and networked computer systems, they face a growing risk of exposure to cyber attacks. As computer networks grow in size, so do the challenges cybersecurity professionals face in securing them. With more connected devices, more users, and more complex systems, adversarial attack opportunities increase exponentially. Recently, the collection and release of malware datasets have allowed for the development of machine learning (ML) approaches to detect malware. Existing ML-based approaches to malware detection have not yet leveraged uncertainty in a systematic manner. Cybersecurity intrinsically requires operating under uncertain conditions, so ignoring uncertainty is undesirable. In this thesis, we explore different ways uncertainty estimation can benefit cyber defense. In particular, we demonstrate how taking uncertainty into account can be especially beneficial for highly constrained and quickly evolving malware detection use cases, laying the groundwork for the increased adoption of uncertainty-aware ML in the cybersecurity community. Leveraging uncertainty, we improve malware detection rates under extreme false positive rate constraints, improve out-of-distribution data detection approaches, and significantly reduce the amount of compute time needed to take advantage of the benefits of dynamic analysis. Along the way, we also illustrate why previous evaluation metrics can be misleading and demonstrate that executable file capabilities can be accurately predicted from raw byte sequences.