CLUSTERING API CALL GRAPHS THROUGH GAUSSIAN MIXTURE MODELS FOR BINARY BEHAVIORAL STATIC ANALYSIS

Adangwa, Mariama

CLUSTERING API CALL GRAPHS THROUGH GAUSSIAN MIXTURE MODELS FOR BINARY BEHAVIORAL STATIC ANALYSIS

dc.contributor.advisor	Nicholas, Charles
dc.contributor.author	Adangwa, Mariama
dc.contributor.department	Computer Science and Electrical Engineering
dc.contributor.program	Computer Science
dc.date.accessioned	2021-09-01T13:55:28Z
dc.date.available	2021-09-01T13:55:28Z
dc.date.issued	2019-01-01
dc.description.abstract	Analyzing binaries and malicious programs e?ciently has been a topic of great interest over the past decade. Over the years, tools have been developed to auto mate the process of analyzing binary behavior both statically and dynamically. This project will present a method of analyzing PE32 binaries and malware behavior stat ically through the extraction of Windows API call graphs. These call graphs are then converted into variable-length sequences which are then processed through an enhanced version of the Lempel-Ziv Jaccard Distance (LZJD) algorithm. Princi pal Component Analysis (PCA) is then used to reduce the dimensionality of the resulting vector, and the reduced vector is then soft clustered using Gaussian Mix ture models and quali?ed using silhouette score calculations to create a probabilistic model of the sample(s) in question. The overall concept of this work is by clustering unknown samples with known samples that have behavioral reports, various behaviors can be inferred based on the clustering assignments and silhouette scores. The motivation behind this ap proach is to reduce the amount of manual inspections of behavioral indicators that may infer a speci?c behavior, and to improve the partial automation of the reverse engineering process by developing a behavioral analysis method that requires no program execution, and allows the attributes of the call graph of the program to self-produce behavioral inferences through clustering. Rather than analyzing the disassembled code of a binary, instead Gaussian clusters are analyzed and behav ioral inferences are derived from the resulting clusters. In addition, this method does not require the often arduous task of labeling samples by one or more speci?c behaviors to then train a model that is only limited to those features. One impor tant note for this work is that the distinguishing factor from malware from goodware is the context and intent of the sample. While this work does not aim to identify directly whether an unknown sample is malware, the work does aim to present the behaviors of samples in a cluster format by which an analyst can then infer from the deduced behavior that a sample is benign or malicious as a use case.
dc.format	application:pdf
dc.genre	theses
dc.identifier	doi:10.13016/m28jba-jwwn
dc.identifier.other	12113
dc.identifier.uri	http://hdl.handle.net/11603/22845
dc.language	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof	UMBC Theses and Dissertations Collection
dc.relation.ispartof	UMBC Graduate School Collection
dc.relation.ispartof	UMBC Student Collection
dc.source	Original File Name: Adangwa_umbc_0434M_12113.pdf
dc.subject	API Call Graph
dc.subject	Behavior
dc.subject	Binary analysis
dc.subject	Clustering
dc.subject	Gaussian Mixture Models
dc.subject	Malware
dc.title	CLUSTERING API CALL GRAPHS THROUGH GAUSSIAN MIXTURE MODELS FOR BINARY BEHAVIORAL STATIC ANALYSIS
dc.type	Text
dcterms.accessRights	Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan thorugh a local library, pending author/copyright holder's permission.
dcterms.accessRights	This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Adangwa_umbc_0434M_12113.pdf
Size:: 464.41 KB
Format:: Adobe Portable Document Format

Download

Collections

UMBC Theses and Dissertations
UMBC Computer Science and Electrical Engineering Department
UMBC Graduate School
UMBC Student Collection