Similarity Metrics for Code Reuse

Author/Creator

Author/Creator ORCID

Date

2020-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Citation of Original Publication

Rights

Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu.

Abstract

Most new software includes code that has been used before. Code that is later found to be vulnerable may exist in multiple software applications, but discovering all software that contains the vulnerable code is difficult. Similarly, malicious code may resurface in new software, and the ability to recognize reused malicious code can help determine the trustworthiness of a new application. Similarity metrics for code reuse can help solve these kinds of problems. In this thesis, we survey a variety of code similarity techniques and closely examine six approaches that can compare code from compiled software. We analyze the different approaches and then perform experiments to measure their performance in comparing functions. We conclude by comparing the effectiveness of the approaches and the features from decompiled code that they use. Finally, we offer suggestions for future work based on our findings.