Vector Space Representations of Executable Code
Author/Creator
Author/Creator ORCID
Date
2017-01-01
Type of Work
Department
Computer Science and Electrical Engineering
Program
Computer Science
Citation of Original Publication
Rights
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Abstract
Modeling executable code in a way that is amenable to machine learning and automated analysis is important for a variety of problems. Current solutions are frequently ad hoc, with hand-selected features and problem-specific models being the standard. Vector space models, by contrast, are applied across a wide range of problem areas. This work demonstrates a way to generate dense vector embeddings of executable functions based on their composition. These embeddings can be used to compare functions with standard distance metrics, and they serve readily as input to a variety of machine learning tasks. A new data set focused on building general-purpose representations of executable code, MAML, is used to build these models. Evaluating embeddings is currently an open area of research: vector space embeddings are considered good if they work for some specific task, but there are no standard criteria for evaluating general-purpose embeddings. We propose a set of criteria for evaluating generic code models in a standard way. On these evaluations, vector space models perform comparably with current state-of-the-art specialized models, without requiring specialized model development.
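As a rough illustration of what comparing functions with a standard distance metric could look like, the sketch below ranks functions by cosine similarity between their embedding vectors. It is not taken from the thesis: the function names (func_a, func_b, func_c), the four-dimensional vectors, and the helpers cosine_similarity and nearest are all hypothetical, and a real model would presumably produce much higher-dimensional embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def nearest(query, embeddings):
    """Name of the function whose embedding is most similar to `query`'s."""
    candidates = [name for name in embeddings if name != query]
    return max(
        candidates,
        key=lambda name: cosine_similarity(embeddings[query], embeddings[name]),
    )

# Hypothetical embeddings, made up for illustration only (assumption:
# each executable function has already been mapped to a dense vector).
EMBEDDINGS = {
    "func_a": [0.9, 0.1, 0.0, 0.2],
    "func_b": [0.8, 0.2, 0.1, 0.2],
    "func_c": [0.0, 0.9, 0.7, 0.1],
}

if __name__ == "__main__":
    print(nearest("func_a", EMBEDDINGS))
```

Cosine similarity is only one choice here; Euclidean distance or another standard metric could be substituted without changing the overall shape of the comparison.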