Vector Space Representations of Executable Code

dc.contributor.advisor: Nicholas, Charles
dc.contributor.author: Brandon, Robert
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2019-10-11T13:43:00Z
dc.date.available: 2019-10-11T13:43:00Z
dc.date.issued: 2017-01-01
dc.description.abstract: Modeling executable code in a way that is amenable to machine learning and automated analysis is important for a variety of problems. Current solutions are frequently ad hoc, with hand-selected features and problem-specific models being the standard. Vector space models are frequently applied to a variety of problem areas. This work demonstrates a way to generate dense vector embeddings of executable functions based on their composition. These models can be used to compare functions using standard distance metrics. These vectors are also easily used for a variety of machine learning tasks. A new data set focused on building general-purpose representations of executable code, MAML, is used to build these models. Evaluating embeddings is currently an open area of research. Vector space embeddings are considered good if they work for some specific task, but there are no standard criteria for evaluating general-purpose embeddings. We propose a set of criteria for evaluating generic code models in a standard way. Vector space models perform comparably with current state-of-the-art specialized models on these evaluations without needing specialized model development.
dc.genre: dissertations
dc.identifier: doi:10.13016/m27tj8-yo8b
dc.identifier.other: 11631
dc.identifier.uri: http://hdl.handle.net/11603/15499
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
dc.source: Original File Name: Brandon_umbc_0434D_11631.pdf
dc.subject: Code Analysis
dc.subject: LSTM
dc.subject: Machine Learning
dc.title: Vector Space Representations of Executable Code
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.

Files

Original bundle
Name: Brandon_umbc_0434D_11631.pdf
Size: 4.12 MB
Format: Adobe Portable Document Format
License bundle
Name: Brandon_Open.pdf
Size: 57.02 KB
Format: Adobe Portable Document Format