KnowledgeWorks@UBalt

Permanent URI for this community

http://hdl.handle.net/11603/12

KnowledgeWorks@UBalt accepts scholarly material created by students, staff, and faculty members of the University of Baltimore community. Graduate students are required to submit their finished thesis or dissertation, while staff and faculty can upload completed academic work to enhance their global visibility on the web. A wide variety of scholarly materials are accepted in any file format.

If you would like to submit material, your first step is to register as a KnowledgeWorks@UBalt member.

Once you have registered, a KnowledgeWorks team member will contact you with instructions on how to upload your content into the UBalt Faculty Scholarship Collection.

For more information, contact knowledgeworks@ubalt.edu.


Statistical Unigram Analysis for Source Code Repository
Xu, Weifeng; Xu, Dianxiang; Ariss, Omar El; Liu, Yunkai; Alatawi, Abdularaham; School of Criminal Justice; Computer Science
The unigram is a fundamental element of the n-gram model in natural language processing. However, unigrams collected from a natural language corpus are unsuitable for solving problems in the domain of computer programming languages. In this paper, we analyze the properties of unigrams collected from an ultra-large source code repository. Specifically, we have collected 1.01 billion unigrams from 0.7 million open source projects hosted at GitHub.com. By analyzing these unigrams, we have discovered statistical patterns regarding (1) how developers name variables, methods, and classes, and (2) how developers choose abbreviations. Our study describes a probabilistic model for solving a well-known problem in source code analysis: how to expand a given abbreviation to its original intended word. It shows that unigrams collected from source code repositories are essential resources for solving domain-specific problems.
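
The abstract does not spell out the probabilistic model, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes that unigrams are obtained by splitting identifiers on camelCase and underscore boundaries, that a candidate expansion must start with the same letter as the abbreviation and contain its letters in order, and that candidates are ranked by relative frequency. The function names and the toy identifier list are hypothetical.

```python
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase unigrams."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def is_subsequence(abbr, word):
    """Return True if the letters of abbr appear in word in the same order."""
    remaining = iter(word)
    return all(ch in remaining for ch in abbr)


def expand(abbr, counts):
    """Rank candidate expansions of abbr by relative unigram frequency.

    Assumption for this sketch: a candidate is any longer unigram that
    shares the first letter with abbr and contains abbr as a subsequence.
    """
    candidates = {
        word: n
        for word, n in counts.items()
        if len(word) > len(abbr)
        and word[0] == abbr[0]
        and is_subsequence(abbr, word)
    }
    total = sum(candidates.values())
    if total == 0:
        return []
    ranked = [(word, n / total) for word, n in candidates.items()]
    return sorted(ranked, key=lambda pair: -pair[1])


# Hypothetical toy corpus standing in for identifiers mined from source code.
identifiers = [
    "getMessageCount",
    "msg_count",
    "sendMessage",
    "messageQueue",
    "massageData",
    "max_count",
]

counts = Counter()
for identifier in identifiers:
    counts.update(split_identifier(identifier))

ranking = expand("msg", counts)
print(ranking)
```

On this toy corpus, "message" outranks "massage" for the abbreviation "msg" simply because it occurs more often. A real model would need far more data and would likely condition on more than raw frequency.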


Browsing KnowledgeWorks@UBalt by Type "conference paper"
