Detection of near duplicate threads on online question & answer forums.

dc.contributor.advisor: Nicholas, Charles
dc.contributor.author: Chaudhari, Sushant
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2021-01-29T18:13:34Z
dc.date.available: 2021-01-29T18:13:34Z
dc.date.issued: 2018-01-01
dc.description.abstract: The number of questions asked on question and answer (Q&A) forums like Stack Overflow, Quora, and Twitter is increasing rapidly. Millions of users visit these sites each month and post their questions. It is no surprise that many of these questions are duplicates. Users may have to wait a long time for answers to their questions even though related questions have already been answered. It is therefore important to have an automatic way of identifying duplicate threads. On Stack Overflow, users with higher reputations mark questions as duplicates, which are then forwarded to moderators who decide whether a question is a duplicate. Quora, on the other hand, uses a Random Forest model to identify duplicate questions. In this research, we have built an ML model that combines word2vec from Gensim, trained on Google's 3 million word news dataset, with Long Short-Term Memory (LSTM) networks, a deep learning technique. The trained model performs well, predicting duplicate threads with an accuracy of 84.15% in the experiments. The deep learning model outperforms the traditional machine learning models in terms of accuracy and speed. This model will make it easier to find high-quality answers to questions, resulting in an improved experience for Q&A writers, seekers, and readers.
dc.format: application:pdf
dc.genre: theses
dc.identifier: doi:10.13016/m2uvqq-bfi6
dc.identifier.other: 11947
dc.identifier.uri: http://hdl.handle.net/11603/20873
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Chaudhari_umbc_0434M_11947.pdf
dc.title: Detection of near duplicate threads on online question & answer forums.
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Files

Original bundle

Name: Chaudhari_umbc_0434M_11947.pdf
Size: 1.45 MB
Format: Adobe Portable Document Format

License bundle

Name: ChaudariSDetection_Open.pdf
Size: 44.37 KB
Format: Adobe Portable Document Format