Robust Semantic Text Similarity Using LSA, Machine Learning and Linguistic Resources

Abhay L. Kashyap, Lushan Han, Roberto Yus, Jennifer Sleeman, Taneeya W. Satyapanich, Sunil R Gandhi, and Tim Finin, Robust Semantic Text Similarity Using LSA, Machine Learning and Linguistic Resources, Language Resources and Evaluation, March 2016, Volume 50, Issue 1, pp. 125–161, http://dx.doi.org/10.1007/s10579-015-9319-2

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
This is a post-peer-review, pre-copyedit version of an article published in Language Resources and Evaluation. The final authenticated version is available online at: http://dx.doi.org/10.1007/s10579-015-9319-2

Subjects

Latent Semantic Analysis
WordNet
term alignment
semantic similarity
UMBC Ebiquity Research Group

Abstract

Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines latent semantic analysis and machine learning augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrappers and resources were used to handle task-specific challenges, including processing Spanish text, comparing text sequences of different lengths, handling informal words and phrases, and matching words with sense definitions. In the *SEM 2013 task on Semantic Textual Similarity, our best performing system ranked first among the 89 submitted runs. In the SemEval-2014 task on Multilingual Semantic Textual Similarity, we ranked a close second in both the English and Spanish subtasks. In the SemEval-2014 task on Cross-Level Semantic Similarity, we ranked first in the Sentence–Phrase, Phrase–Word, and Word–Sense subtasks and second in the Paragraph–Sentence subtask.
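
The idea of scoring longer texts by aligning their terms on top of a word similarity measure can be illustrated with a minimal sketch. It is not the SemSim implementation: the function names, the greedy best-match alignment, the symmetric averaging, and the hand-made scores in TOY_SCORES are all assumptions chosen for illustration, and the toy word similarity merely stands in for the LSA-based component described in the abstract.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical, hand-made scores standing in for a learned word similarity
# model (the real system uses an LSA-based component, not reproduced here).
TOY_SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"drives", "operates"}): 0.6,
}


def toy_word_sim(a: str, b: str) -> float:
    """Return 1.0 for identical words, a table score if listed, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset({a, b}), 0.0)


def directed_score(
    src: List[str], tgt: List[str], sim: Callable[[str, str], float]
) -> float:
    """Align each source term to its best-matching target term and average."""
    if not src or not tgt:
        return 0.0
    return sum(max(sim(s, t) for t in tgt) for s in src) / len(src)


def alignment_similarity(
    text_a: str, text_b: str, sim: Callable[[str, str], float] = toy_word_sim
) -> float:
    """Average the two directed alignment scores so the measure is symmetric."""
    a = text_a.lower().split()  # naive whitespace tokenization (assumption)
    b = text_b.lower().split()
    return 0.5 * (directed_score(a, b, sim) + directed_score(b, a, sim))


if __name__ == "__main__":
    print(alignment_similarity("She drives a car", "She operates an automobile"))
```

Swapping toy_word_sim for any function that maps two words to a score in [0, 1] leaves the alignment step unchanged, which is the separation between word-level similarity and text-level alignment that the abstract describes.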
