Robust semantic text similarity using LSA, machine learning, and linguistic resources

dc.contributor.author: Kashyap, Abhay L.
dc.contributor.author: Han, Lushan
dc.contributor.author: Yus, Roberto
dc.contributor.author: Sleeman, Jennifer
dc.contributor.author: Satyapanich, Taneeya
dc.contributor.author: Gandhi, Sunil
dc.contributor.author: Finin, Tim
dc.date.accessioned: 2023-08-02T21:49:37Z
dc.date.available: 2023-08-02T21:49:37Z
dc.date.issued: 2015-10-30
dc.description.abstract: Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines latent semantic analysis and machine learning augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrappers and resources were used to handle task-specific challenges that include processing Spanish text, comparing text sequences of different lengths, handling informal words and phrases, and matching words with sense definitions. In the *SEM 2013 task on Semantic Textual Similarity, our best performing system ranked first among the 89 submitted runs. In the SemEval-2014 task on Multilingual Semantic Textual Similarity, we ranked a close second in both the English and Spanish subtasks. In the SemEval-2014 task on Cross-Level Semantic Similarity, we ranked first in the Sentence–Phrase, Phrase–Word, and Word–Sense subtasks and second in the Paragraph–Sentence subtask.
dc.description.sponsorship: This research was supported by awards 1228198, 1250627 and 0910838 from the US National Science Foundation. We would like to thank the anonymous reviewers for their valuable comments on an earlier version of this paper.
dc.description.uri: https://link.springer.com/article/10.1007/s10579-015-9319-2
dc.format.extent: 33 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2y1g3-5jcj
dc.identifier.citation: Kashyap, A., Han, L., Yus, R. et al. Robust semantic text similarity using LSA, machine learning, and linguistic resources. Lang Resources & Evaluation 50, 125–161 (2016). https://doi.org/10.1007/s10579-015-9319-2
dc.identifier.uri: https://doi.org/10.1007/s10579-015-9319-2
dc.identifier.uri: http://hdl.handle.net/11603/29044
dc.language.iso: en_US
dc.publisher: Springer
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.title: Robust semantic text similarity using LSA, machine learning, and linguistic resources
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-2624-6223
dcterms.creator: https://orcid.org/0000-0002-9311-954X
dcterms.creator: https://orcid.org/0000-0002-6593-1792

Files

Original bundle

777.pdf (1.56 MB, Adobe Portable Document Format)
License bundle

license.txt (2.56 KB, item-specific license agreed to upon submission)