Robust semantic text similarity using LSA, machine learning, and linguistic resources

Kashyap, Abhay L.; Han, Lushan; Yus, Roberto; Sleeman, Jennifer; Satyapanich, Taneeya; Gandhi, Sunil; Finin, Tim

dc.contributor.author	Kashyap, Abhay L.
dc.contributor.author	Han, Lushan
dc.contributor.author	Yus, Roberto
dc.contributor.author	Sleeman, Jennifer
dc.contributor.author	Satyapanich, Taneeya
dc.contributor.author	Gandhi, Sunil
dc.contributor.author	Finin, Tim
dc.date.accessioned	2023-08-02T21:49:37Z
dc.date.available	2023-08-02T21:49:37Z
dc.date.issued	2015-10-30
dc.description.abstract	Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines latent semantic analysis and machine learning augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrappers and resources were used to handle task-specific challenges that include processing Spanish text, comparing text sequences of different lengths, handling informal words and phrases, and matching words with sense definitions. In the *SEM 2013 task on Semantic Textual Similarity, our best performing system ranked first among the 89 submitted runs. In the SemEval-2014 task on Multilingual Semantic Textual Similarity, we ranked a close second in both the English and Spanish subtasks. In the SemEval-2014 task on Cross-Level Semantic Similarity, we ranked first in Sentence–Phrase, Phrase–Word, and Word–Sense subtasks and second in the Paragraph–Sentence subtask.	en
dc.description.sponsorship	This research was supported by awards 1228198, 1250627 and 0910838 from the US National Science Foundation. We would like to thank the anonymous reviewers for their valuable comments on an earlier version of this paper.	en
dc.description.uri	https://link.springer.com/article/10.1007/s10579-015-9319-2	en
dc.format.extent	33 pages	en
dc.genre	journal articles	en
dc.genre	preprints	en
dc.identifier	doi:10.13016/m2y1g3-5jcj
dc.identifier.citation	Kashyap, A., Han, L., Yus, R. et al. Robust semantic text similarity using LSA, machine learning, and linguistic resources. Lang Resources & Evaluation 50, 125–161 (2016). https://doi.org/10.1007/s10579-015-9319-2	en
dc.identifier.uri	https://doi.org/10.1007/s10579-015-9319-2
dc.identifier.uri	http://hdl.handle.net/11603/29044
dc.language.iso	en	en
dc.publisher	Springer	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof	UMBC Faculty Collection
dc.relation.ispartof	UMBC Student Collection
dc.rights	This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.	en
dc.title	Robust semantic text similarity using LSA, machine learning, and linguistic resources	en
dc.type	Text	en
dcterms.creator	https://orcid.org/0000-0002-2624-6223	en
dcterms.creator	https://orcid.org/0000-0002-9311-954X	en
dcterms.creator	https://orcid.org/0000-0002-6593-1792	en

Files

Original bundle

Name: 777.pdf
Size: 1.56 MB
Format: Adobe Portable Document Format
