Keyphrase Extraction for Technical Language Processing

dc.contributor.author: Dima, Alden
dc.contributor.author: Massey, Aaron
dc.date.accessioned: 2022-03-29T16:55:31Z
dc.date.available: 2022-03-29T16:55:31Z
dc.date.issued: 2021-03-09
dc.description.abstract: Keyphrase extraction is an important facet of annotation tools that offer the provision of the metadata necessary for technical language processing (TLP). Because TLP imposes additional requirements on typical natural language processing (NLP) methods, we examined TLP keyphrase extraction through the lens of a hypothetical toolkit which consists of a combination of text features and classifiers suitable for use in low-resource TLP applications. We compared two approaches for keyphrase extraction: The first which applied our toolkit-based methods that used only distributional features of words and phrases, and the second was the Maui automatic topic indexer, a well-known academic method. Performance was measured against two collections of technical literature: 1153 articles from Journal of Chemical Thermodynamics (JCT) curated by the National Institute of Standards and Technology Thermodynamics Research Center (TRC) and 244 articles from Task 5 of the Workshop on Semantic Evaluation (SemEval). Both collections have author-provided keyphrases available; the SemEval articles also have reader-provided keyphrases. Our findings indicate that our toolkit approach was competitive with Maui when author-provided keyphrases were first removed from the text. For the TRC-JCT articles, the Maui automatic topic indexer reported an F-measure of 29.4 % while our toolkit approach obtained an F-measure of 28.2 %. For the SemEval articles, our toolkit approach using a Naïve Bayes classifier resulted in an F-measure of 20.8 %, which outperformed Maui’s F-measure of 18.8 %. [en_US]
dc.description.sponsorship: The authors thank the NIST Thermodynamics Research Center for providing the data needed for this work and for answering data-related questions. We also thank Thurston Sexton of the NIST Engineering Laboratory for answering Nestor-related questions. [en_US]
dc.description.uri: https://nvlpubs.nist.gov/nistpubs/jres/126/jres.126.053.pdf [en_US]
dc.format.extent: 21 pages [en_US]
dc.genre: journal articles [en_US]
dc.identifier: doi:10.13016/m2utt6-imec
dc.identifier.citation: Dima A, Massey A (2021) Keyphrase Extraction for Technical Language Processing. J Res Natl Inst Stan 126:126053. https://doi.org/10.6028/jres.126.053. [en_US]
dc.identifier.uri: https://doi.org/10.6028/jres.126.053
dc.identifier.uri: http://hdl.handle.net/11603/24456
dc.language.iso: en_US [en_US]
dc.publisher: NIST [en_US]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This is a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law. [en_US]
dc.rights: Public Domain Mark 1.0 [*]
dc.rights.uri: http://creativecommons.org/publicdomain/mark/1.0/ [*]
dc.title: Keyphrase Extraction for Technical Language Processing [en_US]
dc.type: Text [en_US]
dcterms.creator: https://orcid.org/0000-0002-4698-6034 [en_US]

Files

Original bundle
Name:
jres.126.053.pdf
Size:
1.17 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
2.56 KB
Format:
Item-specific license agreed upon to submission