Keyphrase Extraction for Technical Language Processing
Date
2021-03-09
Citation of Original Publication
Dima A, Massey A (2021) Keyphrase Extraction for Technical Language Processing. J Res Natl Inst Stan 126:126053. https://doi.org/10.6028/jres.126.053.
Rights
This is a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
Public Domain Mark 1.0
Abstract
Keyphrase extraction is an important facet of annotation tools that provide the metadata necessary for technical language processing (TLP). Because TLP imposes additional requirements on typical natural language processing (NLP) methods, we examined TLP keyphrase extraction through the lens of a hypothetical toolkit consisting of a combination of text features and classifiers suitable for use in low-resource TLP applications. We compared two approaches to keyphrase extraction: the first applied our toolkit-based methods, which used only distributional features of words and phrases; the second was the Maui automatic topic indexer, a well-known academic method. Performance was measured against two collections of technical literature: 1153 articles from the Journal of Chemical Thermodynamics (JCT) curated by the National Institute of Standards and Technology Thermodynamics Research Center (TRC), and 244 articles from Task 5 of the Workshop on Semantic Evaluation (SemEval). Both collections have author-provided keyphrases available; the SemEval articles also have reader-provided keyphrases. Our findings indicate that our toolkit approach was competitive with Maui when author-provided keyphrases were first removed from the text. For the TRC-JCT articles, the Maui automatic topic indexer reported an F-measure of 29.4 % while our toolkit approach obtained an F-measure of 28.2 %. For the SemEval articles, our toolkit approach using a Naïve Bayes classifier achieved an F-measure of 20.8 %, outperforming Maui’s F-measure of 18.8 %.
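The F-measure figures above combine the precision and recall of extracted keyphrases measured against the provided (author or reader) keyphrase sets. The abstract does not state the matching rule or how scores are aggregated across articles. The following Python sketch therefore assumes exact matching after lowercasing and whitespace normalization (no stemming). It shows both a per-document score and one possible corpus-level aggregation (micro-averaging). The function names are hypothetical and are not taken from the paper or from Maui.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse internal whitespace (an assumed matching rule)."""
    return " ".join(phrase.lower().split())


def _prf(tp: int, n_pred: int, n_ref: int) -> tuple[float, float, float]:
    """Precision, recall, and F-measure (harmonic mean) from raw counts."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_ref if n_ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def keyphrase_prf(extracted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Per-document scores: a predicted phrase is correct if it matches a gold phrase."""
    pred = {normalize(p) for p in extracted}
    ref = {normalize(g) for g in gold}
    return _prf(len(pred & ref), len(pred), len(ref))


def corpus_micro_prf(docs: list[tuple[list[str], list[str]]]) -> tuple[float, float, float]:
    """Micro-average: pool match counts over all documents before computing scores
    (one possible aggregation; the paper's choice is not stated in the abstract)."""
    tp = n_pred = n_ref = 0
    for extracted, gold in docs:
        pred = {normalize(p) for p in extracted}
        ref = {normalize(g) for g in gold}
        tp += len(pred & ref)
        n_pred += len(pred)
        n_ref += len(ref)
    return _prf(tp, n_pred, n_ref)


if __name__ == "__main__":
    # Hypothetical example: 4 extracted phrases, 3 gold phrases, 2 matches.
    p, r, f = keyphrase_prf(
        ["vapor pressure", "Heat capacity", "enthalpy of mixing", "calorimetry"],
        ["heat capacity", "vapor pressure", "density"],
    )
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

Because the F-measure is a harmonic mean, a method that extracts many candidates gains recall but loses precision. For that reason, keyphrase evaluations typically fix the number of phrases extracted per document.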