Roy, ArpitaPark, YoungjaLee, TaesungPan, Shimei2025-01-082025-01-082019-11Roy, Arpita, Youngja Park, Taesung Lee, and Shimei Pan. “Supervising Unsupervised Open Information Extraction Models.” In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), edited by Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, 728–37. Hong Kong, China: Association for Computational Linguistics, 2019. https://doi.org/10.18653/v1/D19-1067.https://doi.org/10.18653/v1/D19-1067http://hdl.handle.net/11603/37200Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, November, 2019.We propose a novel supervised open information extraction (Open IE) framework that leverages an ensemble of unsupervised Open IE systems and a small amount of labeled data to improve system performance. It uses the outputs of multiple unsupervised Open IE systems plus a diverse set of lexical and syntactic information such as word embedding, part-of-speech embedding, syntactic role embedding and dependency structure as its input features and produces a sequence of word labels indicating whether the word belongs to a relation, the arguments of the relation or irrelevant. Comparing with existing supervised Open IE systems, our approach leverages the knowledge in existing unsupervised Open IE systems to overcome the problem of insufficient training data. By employing multiple unsupervised Open IE systems, our system learns to combine the strength and avoid the weakness in each individual Open IE system. We have conducted experiments on multiple labeled benchmark data sets. Our evaluation results have demonstrated the superiority of the proposed method over existing supervised and unsupervised models by a significant margin.10 pagesen-USAttribution 4.0 International CC BY 4.0https://creativecommons.org/licenses/by/4.0/Supervising Unsupervised Open Information Extraction ModelsText