Instance-based Sentence Boundary Determination by Optimization for Natural Language Generation

Citation of Original Publication

Pan, Shimei, and James Shaw. “Instance-Based Sentence Boundary Determination by Optimization for Natural Language Generation.” In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), edited by Kevin Knight, Hwee Tou Ng, and Kemal Oflazer, 565–72. Ann Arbor, Michigan: Association for Computational Linguistics, 2005. https://doi.org/10.3115/1219840.1219910.

Rights

Attribution-NonCommercial-ShareAlike 3.0 Unported CC BY-NC-SA 3.0

Abstract

This paper describes a novel instance-based sentence boundary determination method for natural language generation that optimizes a set of criteria based on examples in a corpus. Compared to existing sentence boundary determination approaches, our work offers three significant contributions. First, our approach provides a general, domain-independent framework that addresses sentence boundary determination by balancing a comprehensive set of constraints related to sentence complexity and quality. Second, our approach can simulate the characteristics and style of naturally occurring sentences in an application domain, since our solutions are optimized based on their similarity to examples in a corpus. Third, our approach can adapt easily to a natural language generation system’s capabilities by balancing the strengths and weaknesses of its subcomponents (e.g., its aggregation and referring expression generation capabilities). Our final evaluation shows that the proposed method produces significantly better sentence generation outcomes than a widely adopted approach.