Utterance classification in speech-to-speech translation for zero-resource languages in the hospital administration domain
Date
2015-12
Citation of Original Publication
Martin, Lara J., Andrew Wilkinson, Sai Sumanth Miryala, Vivian Robison, and Alan W Black. "Utterance Classification in Speech-to-Speech Translation for Zero-Resource Languages in the Hospital Administration Domain". 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), December 2015, 303–9. https://doi.org/10.1109/ASRU.2015.7404809.
Rights
© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract
Although substantial progress has been achieved in speech-to-speech translation systems over the last few years, such systems still require that the speech be written in some appropriate orthography. As speech may differ greatly from the standardized written form of a language, it can be non-trivial to collect written data when there is no standard way for it to be represented. This project addresses the problem from the other end and expects that speech alone is available in the target language, and that no (standard or non-standard) orthography exists. It, therefore, treats the acoustic representation of the language as primary and uses language-independent methods to produce a phonetically-related symbolic representation that is then used in the translation system. Thus, the speech translation system is created for the target language as defined by the recording of that language rather than some body of orthographic transcripts. In this work, we are creating an application called APT (Acoustic Patient Translator), which uses a novel scheme of speech recognition and translation within a targeted domain. By working with a set of predefined sentences appropriately chosen to fit a scenario, we use utterance classification as a speech recognition algorithm. The utterance classification is achieved using cross-lingual, language-independent phonetic labeling. Since we are working with a set of select phrases, the translation part is trivial. We are concentrating on communication with hospital staff, such as scheduling a doctor's appointment, as our domain. In addition to English, we also run experiments on Tamil.
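The idea of treating recognition as classification over a fixed phrase set can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the speech has already been converted into a sequence of phone-like labels by some language-independent labeler, and it swaps in a simple nearest-template match by edit distance as the classifier. The names `edit_distance`, `classify`, `TEMPLATES`, and `TRANSLATIONS`, along with the phone symbols and phrases, are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of phone labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def classify(phones: Sequence[str],
             templates: Dict[str, List[List[str]]]) -> Tuple[str, int]:
    """Return the id of the predefined utterance whose stored example
    is closest to the input label sequence, with that distance."""
    best_id, best_dist = None, None
    for utt_id, examples in templates.items():
        for example in examples:
            d = edit_distance(phones, example)
            if best_dist is None or d < best_dist:
                best_id, best_dist = utt_id, d
    if best_id is None:
        raise ValueError("no templates to compare against")
    return best_id, best_dist


# Hypothetical phone-label templates for two predefined utterances.
# A real system would derive these from recordings, not from spelling.
TEMPLATES = {
    "greeting": [["v", "a", "n", "a", "k", "a", "m"]],
    "appointment": [["a", "p", "o", "i", "n", "t"]],
}

# With a closed phrase set, translation reduces to a table lookup.
TRANSLATIONS = {
    "greeting": "Hello.",
    "appointment": "I would like to schedule an appointment.",
}

if __name__ == "__main__":
    heard = ["v", "a", "n", "a", "k", "k", "a", "m"]  # noisy labeling
    utt_id, dist = classify(heard, TEMPLATES)
    print(utt_id, dist, TRANSLATIONS[utt_id])
```

Because the phrase set is closed, the classifier only has to pick the nearest predefined utterance rather than transcribe open vocabulary, and the translation step becomes a dictionary lookup keyed on the utterance id.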