DoubleDistillation: Enhancing LLMs for Informal Text Analysis using Multistage Knowledge Distillation from Speech and Text

dc.contributor.author: Hasan, Fatema
dc.contributor.author: Li, Yulong
dc.contributor.author: Foulds, James
dc.contributor.author: Pan, Shimei
dc.contributor.author: Bhattacharjee, Bishwaranjan
dc.date.accessioned: 2025-02-13T17:56:10Z
dc.date.available: 2025-02-13T17:56:10Z
dc.date.issued: 2024-11-04
dc.description: ICMI '24: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, San Jose, Costa Rica, November 4-8, 2024
dc.description.abstract: Traditional large language models (LLMs) leverage extensive text corpora but lack access to the acoustic and paralinguistic cues present in speech. There is a growing interest in enhancing text-based models with audio information. However, current models often require an aligned audio-text dataset, which is frequently much smaller than typical language model training corpora. Moreover, these models often require both text and audio streams during inference/testing. In this study, we introduce a novel two-stage knowledge distillation (KD) approach that enables language models to (a) incorporate rich acoustic and paralinguistic information from speech, (b) utilize text corpora comparable in size to typical language model training data, and (c) support text-only analysis without requiring an audio stream during inference/testing. Specifically, in the first stage we employ a pre-trained speech embedding teacher model (OpenAI Whisper) to train a Teacher Assistant (TA) model on an aligned audio-text dataset. In the second stage, the TA's knowledge is transferred to a student language model trained on a conventional text dataset. Thus, our two-stage KD method leverages both the acoustic and paralinguistic cues in the aligned audio-text data and the nuanced linguistic knowledge in a large text-only dataset. Based on our evaluation, this DoubleDistillation system consistently outperforms traditional LLMs on 15 informal text understanding tasks.
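The sketch below illustrates, in PyTorch, the two-stage distillation pipeline the abstract describes: a speech teacher supervises a Teacher Assistant (TA) on aligned audio-text pairs, then the TA supervises a text-only student. The model classes, mean-pooled embeddings, MSE distillation loss, random placeholder data, and hyperparameters are illustrative assumptions standing in for the actual Whisper teacher, TA, and student models; this is not the authors' implementation.

```python
# Hypothetical sketch of two-stage knowledge distillation (DoubleDistillation-style).
# Stage 1: a speech teacher's embeddings supervise a Teacher Assistant (TA) text model
# on aligned audio-text pairs. Stage 2: the TA supervises a student language model on a
# large text-only corpus, so no audio is needed at inference time.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Stand-in for a transformer LM that maps token ids to an utterance embedding."""
    def __init__(self, vocab_size=30522, dim=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        return h.mean(dim=1)  # mean-pool over tokens to a fixed-size embedding

def distill_step(student_vec, teacher_vec, optimizer):
    """One distillation step: pull the student embedding toward the (frozen) teacher's."""
    loss = nn.functional.mse_loss(student_vec, teacher_vec.detach())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Stage 1: speech teacher -> TA, on aligned audio-text data.
# The speech embeddings here are random placeholders for frozen Whisper encoder outputs.
ta = TextEncoder()
ta_opt = torch.optim.AdamW(ta.parameters(), lr=1e-4)
aligned_text = torch.randint(0, 30522, (4, 32))   # token ids of the transcripts
speech_teacher_emb = torch.randn(4, 768)          # placeholder speech-teacher embeddings
distill_step(ta(aligned_text), speech_teacher_emb, ta_opt)

# Stage 2: TA -> student, on a much larger text-only corpus.
student = TextEncoder()
st_opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
text_only = torch.randint(0, 30522, (4, 32))
with torch.no_grad():
    ta_emb = ta(text_only)                        # TA acts as the teacher, no audio needed
distill_step(student(text_only), ta_emb, st_opt)
```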
dc.description.uri: https://dl.acm.org/doi/10.1145/3678957.3685705
dc.format.extent: 10 pages
dc.genre: conference papers and proceedings
dc.identifier: doi:10.13016/m28vwa-5zhr
dc.identifier.citation: Hasan, Fatema, Yulong Li, James R. Foulds, Shimei Pan, and Bishwaranjan Bhattacharjee. “DoubleDistillation: Enhancing LLMs for Informal Text Analysis Using Multistage Knowledge Distillation from Speech and Text.” Proceedings of the 26th International Conference on Multimodal Interaction, ICMI ’24, November 4, 2024, 526–35. https://doi.org/10.1145/3678957.3685705.
dc.identifier.uri: https://doi.org/10.1145/3678957.3685705
dc.identifier.uri: http://hdl.handle.net/11603/37690
dc.language.iso: en_US
dc.publisher: ACM
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Information Systems Department
dc.relation.ispartof: UMBC Student Collection
dc.rights: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Deed
dc.rights.uri: https://creativecommons.org/licenses/by-nc/4.0/
dc.title: DoubleDistillation: Enhancing LLMs for Informal Text Analysis using Multistage Knowledge Distillation from Speech and Text
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-5989-8543
dcterms.creator: https://orcid.org/0000-0001-9722-4243
dcterms.creator: https://orcid.org/0000-0003-0935-4182

Files

Original bundle

Name: 3678957.3685705.pdf
Size: 1.7 MB
Format: Adobe Portable Document Format

Name: appendix.pdf
Size: 297.36 KB
Format: Adobe Portable Document Format