Investigating Causal Cues: Strengthening Spoofed Audio Detection with Human-Discernible Linguistic Features

dc.contributor.author: Khanjani, Zahra
dc.contributor.author: Ale, Tolulope
dc.contributor.author: Wang, Jianwu
dc.contributor.author: Davis, Lavon
dc.contributor.author: Mallinson, Christine
dc.contributor.author: Janeja, Vandana
dc.date.accessioned: 2024-10-28T14:30:44Z
dc.date.available: 2024-10-28T14:30:44Z
dc.date.issued: 2024-09-09
dc.description.abstract: Several types of spoofed audio, such as mimicry, replay attacks, and deepfakes, have created societal challenges to information integrity. Recently, researchers have worked with sociolinguistics experts to label spoofed audio samples with Expert Defined Linguistic Features (EDLFs) that can be discerned by the human ear: pitch, pause, word-initial and word-final release bursts of consonant stops, audible intake or outtake of breath, and overall audio quality. Several deepfake detection algorithms have been shown to improve when the traditional, commonly used features of audio data are augmented with these EDLFs. In this paper, using a hybrid dataset composed of multiple types of spoofed audio augmented with sociolinguistic annotations, we investigate causal discovery and inference between the discernible linguistic features and the label in the audio clips, comparing the findings of the causal models with the expert ground truth validation labeling process. Our findings suggest that the causal models indicate the utility of incorporating linguistic features to help discern spoofed audio, as well as the overall need and opportunity to incorporate human knowledge into models and techniques for strengthening AI models. The causal discovery and inference can be used as a foundation for training humans to discern spoofed audio, as well as for automating EDLF labeling to improve the performance of common AI-based spoofed audio detectors.
dc.description.sponsorship: The authors would like to acknowledge support from the National Science Foundation Award #2210011. The code and audio samples are available through our GitHub repository [8].
dc.description.uri: http://arxiv.org/abs/2409.06033
dc.format.extent: 10 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2o3ti-keat
dc.identifier.citation: Khanjani, Zahra, Tolulope Ale, Jianwu Wang, Lavon Davis, Christine Mallinson, and Vandana P. Janeja. “Investigating Causal Cues: Strengthening Spoofed Audio Detection with Human-Discernible Linguistic Features,” September 9, 2024. https://doi.org/10.48550/arXiv.2409.06033.
dc.identifier.uri: https://doi.org/10.48550/arXiv.2409.06033
dc.identifier.uri: http://hdl.handle.net/11603/36769
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC GESTAR II
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Center for Real-time Distributed Sensing and Autonomy
dc.relation.ispartof: UMBC Office for the Vice President of Research
dc.relation.ispartof: UMBC Joint Center for Earth Systems Technology (JCET)
dc.relation.ispartof: UMBC Language, Literacy, and Culture Department
dc.relation.ispartof: UMBC Information Systems Department
dc.relation.ispartof: UMBC Center for Accelerated Real Time Analysis
dc.relation.ispartof: UMBC Office of Institutional Advancement
dc.relation.ispartof: UMBC Staff Collection
dc.relation.ispartof: UMBC Center for Social Science Scholarship
dc.relation.ispartof: UMBC Data Science
dc.relation.ispartof: UMBC Student Collection
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International CC BY-NC-ND 4.0 Deed
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Electrical Engineering and Systems Science - Audio and Speech Processing
dc.subject: Computer Science - Sound
dc.subject: Computer Science - Computation and Language
dc.subject: UMBC Big Data Analytics Lab
dc.subject: UMBC Cybersecurity Institute
dc.title: Investigating Causal Cues: Strengthening Spoofed Audio Detection with Human-Discernible Linguistic Features
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-9933-1170
dcterms.creator: https://orcid.org/0000-0003-0130-6135

Files

Original bundle

Name: 2409.06033v1.pdf
Size: 902.21 KB
Format: Adobe Portable Document Format