Multimodal Unlearning Across Vision, Language, Video, and Audio: Survey of Methods, Datasets, and Benchmarks

dc.contributor.author: Sarwar, Nobin
dc.contributor.author: Roy Dipta, Shubhashis
dc.contributor.author: Liu, Zheyuan
dc.contributor.author: Patil, Vaidehi
dc.date.accessioned: 2026-02-03T18:14:24Z
dc.date.issued: 2026
dc.description.abstract: With the growing adoption of VLMs, DMs, LLMs, and AFMs, these multimodal foundation models can inadvertently encode sensitive, copyrighted, biased, or unsafe cross-modal associations that originate from their training data. Retraining after deletion requests or policy updates is often impractical, and targeted forgetting remains difficult because knowledge is distributed across shared representations. Multimodal unlearning addresses this challenge by enabling selective removal across modalities while retaining overall utility. This survey offers a unified, system-oriented view of multimodal unlearning across vision, language, audio, and video, grounded in recent advances, emerging applications, and open problems. Our taxonomy enables systematic comparison across model architectures and modalities, clarifying trade-offs among deletion strength, retention, efficiency, reversibility, and robustness. This survey highlights open problems and practical considerations to support future research and deployment of multimodal unlearning.
dc.format.extent: 29 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2az4x-c1sv
dc.identifier.uri: http://hdl.handle.net/11603/41605
dc.language.iso: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: UMBC Interactive Robotics and Language Lab
dc.title: Multimodal Unlearning Across Vision, Language, Video, and Audio: Survey of Methods, Datasets, and Benchmarks
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-9176-1782

Files

Original bundle

Name: Multimodal_Unlearning_Survey.pdf
Size: 2.5 MB
Format: Adobe Portable Document Format