Side Effects of Erasing Concepts from Diffusion Models

Saha, Shaswati; Saha, Sourajit; Gaur, Manas; Gokhale, Tejas

dc.contributor.author	Saha, Shaswati
dc.contributor.author	Saha, Sourajit
dc.contributor.author	Gaur, Manas
dc.contributor.author	Gokhale, Tejas
dc.date.accessioned	2025-09-18T14:22:25Z
dc.date.issued	2025-11
dc.description	Findings of the Association for Computational Linguistics: EMNLP 2025, November 4-9, 2025, Suzhou, China
dc.description.abstract	Concerns about text-to-image (T2I) generative models infringing on privacy, copyright, and safety have led to the development of Concept Erasure Techniques (CETs). The goal of an effective CET is to prohibit the generation of undesired "target" concepts specified by the user, while preserving the ability to synthesize high-quality images of the remaining concepts. In this work, we demonstrate that CETs can be easily circumvented and present several side effects of concept erasure. For a comprehensive measurement of the robustness of CETs, we present Side Effect Evaluation (SEE), an evaluation benchmark that consists of hierarchical and compositional prompts that describe objects and their attributes. This dataset and our automated evaluation pipeline quantify side effects of CETs across three aspects: impact on neighboring concepts, evasion of targets, and attribute leakage. Our experiments reveal that CETs can be circumvented by using superclass-subclass hierarchy and semantically similar prompts, such as compositional variants of the target. We show that CETs suffer from attribute leakage and counterintuitive phenomena of attention concentration or dispersal. We release our dataset, code, and evaluation tools to aid future work on robust concept erasure.
dc.description.uri	https://aclanthology.org/2025.findings-emnlp.810/
dc.format.extent	17 pages
dc.genre	conference papers and proceedings
dc.identifier	doi:10.13016/m2abpd-mwln
dc.identifier.citation	Saha, Shaswati, Sourajit Saha, Manas Gaur, and Tejas Gokhale. "Side Effects of Erasing Concepts from Diffusion Models". In Findings of the Association for Computational Linguistics: EMNLP 2025, edited by Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng. Association for Computational Linguistics, 2025. https://doi.org/10.18653/v1/2025.findings-emnlp.810.
dc.identifier.uri	https://doi.org/10.18653/v1/2025.findings-emnlp.810
dc.identifier.uri	http://hdl.handle.net/11603/40239
dc.language.iso	en
dc.publisher	ACL
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Faculty Collection
dc.relation.ispartof	UMBC Student Collection
dc.relation.ispartof	UMBC Information Systems Department
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department
dc.rights	Attribution 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	UMBC Ebiquity Research Group
dc.subject	Computer Science - Computer Vision and Pattern Recognition
dc.subject	Computer Science - Machine Learning
dc.subject	UMBC KAI2 Knowledge-infused AI and Inference lab
dc.title	Side Effects of Erasing Concepts from Diffusion Models
dc.type	Text
dcterms.creator	https://orcid.org/0000-0003-1357-7813
dcterms.creator	https://orcid.org/0000-0002-5411-2230
dcterms.creator	https://orcid.org/0000-0002-5593-2804

Files

Original bundle

Name: 2025.findings-emnlp.810.pdf
Size: 9.66 MB
Format: Adobe Portable Document Format

Collections

UMBC Faculty Collection
UMBC Computer Science and Electrical Engineering Department
UMBC Information Systems Department
UMBC Student Collection