ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

dc.contributor.author: Patel, Maitreya
dc.contributor.author: Gokhale, Tejas
dc.contributor.author: Baral, Chitta
dc.contributor.author: Yang, Yezhou
dc.date.accessioned: 2024-02-27T19:20:44Z
dc.date.available: 2024-02-27T19:20:44Z
dc.date.issued: 2024-03-24
dc.description: The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)
dc.description.abstract: The ability to understand visual concepts and to replicate and compose them from images is a central goal of computer vision. Recent advances in text-to-image (T2I) models have led to high-definition, realistic image generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and on limited, qualitative measures of visual understanding. To quantify the ability of T2I models to learn and synthesize novel visual concepts (a.k.a. personalized T2I), we introduce ConceptBed, a large-scale dataset consisting of 284 unique visual concepts and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), which uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in target images. We evaluate visual concepts that are objects, attributes, or styles, and we also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts. Our results point to a trade-off between learning concepts and preserving compositionality that existing approaches struggle to overcome. The data, code, and interactive demo are available at: https://conceptbed.github.io/
dc.description.sponsorship: This work was supported by NSF RI grants #1750082 and #2132724, and a grant from Meta AI Learning Alliance. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the funding agencies and employers.
dc.description.uri: https://ojs.aaai.org/index.php/AAAI/article/view/29371
dc.format.extent: 21 pages
dc.genre: conference papers and proceedings
dc.genre: postprints
dc.identifier: doi:10.1609/aaai.v38i13.29371
dc.identifier.citation: Patel, Maitreya, Tejas Gokhale, Chitta Baral, and Yezhou Yang. “ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models.” Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 13 (March 24, 2024): 14554–62. https://doi.org/10.1609/aaai.v38i13.29371.
dc.identifier.uri: https://doi.org/10.1609/aaai.v38i13.29371
dc.identifier.uri: http://hdl.handle.net/11603/31717
dc.publisher: AAAI
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.title: ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-5593-2804

Files

Original bundle
Name: 29371-Article Text-33425-1-2-20240324 (6).pdf
Size: 17.1 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed to upon submission