CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation
dc.contributor.author | Ahmed, Masud | |
dc.contributor.author | Hasan, Zahid | |
dc.contributor.author | Haque, Syed Arefinul | |
dc.contributor.author | Faridee, Abu Zaher Md | |
dc.contributor.author | Purushotham, Sanjay | |
dc.contributor.author | You, Suya | |
dc.contributor.author | Roy, Nirmalya | |
dc.date.accessioned | 2025-04-23T20:31:09Z | |
dc.date.available | 2025-04-23T20:31:09Z | |
dc.date.issued | 2025-03-19 | |
dc.description.abstract | Traditional transformer-based semantic segmentation relies on quantized embeddings. However, our analysis reveals that autoencoder accuracy on segmentation masks with quantized embeddings (e.g., VQ-VAE) is 8% lower than with continuous-valued embeddings (e.g., KL-VAE). Motivated by this, we propose a continuous-valued embedding framework for semantic segmentation. By reformulating semantic mask generation as a continuous image-to-embedding diffusion process, our approach eliminates the need for discrete latent representations while preserving fine-grained spatial and semantic details. Our key contribution is a diffusion-guided autoregressive transformer that learns a continuous semantic embedding space by modeling long-range dependencies in image features. The framework is a unified architecture combining a VAE encoder for continuous feature extraction, a diffusion-guided transformer for conditioned embedding generation, and a VAE decoder for semantic mask reconstruction. The continuity of the embedding space further enables zero-shot domain adaptation. Experiments across diverse datasets (e.g., Cityscapes and domain-shifted variants) demonstrate state-of-the-art robustness to distribution shifts, including adverse weather (e.g., fog, snow) and viewpoint variations. Our model also exhibits strong noise resilience, achieving robust performance (≈ 95% AP relative to baseline) under Gaussian noise, moderate motion blur, and moderate brightness/contrast variations, while experiencing only a moderate impact (≈ 90% AP relative to baseline) from 50% salt-and-pepper noise and from saturation and hue shifts. Code available: this https URL | |
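The abstract's three-stage pipeline (VAE encoder for continuous features, diffusion-guided transformer for conditioned embedding generation, VAE decoder for mask reconstruction) can be sketched in PyTorch as below. This is a minimal illustrative sketch, not the authors' released code: all layer sizes and module names are assumptions, a single denoising step stands in for the full diffusion schedule, and the autoregressive generation order is omitted.

import torch
import torch.nn as nn

class ContinuousSegPipeline(nn.Module):
    """Hypothetical sketch of the CAM-Seg-style pipeline described in the
    abstract; sizes and structure are illustrative assumptions only."""

    def __init__(self, embed_dim=256, num_layers=6, num_heads=8):
        super().__init__()
        # VAE-style encoder: image -> continuous latents (no quantization).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(64, embed_dim, 4, stride=2, padding=1),
        )
        # Transformer modeling long-range dependencies over image tokens.
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers)
        # MLP predicting the denoised mask embedding from the noisy
        # embedding concatenated with the transformer's conditioning.
        self.denoiser = nn.Sequential(
            nn.Linear(2 * embed_dim, 4 * embed_dim), nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )
        # VAE-style decoder: continuous embeddings -> per-pixel logits
        # (19 output channels, assuming the Cityscapes class set).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(embed_dim, 64, 4, stride=2, padding=1), nn.GELU(),
            nn.ConvTranspose2d(64, 19, 4, stride=2, padding=1),
        )

    def forward(self, image, noisy_mask_embed):
        # Continuous image features (B, C, H/4, W/4) -> token sequence.
        feats = self.encoder(image)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)   # (B, H*W, C)
        cond = self.transformer(tokens)             # conditioning features
        # One denoising step in the continuous embedding space.
        noisy = noisy_mask_embed.flatten(2).transpose(1, 2)
        denoised = self.denoiser(torch.cat([noisy, cond], dim=-1))
        denoised = denoised.transpose(1, 2).reshape(b, c, h, w)
        return self.decoder(denoised)               # per-pixel class logits

# Toy usage: one denoising step starting from pure noise on the latent grid.
model = ContinuousSegPipeline()
img = torch.randn(1, 3, 64, 64)
noise = torch.randn(1, 256, 16, 16)  # H/4 x W/4 latent grid
logits = model(img, noise)
print(logits.shape)                  # torch.Size([1, 19, 64, 64])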
dc.description.sponsorship | This work has been partially supported by U.S. Army Grant #W911NF2120076, U.S. Army Grant #W911NF2410367, ONR Grant #N00014-23-1-2119, NSF CAREER Award #1750936, NSF REU Site Grant #2050999, and NSF CNS EAGER Grant #2233879. | |
dc.description.uri | https://arxiv.org/abs/2503.15617 | |
dc.format.extent | 10 pages | |
dc.genre | journal articles | |
dc.genre | preprints | |
dc.identifier | doi:10.13016/m2mfur-6ihc | |
dc.identifier.uri | https://doi.org/10.48550/arXiv.2503.15617 | |
dc.identifier.uri | http://hdl.handle.net/11603/38023 | |
dc.language.iso | en_US | |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Information Systems Department | |
dc.relation.ispartof | UMBC Center for Real-time Distributed Sensing and Autonomy | |
dc.relation.ispartof | UMBC Student Collection | |
dc.relation.ispartof | UMBC Faculty Collection | |
dc.rights | This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law. | |
dc.rights | Public Domain | |
dc.rights.uri | https://creativecommons.org/publicdomain/mark/1.0/ | |
dc.subject | UMBC Mobile, Pervasive and Sensor Computing Lab (MPSC Lab) | |
dc.title | CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation | |
dc.type | Text | |
dcterms.creator | https://orcid.org/0000-0002-8495-0948 | |
dcterms.creator | https://orcid.org/0000-0002-8324-1197 | |