Pixel-Level Scene Recognition Under Diverse Constraints
dc.contributor.advisor | Roy, Nirmalya | |
dc.contributor.author | Ahmed, Masud | |
dc.contributor.department | Information Systems | |
dc.contributor.program | Information Systems | |
dc.date.accessioned | 2025-07-18T17:08:41Z | |
dc.date.issued | 2025-01-01 | |
dc.description.abstract | Recent advances in computer vision have significantly enhanced tasks such as object recognition and semantic segmentation, enabling a wide range of applications in smart cities, autonomous driving, medical diagnostics, and robotics. Convolutional neural networks (CNNs) have achieved remarkable success through supervised learning; however, their performance often deteriorates when confronted with substantial domain shifts between training and real-world deployment environments. Unsupervised domain adaptation (UDA) seeks to bridge this gap by exploiting labeled source data along with unlabeled target data, yet these methods typically reach a performance plateau when the domain discrepancy is too large. Fine-tuning with a small, carefully selected subset of target data emerges as a promising strategy to overcome these limitations while reducing the burden of extensive manual annotation. In this work, we first address the fine-tuning challenge within a CNN-based framework by actively sampling high-uncertainty regions from target images and employing continual learning techniques to adapt the model incrementally. Recognizing the inherent limitations of CNNs in capturing complex and nuanced variations in real-world data, we propose a novel transformer-based semantic segmentation approach that operates in a continuous embedding space. Unlike conventional vector quantization methods that depend on discrete embeddings, our framework leverages continuous embeddings using an autoregressive (AR) generative model guided by a diffusion loss. This approach synergistically combines a CNN-based encoder for local feature extraction, a diffusion-based AR transformer to capture long-range dependencies, and a CNN-based decoder to reconstruct detailed pixel-level segmentation masks. Extensive experiments on public datasets, including GTAV, Cityscapes, SemanticKITTI, and ACDC, as well as our own CADEdgeTune dataset of low-angle, real-world imagery, demonstrate that our model attains strong zero-shot domain adaptation performance. It achieves robust segmentation under adverse weather conditions and varied viewpoints, while also exhibiting strong resilience against noise. Future work will extend these concepts to LiDAR-based semantic segmentation and explore the design of large vision models that fully exploit continuous embedding representations. | |
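The abstract mentions actively sampling high-uncertainty regions from target images to drive fine-tuning. As a rough illustration only, and not the dissertation's actual implementation, the sketch below scores regions of an unlabeled target image by per-pixel predictive entropy and picks the highest-uncertainty patches for annotation; all function names, patch sizes, and class counts here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def entropy_map(logits: torch.Tensor) -> torch.Tensor:
    """Per-pixel predictive entropy from segmentation logits of shape (C, H, W)."""
    probs = F.softmax(logits, dim=0)
    return -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=0)  # (H, W)

def select_uncertain_patches(logits: torch.Tensor, patch: int = 64, k: int = 8):
    """Return (row, col) corners of the k non-overlapping patches with highest mean entropy."""
    ent = entropy_map(logits)
    # Average entropy over non-overlapping patch x patch regions.
    pooled = F.avg_pool2d(ent[None, None], kernel_size=patch, stride=patch)[0, 0]
    flat = pooled.flatten()
    top = torch.topk(flat, k=min(k, flat.numel())).indices
    rows = (top // pooled.shape[1]) * patch
    cols = (top % pooled.shape[1]) * patch
    return list(zip(rows.tolist(), cols.tolist()))

# Hypothetical usage: logits produced by any segmentation model on a target image.
logits = torch.randn(19, 512, 1024)            # e.g. 19 Cityscapes classes
regions = select_uncertain_patches(logits)     # patches to send for labeling / fine-tuning
```

In an active-learning loop of this kind, only the selected patches would be annotated and used to fine-tune the model incrementally, which is consistent with the abstract's goal of reducing manual annotation effort.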
dc.format | application/pdf | |
dc.genre | dissertation | |
dc.identifier | doi:10.13016/m2b5gv-hapx | |
dc.identifier.other | 13035 | |
dc.identifier.uri | http://hdl.handle.net/11603/39424 | |
dc.language | en | |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Information Systems Department Collection | |
dc.relation.ispartof | UMBC Theses and Dissertations Collection | |
dc.relation.ispartof | UMBC Graduate School Collection | |
dc.relation.ispartof | UMBC Student Collection | |
dc.source | Original File Name: Ahmed_umbc_0434D_13035.pdf | |
dc.subject | Active Learning | |
dc.subject | Continual Learning | |
dc.subject | Continuous Autoregressive Model | |
dc.subject | Domain Adaptation | |
dc.subject | Semantic Segmentation | |
dc.title | Pixel-Level Scene Recognition Under Diverse Constraints | |
dc.type | Text | |
dcterms.accessRights | Distribution Rights granted to UMBC by the author. | |
dcterms.accessRights | This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu |