Pixel-Level Scene Recognition Under Diverse Constraints

dc.contributor.advisor: Roy, Nirmalya
dc.contributor.author: Ahmed, Masud
dc.contributor.department: Information Systems
dc.contributor.program: Information Systems
dc.date.accessioned: 2025-07-18T17:08:41Z
dc.date.issued: 2025-01-01
dc.description.abstract: Recent advances in computer vision have significantly enhanced tasks such as object recognition and semantic segmentation, thereby enabling a myriad of applications in smart cities, autonomous driving, medical diagnostics, and robotics. Convolutional neural networks (CNNs) have achieved remarkable success through supervised learning; however, their performance often deteriorates when confronted with substantial domain shifts between training and real-world deployment environments. Unsupervised domain adaptation (UDA) seeks to bridge this gap by exploiting labeled source data along with unlabeled target data, yet these methods typically reach a performance plateau when the domain discrepancy is too large. Fine-tuning with a small, carefully selected subset of target data emerges as a promising strategy to overcome these limitations while reducing the burden of extensive manual annotation. In this work, we first address the fine-tuning challenge within a CNN-based framework by actively sampling high-uncertainty regions from target images and employing continual learning techniques to adapt the model incrementally. Recognizing the inherent limitations of CNNs in capturing complex and nuanced variations in real-world data, we propose a novel transformer-based semantic segmentation approach that operates in a continuous embedding space. Unlike conventional vector quantization methods that depend on discrete embeddings, our framework leverages continuous embeddings using an autoregressive (AR) generative model guided by a diffusion loss. This approach combines a CNN-based encoder for local feature extraction, a diffusion-based AR transformer to capture long-range dependencies, and a CNN-based decoder to reconstruct detailed pixel-level segmentation masks.
Extensive experiments conducted on public datasets such as GTAV, Cityscapes, SemanticKITTI, and ACDC, as well as our own CADEdgeTune dataset (characterized by low-angle, real-world imagery), demonstrate that our model attains impressive zero-shot domain adaptation performance. It achieves robust segmentation under adverse weather conditions and varied viewpoints, while also exhibiting strong resilience against noise. Future work will extend these concepts to LiDAR-based semantic segmentation and explore the design of large vision models that fully exploit continuous embedding representations.
dc.format: application:pdf
dc.genre: dissertation
dc.identifier: doi:10.13016/m2b5gv-hapx
dc.identifier.other: 13035
dc.identifier.uri: http://hdl.handle.net/11603/39424
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Ahmed_umbc_0434D_13035.pdf
dc.subject: Active Learning
dc.subject: Continual Learning
dc.subject: Continuous Autoregressive Model
dc.subject: Domain Adaptation
dc.subject: Semantic Segmentation
dc.title: Pixel-Level Scene Recognition Under Diverse Constraints
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle

Name: Ahmed_umbc_0434D_13035.pdf
Size: 27.86 MB
Format: Adobe Portable Document Format

License bundle

Name: Ahmed-Masud_1154788_Open.pdf
Size: 288.46 KB
Format: Adobe Portable Document Format