Multimodal Deep Generative Models for Cross Spectral Image Analysis

dc.contributor.advisor: Purushotham, Sanjay
dc.contributor.advisor: Raff, Edward
dc.contributor.author: Ordun, Catherine
dc.contributor.department: Information Systems
dc.contributor.program: Information Systems
dc.date.accessioned: 2024-01-10T20:03:57Z
dc.date.available: 2024-01-10T20:03:57Z
dc.date.issued: 2023-01-01
dc.description.abstract: Thermal images captured in the Long-Wave Infrared (LWIR) spectrum visualize radiated heat and thereby reveal information that is hidden in visible images. For facial images, LWIR detects signs of inflammation, cognitive stress, and pain, which makes it valuable as an alternative to the visible spectrum. Although well studied in the field of thermal physiology, thermal images have yet to be applied to broad medical screening in telemedicine because computers and smartphones are not equipped with LWIR sensors. This single technical constraint has prevented wider adoption of cross spectral imagery for medical assessment. To address this limitation, we develop generative algorithms that act as an "emulator", or AI proxy, for a thermal camera by translating a visible image into its thermal pair. In this thesis, we develop three Visible-to-Thermal (V2T) facial translation algorithms based on Generative Adversarial Networks (GANs) that, given a visible image, translate it into its thermal pair. In particular, the Visible-to-Thermal Facial GAN (VTF-GAN) operates in No-, Low-, and Hard-Light visible settings by learning a Fourier Transform Loss. We also offer the first V2T Facial Diffusion Model (VTF-Diff), which yields promising results that are competitive with the VTF-GAN. However, generating a thermal face is meaningless if the result misconstrues the individual's facial identity. This occurs when VT pairs are misaligned, which is common during data collection when practitioners capture images with two different cameras (e.g., a visible and a thermal camera). We therefore develop an unsupervised VT image registration algorithm called Vista Morph that incorporates generative flows to learn a deformation field between cross spectral pairs. Our work outperforms the state of the art and offers the first VT facial application of image registration.
We demonstrate through biometric thermal vessel extraction that V2T translation using Vista Morph retains subject identity better than translation without it. Further, Vista Morph works on automated driving street scene data and is robust to geometric warps and erasure. The generative works of VTF-GAN and Vista Morph culminate in their application to a real-life medical dataset called Intelligent Sight & Sound (ISS), a clinical trial of cancer patient pain. In collaboration with the U.S. National Institutes of Health (NIH), we trained our models on 29,500 VT cancer facial datasets, demonstrating that our approaches succeed under spontaneous settings, challenging head poses, poor resolution, and weak lighting conditions. To augment this work, we also conducted a deep dive into the NIH ISS dataset, introducing it as the first of its kind. We demonstrated its utility by developing several multimodal pain detection models to predict chronic cancer pain, a far more challenging scenario than the conventional acute pain detection that exists today.
dc.format: application:pdf
dc.genre: dissertation
dc.identifier.other: 12804
dc.identifier.uri: http://hdl.handle.net/11603/31239
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
dc.source: Original File Name: Ordun_umbc_0434D_12804.pdf
dc.subject: Diffusion
dc.subject: Facial Emotion Recognition
dc.subject: Generative Adversarial Network
dc.subject: Generative AI
dc.subject: Image Registration
dc.subject: Thermal Imagery
dc.title: Multimodal Deep Generative Models for Cross Spectral Image Analysis
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.

Files

Original bundle

Name: Ordun_umbc_0434D_12804.pdf
Size: 71.25 MB
Format: Adobe Portable Document Format

License bundle

Name: Ordun-Catherine_Open.pdf
Size: 284.84 KB
Format: Adobe Portable Document Format