Multimodal Deep Generative Models for Cross Spectral Image Analysis

Ordun, Catherine

dc.contributor.advisor	Purushotham, Sanjay
dc.contributor.advisor	Raff, Edward
dc.contributor.author	Ordun, Catherine
dc.contributor.department	Information Systems
dc.contributor.program	Information Systems
dc.date.accessioned	2024-01-10T20:03:57Z
dc.date.available	2024-01-10T20:03:57Z
dc.date.issued	2023-01-01
dc.description.abstract	Thermal images in the Long-Wave Infrared (LWIR) spectrum reveal insights hidden in visible images because they visualize radiated heat. For facial images, LWIR detects signs of inflammation, cognitive stress, and pain, which makes it valuable as an alternative to the visible spectrum. While well-studied in the field of thermal physiology, thermal images have yet to be applied to broad medical screening in telemedicine, since computers and smartphones are not equipped with LWIR sensors. This single technical constraint has prevented wider adoption of cross spectral imagery for medical assessment. To address this limitation, we develop generative algorithms that act as an "emulator", or AI proxy, for a thermal camera by translating a visible image into its thermal pair. In this thesis, we develop three Visible-to-Thermal (V2T) facial translation algorithms based on Generative Adversarial Networks (GAN) that, given a visible image, translate it into its thermal pair. In particular, the Visible-to-Thermal Facial GAN (VTF-GAN) operates in No-, Low-, and Hard-Light visible settings by learning a Fourier Transform Loss. We also offer the first V2T Facial Diffusion Model (VTF-Diff), which shows promising results competitive with the VTF-GAN. However, generating a thermal face is meaningless if it misconstrues the individual's facial identity. This occurs when visible-thermal (VT) pairs are misaligned, a common occurrence during data collection when practitioners capture images using two different cameras (e.g., visible and thermal cameras). We therefore develop an unsupervised VT image registration algorithm called Vista Morph that incorporates generative flows to learn a deformation field between cross spectral pairs. Our work beats the state-of-the-art and offers the first VT facial application of image registration.
We demonstrate through biometric thermal vessel extraction that V2T translation using Vista Morph retains subject identity better than translation without it. Further, Vista Morph works on automated driving street scene data and is robust to geometric warps and erasure. The generative works of VTF-GAN and Vista Morph culminate in their application to a real-life medical dataset called Intelligent Sight & Sound (ISS), a clinical trial of cancer patient pain. In collaboration with the U.S. National Institutes of Health (NIH), we trained our models on 29,500 VT cancer facial datasets, demonstrating that our approaches succeed under spontaneous settings, challenging head poses, poor resolution, and weak lighting conditions. To augment this work, we also conducted a deep dive into the NIH ISS dataset, introducing it as the first of its kind. We proved its utility by developing several multimodal pain detection models to predict chronic cancer pain, a far more challenging scenario than the conventional acute pain detection that exists today.
dc.format	application:pdf
dc.genre	dissertation
dc.identifier.other	12804
dc.identifier.uri	http://hdl.handle.net/11603/31239
dc.language	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Information Systems Department Collection
dc.relation.ispartof	UMBC Theses and Dissertations Collection
dc.relation.ispartof	UMBC Graduate School Collection
dc.relation.ispartof	UMBC Student Collection
dc.rights	This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
dc.source	Original File Name: Ordun_umbc_0434D_12804.pdf
dc.subject	Diffusion
dc.subject	Facial Emotion Recognition
dc.subject	Generative Adversarial Network
dc.subject	Generative AI
dc.subject	Image Registration
dc.subject	Thermal Imagery
dc.title	Multimodal Deep Generative Models for Cross Spectral Image Analysis
dc.type	Text
dcterms.accessRights	Distribution Rights granted to UMBC by the author.

Files

Original bundle

Name: Ordun_umbc_0434D_12804.pdf
Size: 71.25 MB
Format: Adobe Portable Document Format

License bundle

Name: Ordun-Catherine_Open.pdf
Size: 284.84 KB
Format: Adobe Portable Document Format

Collections

UMBC Theses and Dissertations
UMBC Graduate School
UMBC Information Systems Department
UMBC Student Collection