Multimodal Deep Generative Models for Cross Spectral Image Analysis
Date
2023-01-01
Department
Information Systems
Program
Information Systems
Rights
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Abstract
Thermal images captured in the Long-Wave Infrared (LWIR) spectrum reveal information hidden from visible images by visualizing radiated heat. For facial images, LWIR detects signs of inflammation, cognitive stress, and pain, making it a valuable alternative to the visible spectrum. While well studied in the field of thermal physiology, thermal images have yet to be applied to broad medical screening in telemedicine because computers and smartphones are not equipped with LWIR sensors. This single technical constraint has prevented wider adoption of cross-spectral imagery for medical assessment. To address this limitation, we develop generative algorithms to act as an "emulator", or AI proxy, for a thermal camera by translating a visible image into its thermal pair.

In this thesis, we develop three Visible-to-Thermal (V2T) facial translation algorithms based on Generative Adversarial Networks (GANs) that, given a visible image, generate (translate) it into its thermal pair. In particular, the Visible-to-Thermal Facial GAN (VTF-GAN) operates in No-, Low-, and Hard-Light visible settings by learning a Fourier transform loss. We also offer the first V2T facial diffusion model (VTF-Diff), which achieves promising results competitive with VTF-GAN.

However, a generated thermal face is meaningless if it misconstrues the individual's facial identity. This occurs when visible-thermal (VT) pairs are misaligned, a common occurrence during data collection when practitioners capture images using two different cameras (e.g., visible and thermal cameras). We therefore develop an unsupervised VT image registration algorithm called Vista Morph that incorporates generative flows to learn a deformation field between cross-spectral pairs. Our work outperforms the state of the art and offers the first VT facial application of image registration. Through biometric thermal vessel extraction, we demonstrate that V2T translation with Vista Morph retains subject identity better than without it. Further, Vista Morph works on automated-driving street-scene data and is robust to geometric warps and erasure.

The generative works of VTF-GAN and Vista Morph culminate in their application to a real-life medical dataset called Intelligent Sight & Sound (ISS), a clinical trial of cancer patient pain. In collaboration with the U.S. National Institutes of Health (NIH), we trained our models on 29,500 VT facial images of cancer patients, demonstrating that our approaches succeed under spontaneous settings, challenging head poses, poor resolution, and weak lighting conditions. To augment this work, we also conducted a deep dive into the NIH ISS dataset, introducing it as the first of its kind, and demonstrated its utility by developing several multimodal pain detection models that predict chronic cancer pain, a far more challenging scenario than the conventional acute pain detection that exists today.
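To make the two key technical ingredients concrete, the following is a minimal PyTorch sketch of a frequency-domain loss of the kind the abstract attributes to VTF-GAN. The function name fourier_loss and the exact formulation (an L1 penalty on complex 2D FFT spectra) are illustrative assumptions, not the thesis's definition.

import torch

def fourier_loss(generated: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Compare generated and target images in the frequency domain.

    Penalizing spectral differences can encourage a generator to match
    global image structure even when pixel-space cues are weak
    (e.g., in low-light visible inputs). Illustrative sketch only.
    """
    # 2D FFT over the spatial dimensions of (N, C, H, W) batches.
    gen_fft = torch.fft.fft2(generated, dim=(-2, -1))
    tgt_fft = torch.fft.fft2(target, dim=(-2, -1))
    # L1 distance on the complex spectra captures both amplitude
    # and phase mismatch.
    return (gen_fft - tgt_fft).abs().mean()

Similarly, the deformation field that Vista Morph learns for registration can, once predicted, be applied by displacing an identity sampling grid and resampling the moving image. The sketch below shows only this generic warping step under assumed conventions (displacements in normalized [-1, 1] coordinates); Vista Morph's actual flow parameterization and training objective are not shown.

import torch
import torch.nn.functional as F

def warp(image: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp an (N, C, H, W) image by a dense deformation field.

    flow has shape (N, 2, H, W), giving per-pixel (x, y) displacements
    in normalized [-1, 1] coordinates. Illustrative sketch only.
    """
    n, _, h, w = flow.shape
    # Identity sampling grid in normalized coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=flow.device),
        torch.linspace(-1, 1, w, device=flow.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    # Displace the grid by the predicted field and bilinearly resample.
    grid = grid + flow.permute(0, 2, 3, 1)
    return F.grid_sample(image, grid, align_corners=True)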