Towards Efficient Deep Learning Models for Facial Expression Recognition using Transformers

Date

2023-12-01

Citation of Original Publication

Safavi, Farshad, Kulin Patel, and Ramana Kumar Vinjamuri. “Towards Efficient Deep Learning Models for Facial Expression Recognition Using Transformers.” In 2023 IEEE 19th International Conference on Body Sensor Networks (BSN), 1–4, 2023. https://doi.org/10.1109/BSN58485.2023.10331041.

Rights

© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Abstract

Facial expression recognition (FER) is crucial in various healthcare applications, including pain assessment, mental disorder diagnosis, and assistive robots that require close interaction with humans. While heavyweight deep learning models can achieve high accuracy for FER, their computational cost and memory consumption often exceed what portable and mobile devices can afford. Therefore, efficient deep learning models with high accuracy are essential to enable FER on resource-constrained platforms. This paper presents a new efficient deep learning model for facial expression recognition. The model utilizes Mix Transformer (MiT) blocks, adopted from the SegFormer architecture, along with a supplemented fusion block. The efficient self-attention mechanism in the transformer focuses on information relevant to classifying different facial expressions while significantly improving efficiency. Furthermore, our supplemented fusion block integrates multiscale feature maps to capture both fine-grained and coarse features. Experimental results demonstrate that the proposed model significantly reduces the computational cost, latency, and number of learnable parameters while achieving accuracy competitive with the previous state-of-the-art (SOTA) on the FER2013 dataset.
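To make the two mechanisms in the abstract concrete, the sketch below illustrates (1) sequence-reduced self-attention in the style of SegFormer's MiT blocks, which shrinks the key/value sequence by a factor R so attention cost drops from O(N²) to O(N²/R), and (2) multiscale feature fusion by upsampling coarse maps and concatenating channels. This is a minimal NumPy illustration, not the authors' implementation: the function names are hypothetical, it uses a single head, identity projections in place of learned W_q/W_k/W_v, average pooling as a stand-in for SegFormer's learned (strided-convolution) reduction, and nearest-neighbor upsampling in the fusion step.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def efficient_self_attention(x, R=4):
    """Sequence-reduced self-attention (illustrative, single head).

    x: (N, C) token sequence. Keys/values are pooled by factor R,
    so the attention matrix is (N, N/R) instead of (N, N).
    The real MiT block uses learned projections and a learned
    strided reduction; average pooling is a stand-in here.
    """
    N, C = x.shape
    q = x  # identity projection for this sketch
    kv = x[: (N // R) * R].reshape(N // R, R, C).mean(axis=1)  # (N/R, C)
    attn = softmax(q @ kv.T / np.sqrt(C))                      # (N, N/R)
    return attn @ kv                                           # (N, C)

def fuse_multiscale(maps):
    """Fuse multiscale feature maps (illustrative).

    Upsamples every coarser map to the finest resolution by
    nearest-neighbor repetition and concatenates along channels,
    so fine-grained and coarse features appear side by side.
    """
    H, W = maps[0].shape[:2]
    upsampled = []
    for m in maps:
        ry, rx = H // m.shape[0], W // m.shape[1]
        upsampled.append(np.repeat(np.repeat(m, ry, axis=0), rx, axis=1))
    return np.concatenate(upsampled, axis=-1)

# Example: 16 tokens of dim 8 attend over 4 pooled key/value tokens;
# an 8x8 and a 4x4 feature map (4 channels each) fuse to (8, 8, 8).
out = efficient_self_attention(np.random.rand(16, 8), R=4)
fused = fuse_multiscale([np.random.rand(8, 8, 4), np.random.rand(4, 4, 4)])
```

The key efficiency point the abstract relies on is visible in the shapes: the attention matrix has N·N/R entries rather than N², which is what makes transformer attention affordable at the resolutions needed on mobile and embedded hardware.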