MedFuseNet: Attention-based Multimodal deep learning model for Visual Question Answering in the Medical Domain

dc.contributor.author: Sharma, Dhruv
dc.contributor.author: Purushotham, Sanjay
dc.contributor.author: Reddy, Chandan K.
dc.date.accessioned: 2021-09-15T18:02:28Z
dc.date.available: 2021-09-15T18:02:28Z
dc.date.issued: 2021
dc.description.abstract: Medical images are difficult to comprehend for a person without expertise. The limited number of practitioners across the globe often face physical and mental fatigue due to the high number of cases, which can induce human error during diagnosis. In such scenarios, an additional opinion can help boost the confidence of the decision-maker. Thus, it becomes crucial to have a reliable Visual Question Answering (VQA) system that can provide a ‘second opinion’ on medical cases. However, most VQA systems in use today cater to real-world problems and are not specifically tailored to medical images. Moreover, a VQA system for medical images must account for the limited amount of training data available in this domain. In this paper, we develop MedFuseNet, an attention-based multimodal deep learning model for VQA on medical images that takes these challenges into account. MedFuseNet aims to maximize learning with minimal complexity by breaking the problem into simpler tasks and then predicting the answer. We tackle two types of answer prediction: categorization and generation. We conducted an extensive set of quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our experiments demonstrate that MedFuseNet outperforms state-of-the-art VQA methods, and that visualization of the captured attentions showcases the interpretability of our model’s predicted results.
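
The abstract describes the model only at a high level: an image encoder and a question encoder whose features are fused with attention, followed by an answer head. The PyTorch sketch below is a minimal, hypothetical illustration of that general idea, not the authors' MedFuseNet architecture; every layer choice, name, and dimension is an assumption for demonstration.

```python
# Illustrative sketch only: a minimal attention-based multimodal VQA model.
# NOT the authors' MedFuseNet implementation; all modules, names, and
# dimensions below are assumptions for demonstration.
import torch
import torch.nn as nn

class AttentionFusionVQA(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=256, num_answers=100):
        super().__init__()
        # Image encoder: a tiny CNN producing a grid of region features.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Question encoder: word embeddings followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, embed_dim, batch_first=True)
        # Question-guided attention over image regions (single head for brevity).
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
        # Answer categorization head; an answer-generation decoder over the
        # fused features would sit alongside it but is omitted here for brevity.
        self.classifier = nn.Linear(embed_dim * 2, num_answers)

    def forward(self, image, question):
        # image: (B, 3, H, W); question: (B, T) integer word indices
        feats = self.cnn(image)                      # (B, D, H', W')
        regions = feats.flatten(2).transpose(1, 2)   # (B, H'*W', D) region features
        _, (h, _) = self.lstm(self.embed(question))
        q = h[-1].unsqueeze(1)                       # (B, 1, D) question summary
        # Attend to image regions using the question as the query.
        attended, attn_weights = self.attn(q, regions, regions)
        fused = torch.cat([q.squeeze(1), attended.squeeze(1)], dim=-1)  # (B, 2D)
        return self.classifier(fused), attn_weights

model = AttentionFusionVQA()
img = torch.randn(2, 3, 64, 64)          # two dummy images
qst = torch.randint(0, 5000, (2, 12))    # two dummy tokenized questions
logits, attn = model(img, qst)
print(logits.shape, attn.shape)          # (2, 100) answer scores, (2, 1, 256) attention weights
```

In a setup like this, the returned attention weights over image regions are what one would visualize to interpret a prediction, along the lines of the attention visualizations the abstract mentions.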
dc.description.uri: https://people.cs.vt.edu/~reddy/papers/NSR21.pdf
dc.format.extent: 20 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2blqx-5unw
dc.identifier.citation: Sharma, Dhruv; Purushotham, Sanjay; Reddy, Chandan K.; MedFuseNet: Attention-based Multimodal deep learning model for Visual Question Answering in the Medical Domain; Virginia Polytechnic Institute and State University, 2021; https://people.cs.vt.edu/~reddy/papers/NSR21.pdf
dc.identifier.uri: http://hdl.handle.net/11603/22992
dc.language.iso: en_US
dc.publisher: Virginia Polytechnic Institute and State University
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.title: MedFuseNet: Attention-based Multimodal deep learning model for Visual Question Answering in the Medical Domain
dc.type: Text

Files

Original bundle
  Name: NSR21.pdf
  Size: 6.66 MB
  Format: Adobe Portable Document Format

License bundle
  Name: license.txt
  Size: 2.56 KB
  Format: Item-specific license agreed upon to submission