Attention correction mechanism of visual contexts in visual Question answering

dc.contributor.advisor: Oates, Tim
dc.contributor.author: Sharan, Komal
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2021-01-29T18:12:46Z
dc.date.available: 2021-01-29T18:12:46Z
dc.date.issued: 2018-01-01
dc.description.abstract: To answer a question about an image, or merely to describe an object in an image as part of answering, current visual question answering systems have been augmented with attention mechanisms. Before the advent of attention mechanisms, visual question answering systems were trained on a combination of image feature vectors and question and answer embeddings. Attention mechanisms such as stacked attention networks and hierarchical co-attention help determine which parts of the image to attend to, but place little emphasis on correcting attention. We propose a mechanism for correcting visual attention that uses the saliency of the image regions being attended to. Primarily, we study how the shift of human gaze over an image can improve the generated attention, by introducing an auxiliary loss into a standard stacked attention network pipeline. For this mechanism we use the VQA-HAT dataset, a large-scale collection of images with the regions explored by humans, and we use this dataset to further augment the work.
dc.format: application:pdf
dc.genre: theses
dc.identifier: doi:10.13016/m2tkaf-kt7o
dc.identifier.other: 11938
dc.identifier.uri: http://hdl.handle.net/11603/20751
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Sharan_umbc_0434M_11938.pdf
dc.subject: Human Attention Map
dc.subject: Stacked Attention Networks
dc.subject: VQA-HAT
dc.title: Attention correction mechanism of visual contexts in visual Question answering
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Files

Original bundle

Name: Sharan_umbc_0434M_11938.pdf
Size: 10.71 MB
Format: Adobe Portable Document Format

License bundle

Name: SharanKAttention_Open.pdf
Size: 42.14 KB
Format: Adobe Portable Document Format