Deep Comprehension of Visual Stories through Summarization and Question Answering

dc.contributor.advisor: Ferraro, Francis
dc.contributor.author: Sapkale, Aishwarya Sudhakar
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2021-09-01T13:55:25Z
dc.date.available: 2021-09-01T13:55:25Z
dc.date.issued: 2020-01-20
dc.description.abstract: Reasoning that requires joint modeling of images and text is gaining importance because of its applicability in multiple research areas, such as image captioning, visual concepts, and question answering. In this thesis, we propose reasoning across a sequence of coherent images. This is conceptually different from reasoning over a single image, as it requires assessing and linking information from different images. This linking goes beyond general descriptions or captions of the images: it relies on complex narratives that describe the situations more like scenes in a story. Therefore, we propose a novel task, Visual Comprehension, which reasons across multiple related images using narratives written to broadly describe what is occurring in those images. We focus on different reasoning aspects, from identifying the core concepts of the image sequences and stories in the form of concise summaries, to gaining detailed information about different facets of the image sequences and stories through complex question answers. We develop a new dataset for this purpose by crowdsourcing one-line summaries and question answers based on sequences of 5 images and their corresponding visual stories. Summaries are evaluated with a neural machine translation approach, resulting in generations driven mostly by stories rather than images, whereas question answers are evaluated as K-class classification, resulting in predictions driven more by images, although adding stories does not hurt. Nonetheless, visual stories prove to be helpful for reasoning across multiple images. Thus, we propose a new task involving reasoning across a sequence of images and a short accompanying story through summarization and question answering.
dc.format: application:pdf
dc.genre: theses
dc.identifier: doi:10.13016/m2txt7-q0zk
dc.identifier.other: 12287
dc.identifier.uri: http://hdl.handle.net/11603/22836
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Sapkale_umbc_0434M_12287.pdf
dc.subject: Joint Modeling of images and text
dc.subject: Visual Comprehension
dc.subject: Visual Question Answering
dc.subject: Visual Storytelling
dc.subject: Visual Summarization
dc.title: Deep Comprehension of Visual Stories through Summarization and Question Answering
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle

Name:
Sapkale_umbc_0434M_12287.pdf
Size:
3.29 MB
Format:
Adobe Portable Document Format

License bundle

Name:
Sapkale-Aishwarya_Open.pdf
Size:
332.53 KB
Format:
Adobe Portable Document Format