Deep Comprehension of Visual Stories through Summarization and Question Answering

Sapkale, Aishwarya Sudhakar

dc.contributor.advisor	Ferraro, Francis
dc.contributor.author	Sapkale, Aishwarya Sudhakar
dc.contributor.department	Computer Science and Electrical Engineering
dc.contributor.program	Computer Science
dc.date.accessioned	2021-09-01T13:55:25Z
dc.date.available	2021-09-01T13:55:25Z
dc.date.issued	2020-01-20
dc.description.abstract	Reasoning that requires joint modeling of images and text is gaining importance because it applies to multiple research areas, such as image captioning, visual concepts, and question answering. In this thesis, we propose reasoning across a sequence of coherent images. This is conceptually different from reasoning over a single image, as it requires assessing and linking information from different images. This linking goes beyond general descriptions or captions of the images and draws on complex narratives that describe the situations more like scenes in a story. Therefore, we propose a novel task of Visual Comprehension, which reasons across multiple related images using narratives written to broadly describe what is occurring in those images. We focus on different aspects of reasoning, from identifying the core concepts of the image sequences and stories in the form of concise summaries to gaining detailed information about different facets of the image sequences and stories through complex questions and answers. We develop a new dataset for this purpose by crowdsourcing one-line summaries and question-answer pairs based on sequences of 5 images and their corresponding visual stories. Summarization is evaluated with a neural machine translation approach, resulting in generations driven mostly by stories rather than images, whereas question answering is evaluated as K-class classification, resulting in predictions driven more by images, although adding stories does not hurt. Overall, visual stories prove helpful for reasoning across multiple images. Thus, we propose a new task involving reasoning across a sequence of images and a short accompanying story through summarization and question answering.
dc.format	application:pdf
dc.genre	theses
dc.identifier	doi:10.13016/m2txt7-q0zk
dc.identifier.other	12287
dc.identifier.uri	http://hdl.handle.net/11603/22836
dc.language	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof	UMBC Theses and Dissertations Collection
dc.relation.ispartof	UMBC Graduate School Collection
dc.relation.ispartof	UMBC Student Collection
dc.source	Original File Name: Sapkale_umbc_0434M_12287.pdf
dc.subject	Joint Modeling of images and text
dc.subject	Visual Comprehension
dc.subject	Visual Question Answering
dc.subject	Visual Storytelling
dc.subject	Visual Summarization
dc.title	Deep Comprehension of Visual Stories through Summarization and Question Answering
dc.type	Text
dcterms.accessRights	Distribution Rights granted to UMBC by the author.
dcterms.accessRights	This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle

Name: Sapkale_umbc_0434M_12287.pdf
Size: 3.29 MB
Format: Adobe Portable Document Format

License bundle

Name: Sapkale-Aishwarya_Open.pdf
Size: 332.53 KB
Format: Adobe Portable Document Format

Collections

UMBC Theses and Dissertations
UMBC Computer Science and Electrical Engineering Department
UMBC Graduate School
UMBC Student Collection