Video Summarization using Unsupervised Methods
Date
2018-01-01
Department
Computer Science and Electrical Engineering
Program
Computer Science
Rights
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
Due to the increasing volume of video data uploaded daily to the web through sources such as social media, YouTube, and video-sharing websites, video summarization has emerged as an important and challenging problem in industry. Video summarization has applications in many domains, including consumer marketing, generating trailers for movies, and producing highlights for sports events. An efficient mechanism for extracting important video content is therefore needed to manage large video repositories. We present a novel unsupervised approach that generates video summaries using simpler networks such as VGG and ResNet instead of complex recurrent networks such as LSTMs. Although video summarization and image captioning are two completely different and independent tasks, we propose an approach that generates summaries from the feature space produced by captioning the frames of a video. Our main idea is to generate short and informative summaries in a completely unsupervised manner using a basic, traditional clustering technique modeled jointly with the image captioning framework NeuralTalk2. We conducted experiments in different settings on the SumMe and TVSum datasets. Our approach achieved state-of-the-art results on the SumMe dataset with an F-score of 35.6.
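The pipeline the abstract describes reduces to three steps: extract a per-frame feature vector with a pretrained CNN, cluster the vectors, and keep one representative frame per cluster as a summary keyframe. Below is a minimal sketch of that clustering step, assuming a ResNet-18 backbone, a fixed frame-sampling rate, and a fixed cluster count; none of these values come from the thesis, and the caption-feature fusion with NeuralTalk2 is omitted.

```python
# Minimal sketch: CNN features per sampled frame, k-means clustering,
# and the frame nearest each cluster centre kept as a summary keyframe.
# The backbone, sampling rate, and cluster count are assumptions, not
# values from the thesis.
import cv2
import numpy as np
import torch
from sklearn.cluster import KMeans
from torchvision import models, transforms

def summarize(video_path, num_keyframes=5, sample_every=30):
    # Pretrained ResNet-18 with the classification head replaced by an
    # identity, so each frame maps to a 512-d feature vector.
    resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    resnet.fc = torch.nn.Identity()
    resnet.eval()

    preprocess = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    # Sample every `sample_every`-th frame and extract its features.
    cap = cv2.VideoCapture(video_path)
    feats, frame_ids, idx = [], [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            with torch.no_grad():
                feats.append(resnet(preprocess(rgb).unsqueeze(0)).squeeze(0).numpy())
            frame_ids.append(idx)
        idx += 1
    cap.release()

    # Cluster the features; the sampled frame closest to each centroid
    # is taken as the representative keyframe for that cluster.
    X = np.stack(feats)
    km = KMeans(n_clusters=num_keyframes, n_init=10).fit(X)
    keyframes = [frame_ids[int(np.argmin(np.linalg.norm(X - c, axis=1)))]
                 for c in km.cluster_centers_]
    return sorted(keyframes)
```

Calling summarize("video.mp4", num_keyframes=5) returns the indices of the sampled frames nearest each cluster centroid; decoding those frames yields a static keyframe summary. In the thesis's approach, the clustering would instead operate on the caption-derived feature space produced by NeuralTalk2.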