Anticipating Visual Representations from Unlabeled Video

Vondrick, Carl; Pirsiavash, Hamed; Torralba, Antonio

dc.contributor.author	Vondrick, Carl
dc.contributor.author	Pirsiavash, Hamed
dc.contributor.author	Torralba, Antonio
dc.date.accessioned	2019-06-28T16:22:58Z
dc.date.available	2019-06-28T16:22:58Z
dc.date.issued	2016-11-30
dc.description.abstract	Anticipating actions and objects before they start or appear is a difficult problem in computer vision with several real-world applications. This task is challenging partly because it requires leveraging extensive knowledge of the world that is difficult to write down. We believe that a promising resource for efficiently learning this knowledge is through readily available unlabeled video. We present a framework that capitalizes on temporal structure in unlabeled video to learn to anticipate human actions and objects. The key idea behind our approach is that we can train deep networks to predict the visual representation of images in the future. Visual representations are a promising prediction target because they encode images at a higher semantic level than pixels yet are automatic to compute. We then apply recognition algorithms on our predicted representation to anticipate objects and actions. We experimentally validate this idea on two datasets, anticipating actions one second in the future and objects five seconds in the future.	en_US
dc.description.sponsorship	This work was supported by NSF grant IIS-1524817, and by a Google faculty research award to AT, and a Google PhD fellowship to CV.	en_US
dc.description.uri	https://ieeexplore.ieee.org/document/7780387	en_US
dc.format.extent	9 pages	en_US
dc.genre	conference papers and proceedings preprints	en_US
dc.identifier	doi:10.13016/m26ion-fjjh
dc.identifier.citation	Carl Vondrick et al., Anticipating Visual Representations from Unlabeled Video, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), DOI: 10.1109/CVPR.2016.18	en_US
dc.identifier.uri	https://doi.org/10.1109/CVPR.2016.18
dc.identifier.uri	http://hdl.handle.net/11603/14316
dc.language.iso	en_US	en_US
dc.publisher	IEEE	en_US
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof	UMBC Faculty Collection
dc.rights	This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights	© 2016 IEEE
dc.subject	Visualization	en_US
dc.subject	Prediction algorithms	en_US
dc.subject	Computer vision	en_US
dc.subject	Predictive models	en_US
dc.subject	Semantics	en_US
dc.subject	Biological system modeling	en_US
dc.subject	Network architecture	en_US
dc.subject	video signal processing	en_US
dc.subject	deep networks	en_US
dc.subject	semantic level	en_US
dc.title	Anticipating Visual Representations from Unlabeled Video	en_US
dc.type	Text	en_US
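
Illustrative sketch (not taken from the paper): the abstract describes learning to predict the visual representation of a future frame from unlabeled video, then running a recognizer on that predicted representation. The toy below shows the shape of that pipeline only, under loud assumptions: the "frame features" are synthetic 2-D points on a circle, a linear map trained by SGD stands in for the deep network, and a nearest-prototype rule stands in for the recognition algorithm. All function names, the prototypes, and the hyperparameters are hypothetical.

```python
import math


def make_pairs(features, gap):
    """Pair each frame's features with the features `gap` frames later.

    No labels are needed: the video's own temporal order supplies the target.
    """
    return [(features[t], features[t + gap]) for t in range(len(features) - gap)]


def predict(W, x):
    """Apply the linear predictor W to a feature vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def train_linear_predictor(pairs, dim, lr=0.05, epochs=200):
    """Fit W by SGD on squared error so that W @ x_t approximates x_{t+gap}.

    Assumption: a linear map replaces the deep network named in the abstract.
    """
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for x, y in pairs:
            pred = predict(W, x)
            for i in range(dim):
                err = pred[i] - y[i]
                for j in range(dim):
                    W[i][j] -= lr * err * x[j]
    return W


def classify(feature, prototypes):
    """Nearest-prototype classifier, a stand-in for the recognition step."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(feature, prototypes[label]))


# Synthetic "video": 2-D features that rotate slowly over time (hypothetical data).
features = [[math.cos(0.3 * t), math.sin(0.3 * t)] for t in range(41)]
gap = 3  # how many frames ahead to anticipate

W = train_linear_predictor(make_pairs(features, gap), dim=2)

# Hypothetical class prototypes in feature space.
prototypes = {"up": [0.0, 1.0], "down": [0.0, -1.0]}

current = features[10]
anticipated = predict(W, current)             # predicted future representation
label_now = classify(current, prototypes)     # recognition on the present frame
label_future = classify(anticipated, prototypes)  # recognition on the prediction
```

On this toy data the recognizer assigns one label to the current features and a different one to the anticipated features, which is the behavior the abstract motivates: the label comes from the predicted representation rather than from the frame that has been observed.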

Files

Original bundle

Name: 1504.08023.pdf
Size: 3.63 MB
Format: Adobe Portable Document Format

Collections

UMBC Computer Science and Electrical Engineering Department
UMBC Faculty Collection