Representing Spatial Relations using Convolutional Neural Networks

Author/Creator ORCID

Date

2016-01-01

Type of Work

Department

Computer Science and Electrical Engineering

Program

Computer Science

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Abstract

Recent interest in deep neural networks in the machine learning community is mainly due to their ability to produce state-of-the-art results in various domains without the need to manually engineer features. Deep convolutional networks (conv-nets), which are well suited for signal processing tasks, have been very successful in performing a number of computer vision tasks such as image classification, object detection, and scene recognition. However, despite their excellent performance, limited work has been done in understanding what they learn, how they represent it, and which regions of images are important for classification accuracy. This lack of understanding has motivated the need to eliminate the black box and shed light on the inner workings of deep neural networks. Existing work shows the presence of various mid-level concepts and image attributes inside conv-nets. In this thesis, we are interested in how conv-nets learn spatial relations between objects in an image and how they represent them. We propose a method to analyze different regions of images and identify the regions that are most important for classification of spatial relations. The key idea is to mask out different regions of an image and analyze the change in a classification metric for the corresponding spatial relation. We also analyze different regions of the network and find sets of nodes that are important for classification of spatial relations. We evaluate our method on a simple dataset of images annotated with spatial relations.
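The masking idea described above can be sketched as a simple occlusion analysis. In this hypothetical sketch, `classify` stands in for the trained conv-net's score for the target spatial-relation class (a function name assumed for illustration, not from the thesis), and the patch size, stride, and fill value are illustrative defaults:

```python
import numpy as np

def occlusion_sensitivity(image, classify, patch=8, stride=8, fill=0.0):
    """Slide an occluding patch over the image and record the drop in the
    classifier's score for the target spatial-relation label.

    A large drop means the occluded region was important for the prediction.
    """
    h, w = image.shape[:2]
    base = classify(image)  # score on the unmodified image
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            masked = image.copy()
            masked[y:y + patch, x:x + patch] = fill  # mask out one region
            heat[i, j] = base - classify(masked)     # score drop for this region
    return heat
```

The resulting heat map highlights which image regions the classifier relies on; regions whose occlusion causes the largest score drop are the most important for the spatial-relation decision.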