AFFINITY PROPAGATION INITIALISATION BASED PROXIMITY CLUSTERING FOR LABELING

Department

Computer Science and Electrical Engineering

Program

Computer Science

Rights

Distribution Rights granted to UMBC by the author.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Abstract

Modern state-of-the-art relation extraction systems require large amounts of labeled data. However, obtaining that much labeled data is costly and often nearly impossible, especially in industrial settings, because subject matter experts have limited time. Techniques such as distant supervision have been used to provide noisy annotations, but distant supervision requires a related knowledge base, which is rarely available. We propose a novel method that obtains labeled data using techniques inspired by Active Learning and Clustering. Our approach differs from Active Learning in that we operate under weak supervision, where not all of the instances provided for training are manually labeled. We adopt a new clustering paradigm in which Affinity Propagation identifies potential cluster centers and a randomized local optimization then reduces the number of clusters while increasing the similarity among instances within each cluster. This combination of randomization and localization in clustering paves the way for a distinct class of clustering algorithms.
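The two-stage idea described in the abstract can be illustrated with a small, pure-Python sketch. The first stage is standard affinity propagation (Frey and Dueck, 2007) over a similarity matrix whose diagonal "preference" is set to the median similarity, a common default. The second stage is only a hypothetical stand-in for the randomized local optimization, whose details are not given here: it visits exemplars in random order and drops one whenever the net similarity (total point-to-exemplar similarity minus an assumed per-cluster penalty `cost`) does not decrease. The helper names, the `cost` parameter, and the negative-squared-distance similarity are all assumptions for illustration, not the author's implementation.

```python
import random
import statistics


def similarity_matrix(points):
    """Negative squared Euclidean distance; diagonal set to the median (assumed preference)."""
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) for j in range(n)]
         for i in range(n)]
    pref = statistics.median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = pref  # self-similarity controls how many exemplars emerge
    return S


def affinity_propagation(S, damping=0.5, max_iter=200, convergence_iter=15):
    """Standard affinity propagation; returns the indices of the exemplars (cluster centers)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    exemplars, last, stable = [], None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], using the top two values per row.
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            k1 = max(range(n), key=vals.__getitem__)
            first = vals[k1]
            second = max(vals[k] for k in range(n) if k != k1)
            for k in range(n):
                new = S[i][k] - (second if k == k1 else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(k,k) = sum_{i' != k} max(0, r(i',k));
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))).
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
        # Stop once the same non-empty exemplar set has persisted for convergence_iter rounds.
        if exemplars and exemplars == last:
            stable += 1
            if stable >= convergence_iter:
                break
        else:
            last, stable = exemplars, 0
    return exemplars


def assign(S, exemplars):
    """Label each point with its most similar exemplar; exemplars label themselves."""
    if not exemplars:
        raise ValueError("no exemplars to assign to")
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(len(S))]


def net_similarity(S, exemplars, cost):
    """Point-to-exemplar similarity minus a hypothetical per-cluster penalty `cost`."""
    labels = assign(S, exemplars)
    return sum(S[i][labels[i]] for i in range(len(S)) if labels[i] != i) - cost * len(exemplars)


def randomized_prune(S, exemplars, cost, rounds=10, seed=0):
    """Hypothetical randomized local search: drop exemplars in random order if net similarity holds."""
    rng = random.Random(seed)
    current = list(exemplars)
    best = net_similarity(S, current, cost)
    for _ in range(rounds):
        order = current[:]
        rng.shuffle(order)  # random visiting order of candidate exemplars
        improved = False
        for k in order:
            if len(current) <= 1:
                break
            trial = [e for e in current if e != k]
            score = net_similarity(S, trial, cost)
            if score >= best:  # keep the removal only if the objective does not get worse
                current, best, improved = trial, score, True
        if not improved:
            break
    return sorted(current)


# Usage on two well-separated groups of one-dimensional points.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
S = similarity_matrix(points)
centers = randomized_prune(S, affinity_propagation(S), cost=1.0)
labels = assign(S, centers)
```

On data like this the sketch should recover one cluster per group; raising `cost` makes the pruning step more willing to merge clusters, which trades within-cluster similarity for fewer clusters.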