Affinity Propagation Initialisation Based Proximity Clustering For Labeling in Natural Language Based Big Data Systems
Links to Files: https://ieeexplore.ieee.org/document/9123041
Type of Work: 7 pages
conference papers and proceedings preprints
Citation of Original Publication: A. Bandi, K. Joshi and V. Mulwad, "Affinity Propagation Initialisation Based Proximity Clustering For Labeling in Natural Language Based Big Data Systems," 2020 IEEE 6th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS), Baltimore, MD, USA, 2020, pp. 1-7, doi: 10.1109/BigDataSecurity-HPSC-IDS49724.2020.00012.
Rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
© 2020 IEEE
Subjects: UMBC Ebiquity Research Group
A key challenge for large natural-language text data is automatically extracting the knowledge embedded in it, in terms of entities and relations. State-of-the-art relation extraction systems require large amounts of labeled data, which is costly and very difficult to obtain, especially in industrial settings, because of the time constraints of subject matter experts. Techniques like distant supervision require the availability of a related knowledge base, which is rarely possible. We have developed a novel model for automatically clustering textual Big Data, based on techniques inspired by Active Learning and Clustering, that can derive powerful insights and make the data ready for machine learning with minimal manual effort. Our approach differs from Active Learning in that we operate under weak supervision, where not all of the instances provided for training are manually labeled. It also differs from prevailing clustering algorithms in that we adopt a new approach of proximity clustering based on affinity propagation. Because the labeling effort is extrapolated, our model makes it easier to adopt deep learning approaches with minimal manual effort. In this paper, we describe our algorithm in detail, along with the experimental results obtained with it.
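The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names, not the authors' method. It assumes bag-of-words vectors with cosine similarity, uses textbook affinity propagation to pick exemplar sentences, asks a stand-in "expert" to label only those exemplars, and extrapolates each label to the sentences nearest that exemplar. The function names (`cosine`, `affinity_propagation`, `label_by_exemplars`), the `oracle` callback, the sample sentences, and the parameter values are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def affinity_propagation(S, damping=0.7, iters=200):
    """Textbook affinity propagation (needs at least 2 points).

    S[i][k] is the similarity of i to k; S[k][k] is the preference.
    Returns, for every point, the index of its exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):
            for k in range(n):
                # Strongest competing candidate exemplar for point i.
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    scores = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if scores[k] > 0]
    if not exemplars:  # fallback (an assumption): keep the single best candidate
        exemplars = [max(range(n), key=scores.__getitem__)]
    # Exemplars represent themselves; other points join the most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def label_by_exemplars(sentences, oracle, damping=0.7, iters=200):
    """Label only the exemplars via `oracle`, then extrapolate to their clusters."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(vecs)
    S = [[cosine(vecs[i], vecs[k]) for k in range(n)] for i in range(n)]
    off_diag = sorted(S[i][k] for i in range(n) for k in range(n) if i != k)
    pref = off_diag[len(off_diag) // 2]  # median similarity as the preference
    for k in range(n):
        S[k][k] = pref
    exemplar_of = affinity_propagation(S, damping, iters)
    expert_labels = {k: oracle(sentences[k]) for k in set(exemplar_of)}
    return exemplar_of, [expert_labels[k] for k in exemplar_of]


# Hypothetical data; the keyword rule stands in for a subject matter expert.
sentences = [
    "Acme acquired Beta Corp in 2019",
    "Gamma acquired Delta Labs in 2020",
    "Omega acquired Sigma Inc in 2018",
    "Alice was born in Paris",
    "Bob was born in Berlin",
    "Carol was born in Madrid",
]


def oracle(sentence):
    return "acquisition" if "acquired" in sentence else "birthplace"


exemplar_of, labels = label_by_exemplars(sentences, oracle)
for sentence, label in zip(sentences, labels):
    print(f"{label:12s} {sentence}")
```

In this sketch the expert is consulted once per exemplar rather than once per sentence, which is where the saving in manual effort would come from. The quality of the extrapolated labels depends entirely on the similarity measure and the preference value, both of which are placeholders here.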