Open Information Extraction for Code-Mix Hindi-English Social Media Data
Links to Files
Permanent Link
Author/Creator
Author/Creator ORCID
Date
2018-01-01
Type of Work
Department
Computer Science and Electrical Engineering
Program
Computer Science
Citation of Original Publication
Rights
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
Open domain relation extraction (Angeli, Premkumar, & Manning 2015) is the process of finding relation triples in text. While a number of open information extraction (Open IE) systems are available for single-language text, traditional Open IE systems are not well suited to content that mixes multiple languages in a single utterance. In this thesis, we extend an existing code-mix corpus (Das, Jamatia, & Gambäck 2015) by finding and annotating relation triples in an Open IE fashion. We will be open sourcing this newly annotated dataset. Using this corpus, we experiment with sequence-to-sequence neural networks (Zhang, Duh, & Van Durme 2017) for finding relation triples. As prerequisites for the relation extraction pipeline, we have developed a part-of-speech tagger, a named entity recognizer, and a predicate recognizer for code-mix content, experimenting with approaches such as Conditional Random Fields (CRF), the Averaged Perceptron, and deep neural networks. To our knowledge, this is the first relation extraction system for any code-mix natural language. We have achieved promising results for all of the components, and these results could be improved in the future with more code-mix data.