Open Information Extraction for Code-Mix Hindi-English Social Media Data

dc.contributor.advisor: Ferraro, Francis
dc.contributor.author: PATE, MAYUR SATISH
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2021-01-29T18:12:30Z
dc.date.available: 2021-01-29T18:12:30Z
dc.date.issued: 2018-01-01
dc.description.abstract: Open domain relation extraction (Angeli, Premkumar, & Manning 2015) is the process of finding relation triples. While a number of open information extraction (Open IE) systems are available for individual languages, traditional Open IE systems are not well suited to content that mixes multiple languages within a single utterance. In this thesis, we have extended an existing code-mix corpus (Das, Jamatia, & Gambäck 2015) by finding and annotating relation triples in an Open IE fashion. We will be releasing this newly annotated dataset as open source. Using this corpus, we have experimented with sequence-to-sequence neural networks (Zhang, Duh, & Van Durme 2017) for finding relation triples. As prerequisites for the relation extraction pipeline, we have developed a part-of-speech tagger, a named entity recognizer, and a predicate recognizer for code-mix content, experimenting with approaches such as Conditional Random Fields (CRF), the Averaged Perceptron, and deep neural networks. To our knowledge, this is the first relation extraction system for any code-mixed natural language. We have achieved promising results for all of the components, which could be further improved in the future with more code-mix data.
dc.format: application:pdf
dc.genre: theses
dc.identifier: doi:10.13016/m2mhwn-7zax
dc.identifier.other: 11894
dc.identifier.uri: http://hdl.handle.net/11603/20714
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: PATE_umbc_0434M_11894.pdf
dc.subject: Code Mixing
dc.subject: Machine Learning
dc.subject: Named Entity Recognition
dc.subject: Natural Language Processing
dc.subject: Open IE
dc.subject: Seq2Seq Neural Network
dc.title: Open Information Extraction for Code-Mix Hindi-English Social Media Data
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Files

Original bundle

Name: PATE_umbc_0434M_11894.pdf
Size: 1.81 MB
Format: Adobe Portable Document Format

License bundle

Name: PateMOpenInformation_Open.pdf
Size: 44.02 KB
Format: Adobe Portable Document Format