Open Information Extraction for Code-Mix Hindi-English Social Media Data
dc.contributor.advisor | Ferraro, Francis | |
dc.contributor.author | PATE, MAYUR SATISH | |
dc.contributor.department | Computer Science and Electrical Engineering | |
dc.contributor.program | Computer Science | |
dc.date.accessioned | 2021-01-29T18:12:30Z | |
dc.date.available | 2021-01-29T18:12:30Z | |
dc.date.issued | 2018-01-01 | |
dc.description.abstract | Open domain relation extraction (Angeli, Premkumar, & Manning 2015) is the process of finding relation triples. While a number of open information extraction (Open IE) systems are available for single-language text, traditional Open IE systems are not well suited to content that mixes multiple languages in a single utterance. In this thesis, we extend an existing code-mix corpus (Das, Jamatia, & Gambäck 2015) by finding and annotating relation triples in an Open IE fashion. We will be open-sourcing this newly annotated dataset. Using this corpus, we experiment with sequence-to-sequence neural networks (Zhang, Duh, & Van Durme 2017) for finding relation triples. As prerequisites for the relation extraction pipeline, we develop a part-of-speech tagger, a named entity recognizer, and a predicate recognizer for code-mix content. We experiment with various approaches, including Conditional Random Fields (CRF), the Averaged Perceptron, and deep neural networks. To our knowledge, this relation extraction system is the first for any code-mix natural language. We achieve promising results for all of the components, and these results could be improved in the future with more code-mix data. | |
dc.format | application:pdf | |
dc.genre | theses | |
dc.identifier | doi:10.13016/m2mhwn-7zax | |
dc.identifier.other | 11894 | |
dc.identifier.uri | http://hdl.handle.net/11603/20714 | |
dc.language | en | |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department Collection | |
dc.relation.ispartof | UMBC Theses and Dissertations Collection | |
dc.relation.ispartof | UMBC Graduate School Collection | |
dc.relation.ispartof | UMBC Student Collection | |
dc.source | Original File Name: PATE_umbc_0434M_11894.pdf | |
dc.subject | Code Mixing | |
dc.subject | Machine Learning | |
dc.subject | Named Entity Recognition | |
dc.subject | Natural Language Processing | |
dc.subject | Open IE | |
dc.subject | Seq2Seq Neural Network | |
dc.title | Open Information Extraction for Code-Mix Hindi-English Social Media Data | |
dc.type | Text | |
dcterms.accessRights | Distribution Rights granted to UMBC by the author. | |
dcterms.accessRights | Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission. | |
dcterms.accessRights | This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. |