Supervised Training Strategies for Low Resource Language Processing

dc.contributor.advisor: Oates, Tim
dc.contributor.advisor: Ferraro, Francis
dc.contributor.author: Ganesan, Ashwinkumar
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2022-02-09T15:52:32Z
dc.date.available: 2022-02-09T15:52:32Z
dc.date.issued: 2020-01-01
dc.description.abstract: Over the last decade, we have witnessed an explosion in Artificial Intelligence (AI) research with a focus on deep neural networks (DNNs). Since Krizhevsky et al. (2017) proposed a convolutional neural network (CNN) architecture for the ImageNet (Deng et al., 2009) task, deep neural networks have become the default model of choice for many computer vision and natural language processing tasks. The architecture showcases an important property: a modular composite-function model (with different layers and operations) can be easily scaled up to a large dataset. This has led to a generation of deep neural models built on easy access to image and textual data. But this approach to the construction of neural networks has two important limitations. First, as the number of parameters in a model grows, the GPU compute needed to train it and perform inference becomes prohibitive. Second, as the universe of natural language processing tasks expands, the cognitive complexity of each task increases as well, and the cost of collecting good-quality annotations (for textual data) becomes a barrier to building better models. A common solution is to train very large unsupervised models on the huge textual corpora available on the web and then transfer them to other tasks. In our work, we study each of these challenges and propose approaches to alleviate them, focusing on the design of models that are data and hardware efficient. Our work has three main contributions. First, we study methods to efficiently utilize existing datasets by exploiting the inherent relationships between samples in a dataset. We propose a locality-preserving alignment algorithm that learns the local manifold structure surrounding a datapoint in embedding space and then aligns two manifolds while preserving this structure. Thus, points that do not have a target label but lie in the neighborhood of a given datapoint in the supervised set can be mapped into the target domain as well. This augments a given dataset with pseudo text-label pairs that can be used for additional model training. Second, most current-generation models deployed for NLP tasks (apart from autoencoders) are designed to have a unidirectional flow from source to target. We propose a bidirectional manifold alignment (BDMA) method that trains a single model to perform both the forward and the reverse mapping. The model is optimized with a cycle-consistency loss inspired by Zhu et al. (2017)'s work on CycleGANs. We show the effectiveness of this approach on the crosslingual word alignment (CLWA) task and how it can improve hardware efficiency and reduce the number of models deployed. Lastly, to reduce the size of the model, we propose a model architecture that infers labels with holographic reduced representations (HRRs). HRRs provide the ability to compose and decompose embeddings. In an eXtreme Multi-Label (XML) setting, where there is a very large set of labels, we show how a model's output layer can be compressed by replacing it with a multi-label embedding that can be decomposed into its primary constituents. We show that the new HRR-based model has precision equivalent to the standard model.
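
To make the cycle-consistency idea behind BDMA concrete, the following is a minimal PyTorch sketch rather than the dissertation's actual implementation: it assumes two hypothetical linear maps (f for source-to-target, g for target-to-source) trained jointly on paired word embeddings, with illustrative dimensions, batch sizes, and loss weights.

import torch
import torch.nn as nn

dim = 300                              # illustrative embedding dimensionality
f = nn.Linear(dim, dim, bias=False)    # forward mapping: source -> target
g = nn.Linear(dim, dim, bias=False)    # reverse mapping: target -> source
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)
mse = nn.MSELoss()

def training_step(x, y):
    """x, y: batches of paired source/target embeddings, shape (n, dim)."""
    # Supervised alignment terms on the paired (seed dictionary) embeddings.
    align = mse(f(x), y) + mse(g(y), x)
    # Cycle-consistency terms: mapping forward then back should recover the input.
    cycle = mse(g(f(x)), x) + mse(f(g(y)), y)
    loss = align + 0.5 * cycle          # 0.5 is an arbitrary illustrative weight
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Illustrative usage with random stand-in embeddings.
x = torch.randn(64, dim)
y = torch.randn(64, dim)
print(training_step(x, y))

The design point is that a single pair of jointly trained maps serves both translation directions, which is where the hardware-efficiency claim about deploying fewer models comes from.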
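Similarly, the compose/decompose property of HRRs can be illustrated with a small NumPy sketch. The label set, key/filler vectors, and dimensionality below are hypothetical; this shows only the standard HRR binding (circular convolution) and unbinding (circular correlation) operations, not the proposed model's output layer.

import numpy as np

rng = np.random.default_rng(0)
d = 1024  # illustrative HRR dimensionality

def bind(a, b):
    # Circular convolution: the HRR binding operator, computed via FFT.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):
    # Circular correlation with key a approximately retrieves b from c = bind(a, b).
    return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(a))))

# Hypothetical label vocabulary: each label gets a random key and filler vector
# with elements ~ N(0, 1/d), so each vector has roughly unit norm.
keys = rng.normal(0.0, 1.0 / np.sqrt(d), size=(5, d))
fillers = rng.normal(0.0, 1.0 / np.sqrt(d), size=(5, d))

# Compose labels {0, 3} into one multi-label embedding by superposing bound pairs.
s = bind(keys[0], fillers[0]) + bind(keys[3], fillers[3])

# Decompose: unbinding with each key yields a noisy filler; cosine similarity
# against the filler table indicates which labels are present in s.
for i in range(5):
    est = unbind(s, keys[i])
    sim = est @ fillers[i] / (np.linalg.norm(est) * np.linalg.norm(fillers[i]))
    print(i, round(float(sim), 2))  # labels 0 and 3 score clearly higher than the rest

Because many labels can be superposed into a single fixed-width vector and recovered by unbinding, a model can predict one composite embedding instead of scoring every label separately, which is the sense in which the output layer is compressed.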
dc.format: application:pdf
dc.genre: dissertations
dc.identifier: doi:10.13016/m2kyj8-nfel
dc.identifier.other: 12335
dc.identifier.uri: http://hdl.handle.net/11603/24174
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Ganesan_umbc_0434D_12335.pdf
dc.subject: Artificial Intelligence
dc.subject: Deep Learning
dc.subject: Manifold Alignment
dc.subject: Manifold Learning
dc.subject: Natural Language Processing
dc.title: Supervised Training Strategies for Low Resource Language Processing
dc.type: Text
dcterms.accessRights: Access limited to the UMBC community. The item may be obtained via Interlibrary Loan through a local library, pending the author/copyright holder's permission.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu.

Files

Original bundle

Name: Ganesan_umbc_0434D_12335.pdf
Size: 2.53 MB
Format: Adobe Portable Document Format