Supervised Training Strategies for Low Resource Language Processing

dc.contributor.advisor: Oates, Tim
dc.contributor.advisor: Ferraro, Francis
dc.contributor.author: Ganesan, Ashwinkumar
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2022-02-09T15:52:32Z
dc.date.available: 2022-02-09T15:52:32Z
dc.date.issued: 2020-01-01
dc.description.abstract: Over the last decade, we have witnessed an explosion in Artificial Intelligence (AI) research with a focus on deep neural networks (DNNs). Since Krizhevsky et al. (2017) proposed a convolutional neural network (CNN) architecture for the ImageNet (Deng et al., 2009) task, deep neural networks have become the default model of choice for many computer vision and natural language processing tasks. The architecture showcases an important property: a modular composite-function model (with different layers and operations) can be easily scaled up to a large dataset. This has led to a generation of deep neural models built on easy access to image and textual data. But this approach to the construction of neural networks has two important limitations. First, as the number of parameters in a model grows, the GPU compute needed to train it and perform inference becomes prohibitive. Second, as the universe of natural language processing tasks expands, the cognitive complexity of each task increases as well, and the cost of collecting good-quality annotations (for textual data) becomes a barrier to building better models. A common solution is to train very large unsupervised models on the huge textual corpora available on the web and then transfer them to other tasks. In our work, we study each of these challenges and propose approaches to alleviate them, focusing on the design of models that are data and hardware efficient. Our work has three main contributions. First, we study methods to efficiently utilize existing datasets by exploiting the inherent relationships between samples in a dataset. We propose a locality-preserving alignment algorithm that learns the local manifold structure surrounding a datapoint in embedding space and then aligns two manifolds while preserving this structure. Thus, points that do not have a target label but lie in the neighborhood of a given datapoint in the supervised set can be mapped into the target domain as well. This augments a given dataset with pseudo text-label pairs that can be used for additional model training. Second, most current-generation models deployed for NLP tasks (apart from autoencoders) are designed to have a unidirectional flow from source to target. We propose a bidirectional manifold alignment (BDMA) method that trains a single model to perform both the forward and the reverse mapping. The model is optimized with a cycle-consistency loss inspired by Zhu et al. (2017)'s work on CycleGANs. We show the effectiveness of this approach on the crosslingual word alignment (CLWA) task and how it can improve hardware efficiency and reduce the number of models deployed. Lastly, to reduce the size of the model, we propose a model architecture that infers labels with holographic reduced representations (HRRs). HRRs provide the ability to compose and decompose embeddings. In an eXtreme Multi-Label (XML) setting, where there is a very large set of labels, we show how a model's output layer can be compressed by replacing it with a multi-label embedding that can be decomposed into its primary constituents. We show that the new HRR-based model has precision equivalent to the standard model.
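
To make the cycle-consistency idea behind BDMA concrete, the following is a minimal PyTorch sketch rather than the dissertation's actual implementation: it assumes two hypothetical linear maps (f for source-to-target, g for target-to-source) trained jointly on paired word embeddings, with illustrative dimensions, batch sizes, and loss weights.

import torch
import torch.nn as nn

dim = 300                              # illustrative embedding dimensionality
f = nn.Linear(dim, dim, bias=False)    # forward mapping: source -> target
g = nn.Linear(dim, dim, bias=False)    # reverse mapping: target -> source
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)
mse = nn.MSELoss()

def training_step(x, y):
    """x, y: batches of paired source/target embeddings, shape (n, dim)."""
    # Supervised alignment terms on the paired (seed dictionary) embeddings.
    align = mse(f(x), y) + mse(g(y), x)
    # Cycle-consistency terms: mapping forward then back should recover the input.
    cycle = mse(g(f(x)), x) + mse(f(g(y)), y)
    loss = align + 0.5 * cycle          # 0.5 is an arbitrary illustrative weight
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Illustrative usage with random stand-in embeddings.
x = torch.randn(64, dim)
y = torch.randn(64, dim)
print(training_step(x, y))

The design point is that a single pair of jointly trained maps serves both translation directions, which is where the hardware-efficiency claim about deploying fewer models comes from.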
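Similarly, the compose/decompose property of HRRs can be illustrated with a small NumPy sketch. The label set, key/filler vectors, and dimensionality below are hypothetical; this shows only the standard HRR binding (circular convolution) and unbinding (circular correlation) operations, not the proposed model's output layer.

import numpy as np

rng = np.random.default_rng(0)
d = 1024  # illustrative HRR dimensionality

def bind(a, b):
    # Circular convolution: the HRR binding operator, computed via FFT.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):
    # Circular correlation with key a approximately retrieves b from c = bind(a, b).
    return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(a))))

# Hypothetical label vocabulary: each label gets a random key and filler vector
# with elements ~ N(0, 1/d), so each vector has roughly unit norm.
keys = rng.normal(0.0, 1.0 / np.sqrt(d), size=(5, d))
fillers = rng.normal(0.0, 1.0 / np.sqrt(d), size=(5, d))

# Compose labels {0, 3} into one multi-label embedding by superposing bound pairs.
s = bind(keys[0], fillers[0]) + bind(keys[3], fillers[3])

# Decompose: unbinding with each key yields a noisy filler; cosine similarity
# against the filler table indicates which labels are present in s.
for i in range(5):
    est = unbind(s, keys[i])
    sim = est @ fillers[i] / (np.linalg.norm(est) * np.linalg.norm(fillers[i]))
    print(i, round(float(sim), 2))  # labels 0 and 3 score clearly higher than the rest

Because many labels can be superposed into a single fixed-width vector and recovered by unbinding, a model can predict one composite embedding instead of scoring every label separately, which is the sense in which the output layer is compressed.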
dc.format: application:pdf
dc.genre: dissertations
dc.identifier: doi:10.13016/m2kyj8-nfel
dc.identifier.other: 12335
dc.identifier.uri: http://hdl.handle.net/11603/24174
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Ganesan_umbc_0434D_12335.pdf
dc.subject: Artificial Intelligence
dc.subject: Deep Learning
dc.subject: Manifold Alignment
dc.subject: Manifold Learning
dc.subject: Natural Language Processing
dc.title: Supervised Training Strategies for Low Resource Language Processing
dc.type: Text
dcterms.accessRights: Access limited to the UMBC community. The item may be obtained via Interlibrary Loan through a local library, pending the author/copyright holder's permission.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu.

Files

Original bundle

Name: Ganesan_umbc_0434D_12335.pdf
Size: 2.53 MB
Format: Adobe Portable Document Format