On Finer Control of Information Flow in LSTMs
Date
2019-01-18
Citation of Original Publication
Gao H., Oates T. (2019) On Finer Control of Information Flow in LSTMs. In: Berlingerio M., Bonchi F., Gärtner T., Hurley N., Ifrim G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2018. Lecture Notes in Computer Science, vol 11051. Springer, Cham. https://doi.org/10.1007/978-3-030-10925-7_32
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
© Springer Nature Switzerland AG 2019
Abstract
Since its inception in 1995, the Long Short-Term Memory (LSTM) architecture for recurrent neural networks has shown promising, sometimes state-of-the-art, performance on various tasks. Aiming to achieve constant error flow through hidden units, LSTM introduces a complex unit called a memory cell, in which gates control the exposure or isolation of information flowing into the cell, out of it, and back to itself. Despite its widely acknowledged success, in this paper we hypothesize that LSTMs may suffer from an implicit functional binding of information exposure/isolation for the output and candidate computations. That is, the output gate at time t−1 is not only in charge of the information flowing out of the cell as the response to the external environment, but also controls the information flowing back into the cell for the candidate computation, which is often the only source of nonlinear combination of the input at time t and the previous cell state at time t−1 for cell memory updates. We propose the Untied Long Short-Term Memory (ULSTM) as a solution to this problem. We test our model on various tasks, including semantic relatedness prediction, language modeling and sentiment classification. Experimental results indicate that our proposed model is capable of at least partially solving the problem and outperforms LSTM on all of these tasks.
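The binding described in the abstract can be illustrated with a minimal NumPy sketch of one LSTM step. It uses the standard formulation without peephole connections; the gate ordering [input, forget, output, candidate] and the parameter shapes are assumptions made for this example. Because h_{t-1} = o_{t-1} * tanh(c_{t-1}) is the only recurrent input to the candidate g_t = tanh(W_g x_t + U_g h_{t-1} + b_g), closing o_{t-1} to isolate the cell from the outside also hides c_{t-1} from the next candidate. The function `untied_step` is a hypothetical illustration of decoupling these two roles with an extra gate r_t (the names `untied_step`, `Wr`, `Ur`, `br` and `s` are made up here). It is not the ULSTM formulation from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM step (no peepholes).

    Shapes: x (D,), h_prev and c_prev (H,), W (4H, D), U (4H, H), b (4H,).
    Row blocks are assumed to be ordered [input, forget, output, candidate].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:H])            # input gate
    f = sigmoid(z[H:2 * H])       # forget gate
    o = sigmoid(z[2 * H:3 * H])   # output gate
    g = np.tanh(z[3 * H:])        # candidate: sees c_{t-1} only through h_{t-1}
    c = f * c_prev + i * g        # cell memory update
    h = o * np.tanh(c)            # o_t gates the output AND the next candidate's input
    return h, c, g


def untied_step(x, h_prev, s_prev, c_prev, W, U, b, Wr, Ur, br):
    """Hypothetical untied variant (an assumption, NOT the paper's ULSTM).

    A separate gate r_t (parameters Wr (H, D), Ur (H, H), br (H,)) controls
    what flows back into the candidate via s_t = r_t * tanh(c_t), while the
    output gate o_t still controls the external output h_t.
    """
    H = h_prev.shape[0]
    # Gates i, f, o read h_{t-1}, exactly as in lstm_step.
    z_gates = W[:3 * H] @ x + U[:3 * H] @ h_prev + b[:3 * H]
    i = sigmoid(z_gates[:H])
    f = sigmoid(z_gates[H:2 * H])
    o = sigmoid(z_gates[2 * H:])
    # The candidate reads the internal feedback s_{t-1} instead of h_{t-1}.
    g = np.tanh(W[3 * H:] @ x + U[3 * H:] @ s_prev + b[3 * H:])
    c = f * c_prev + i * g
    r = sigmoid(Wr @ x + Ur @ h_prev + br)  # hypothetical feedback gate
    h = o * np.tanh(c)  # response to the external environment
    s = r * np.tanh(c)  # information flowing back to the next candidate
    return h, s, c, g
```

For example, with H = D = 1, all weights set to 1 and the output-gate bias set to -50, two runs that start from different cell states reach step t with different c_{t-1} but an output h_{t-1} of essentially zero. In `lstm_step` their candidates at step t are then identical, whereas in `untied_step` with the feedback gate held open (br = 50) the candidates differ, because c_{t-1} still reaches the candidate through s_{t-1}.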