Recasting Self-Attention with Holographic Reduced Representations
Date
2022-08-15
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
Self-attention has become a fundamentally new approach to set and sequence modeling, particularly within transformer-style architectures. Given a sequence of T items, standard self-attention has O(T²) memory and compute needs, leading to many recent works that build approximations to self-attention with reduced computational or memory complexity. In this work, we instead re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so, we follow the same logical strategy as standard self-attention. Implemented as a “Hrrformer”, we obtain several benefits, including faster compute (O(T log T) time complexity), less memory use per layer (O(T) space complexity), convergence in 10× fewer epochs, near state-of-the-art accuracy, and the ability to learn with just a single layer. Combined, these benefits make our Hrrformer up to 370× faster to train on the Long Range Arena benchmark.
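The abstract does not spell out the mechanism, but the core HRR operations it builds on are binding by circular convolution and approximate unbinding by circular correlation, both computable with FFTs. The following is a minimal sketch of that idea, assuming a simple "bind each key to its value, superpose, then unbind with the query" scheme; the function names (fft_bind, fft_unbind, hrr_attention) are hypothetical and this is not the authors' Hrrformer implementation, which adds further details (e.g., normalization and multi-head structure).

import numpy as np

def fft_bind(a, b):
    # HRR binding: circular convolution over the feature dimension, via FFT.
    return np.real(np.fft.ifft(np.fft.fft(a, axis=-1) * np.fft.fft(b, axis=-1), axis=-1))

def fft_unbind(s, a):
    # Approximate unbinding: circular correlation (binding with the involution of a).
    return np.real(np.fft.ifft(np.fft.fft(s, axis=-1) * np.conj(np.fft.fft(a, axis=-1)), axis=-1))

def hrr_attention(q, k, v):
    # Hypothetical HRR-style attention sketch.
    # q, k, v have shape (T, d). Each key is bound to its value, the T bindings
    # are superposed into a single trace, and each query unbinds an approximate
    # value from that trace. No T x T attention matrix is ever formed.
    trace = fft_bind(k, v).sum(axis=0)      # (d,) superposition of key-value bindings
    return fft_unbind(trace[None, :], q)    # (T, d) approximately retrieved values

# Toy usage with random, roughly unit-norm vectors.
rng = np.random.default_rng(0)
T, d = 8, 256
q = rng.normal(size=(T, d)) / np.sqrt(d)
k = rng.normal(size=(T, d)) / np.sqrt(d)
v = rng.normal(size=(T, d)) / np.sqrt(d)
print(hrr_attention(q, k, v).shape)  # (8, 256)

In this sketch the cost is dominated by FFTs over the feature dimension for each of the T items, which is how an HRR formulation avoids the quadratic-in-T cost of the standard attention matrix.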