Recasting Self-Attention with Holographic Reduced Representations

Date

2022-08-15

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

Self-attention has become a fundamentally new approach to set and sequence modeling, particularly within transformer-style architectures. Given a sequence of T items, standard self-attention has O(T²) memory and compute requirements, which has led many recent works to build approximations to self-attention with reduced computational or memory complexity. In this work, we instead re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so, we perform the same logical strategy as standard self-attention. Implemented as a “Hrrformer”, we obtain several benefits: faster compute (O(T log T) time complexity), less memory use per layer (O(T) space complexity), convergence in 10× fewer epochs, near state-of-the-art accuracy, and the ability to learn with just a single layer. Combined, these benefits make our Hrrformer up to 370× faster to train on the Long Range Arena benchmark.
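
The HRR operations behind this recasting admit a compact illustration. Below is a minimal sketch, assuming NumPy and illustrative sizes (T = 32, d = 1024), of how HRR binding (circular convolution, computed via the FFT), superposition, and unbinding can perform an associative key-value lookup in time linear in T. It is not the authors' Hrrformer layer, only a demonstration of the underlying mechanism.

```python
# A minimal sketch (not the authors' implementation) of the HRR operations that
# enable an attention-style associative lookup in time linear in T:
# binding via FFT-based circular convolution, superposition by summation,
# and unbinding via the approximate inverse. Sizes are illustrative.
import numpy as np

def bind(a, b):
    # HRR binding: circular convolution, O(d log d) via the FFT.
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def approx_inverse(a):
    # Involution of a: the approximate inverse used for unbinding.
    return np.roll(a[::-1], 1)

def unbind(trace, a):
    # Approximately recover whatever was bound with `a` inside `trace`.
    return bind(trace, approx_inverse(a))

rng = np.random.default_rng(0)
T, d = 32, 1024                                   # sequence length, feature dim (illustrative)
keys = rng.normal(0.0, 1.0 / np.sqrt(d), (T, d))  # unit-expected-norm vectors
values = rng.normal(0.0, 1.0 / np.sqrt(d), (T, d))

# Superpose all key-value bindings into a single fixed-size trace.
# Total cost is O(T * d log d): linear in T, unlike the O(T^2) of standard attention.
trace = sum(bind(k, v) for k, v in zip(keys, values))

# Querying with a key retrieves a noisy copy of its associated value;
# the correct value scores highest when compared against all candidates.
retrieved = unbind(trace, keys[7])
print(int(np.argmax(values @ retrieved)))         # prints 7: the correct value wins
```

Standard self-attention also weights values by query-key similarity and learns the projections that produce queries, keys, and values; the sketch only shows why the HRR route avoids materializing a T × T attention matrix.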