Alam, Mohammad Mahmudul; Raff, Edward; Oates, Tim; Holt, James
2022-10-10 (issued 2022-08-15)
http://hdl.handle.net/11603/26128

Self-attention has become a fundamentally new approach to set and sequence modeling, particularly within transformer-style architectures. Given a sequence of T items, standard self-attention requires O(T²) memory and compute, leading to many recent works that approximate self-attention with reduced computational or memory complexity. In this work, we instead re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR), following the same logical strategy as standard self-attention. Implemented as a “Hrrformer”, we obtain several benefits: faster compute (O(T log T) time complexity), less memory use per layer (O(T) space complexity), convergence in 10× fewer epochs, near state-of-the-art accuracy, and the ability to learn with just a single layer. Combined, these benefits make our Hrrformer up to 370× faster to train on the Long Range Arena benchmark.

9 pages
en-US
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Recasting Self-Attention with Holographic Reduced Representations
Text
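
The core idea referenced in the abstract, binding key–value pairs with HRR circular convolution and retrieving them by unbinding with a query, can be sketched briefly. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the names (bind, approx_inverse, hrr_attention) are hypothetical, and the full Hrrformer layer includes additional steps (such as weighting the retrieved values) that are omitted here.

```python
import numpy as np

def bind(x, y):
    # HRR binding: circular convolution, computed via FFT over the last axis.
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=x.shape[-1])

def approx_inverse(y):
    # HRR approximate inverse (involution): y_inv[i] = y[(-i) mod d].
    return np.roll(y[..., ::-1], 1, axis=-1)

def hrr_attention(Q, K, V):
    # Bind each key to its value and superimpose all T pairs into a single
    # d-dimensional trace, so no T x T attention matrix is ever materialized.
    trace = bind(K, V).sum(axis=0)          # shape (d,)
    # Unbind every query from the shared trace to get a noisy estimate of the
    # values whose keys resemble that query (broadcasts trace across queries).
    return bind(Q, approx_inverse(trace))   # shape (T, d)

# Toy usage: random queries/keys/values with approximately unit norm,
# the regime where HRR retrieval noise stays small.
T, d = 256, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(0.0, 1.0 / np.sqrt(d), (T, d)) for _ in range(3))
out = hrr_attention(Q, K, V)
print(out.shape)  # (256, 64)
```

Because the bound pairs collapse into one fixed-size trace, memory grows linearly in T rather than quadratically, which is the property the abstract highlights.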