SimA: Simple Softmax-free Attention for Vision Transformers
Date
2022-06-17
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
Recently, vision transformers have become very popular. However, deploying them in many applications is computationally expensive, partly due to the Softmax layer in the attention block. We introduce a simple but effective Softmax-free attention block, SimA, which normalizes the query and key matrices with a simple ℓ1-norm instead of a Softmax layer. The attention block in SimA is then a simple multiplication of three matrices, so SimA can dynamically change the ordering of the computation at test time to achieve computation linear in either the number of tokens or the number of channels. We empirically show that applying SimA to three SOTA variants of transformers, DeiT, XCiT, and CvT, yields accuracy on par with the SOTA models, without any need for a Softmax layer. Interestingly, changing SimA from multi-head to single-head has only a small effect on accuracy, which simplifies the attention block further. The code is available here: https://github.com/UCDvision/sima
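The abstract describes the attention block as a plain product of three matrices whose grouping can be chosen at test time once the Softmax is removed. Below is a minimal PyTorch sketch of that idea; the function name sima_attention, the choice of normalizing over the token axis, and the eps stabilizer are illustrative assumptions rather than the authors' exact code, which is available in the linked repository.

import torch

def sima_attention(q, k, v, eps=1e-6):
    """Softmax-free attention sketch; q, k, v have shape (batch, tokens, channels).

    Q and K are divided by their l1-norm (here taken over the token dimension,
    an assumption based on the abstract; see the official repo for the exact
    details). Without the Softmax, the three-matrix product can be grouped
    either way, so the cheaper ordering is picked at run time.
    """
    n, d = q.shape[-2], q.shape[-1]
    q = q / (q.abs().sum(dim=-2, keepdim=True) + eps)  # l1-normalize each channel over tokens
    k = k / (k.abs().sum(dim=-2, keepdim=True) + eps)

    if n <= d:
        # (Q K^T) V : cost O(n^2 d), linear in the number of channels
        return (q @ k.transpose(-2, -1)) @ v
    else:
        # Q (K^T V) : cost O(n d^2), linear in the number of tokens
        return q @ (k.transpose(-2, -1) @ v)

# Toy usage: 2 images, 196 tokens, 64 channels per token
x = torch.randn(2, 196, 64)
out = sima_attention(x, x, x)
print(out.shape)  # torch.Size([2, 196, 64])

Because the grouping only changes the cost, not the result, the same trained weights can be run with either ordering depending on whether tokens or channels dominate at inference time.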