SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation
Loading...
Links to Files
Permanent Link
Author/Creator ORCID
Date
2022-01-13
Type of Work
Department
Program
Citation of Original Publication
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Attribution 4.0 International (CC BY 4.0)
Attribution 4.0 International (CC BY 4.0)
Subjects
Abstract
Feature regression is a simple way to distill large neural network models to smaller ones. We show that with simple changes to the network architecture, regression can outperform more complex state-of-the-art approaches for knowledge distillation from self-supervised models. Surprisingly, the addition of a multi-layer perceptron head to the CNN backbone is beneficial even if used only during distillation and discarded in the downstream task. Deeper non-linear projections can thus be used to accurately mimic the teacher without changing inference architecture and time. Moreover, we utilize independent projection heads to simultaneously distill multiple teacher networks. We also find that using the same weakly augmented image as input for both teacher and student networks aids distillation. Experiments on ImageNet dataset demonstrate the efficacy of the proposed changes in various self-supervised distillation settings.