Boosting Self-supervised Learning via Knowledge Transfer
Links to Files
Permanent Link
Author/Creator
Author/Creator ORCID
Date
Type of Work
Department
Computer Science and Electrical Engineering
Program
Computer Science
Citation of Original Publication
Rights
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
In self-supervised learning (SSL), an auxiliary (pretext) task is designed so that a model can be pretrained on a specific dataset without the need for human annotation. This pretraining is the initial phase of transfer learning: a model is learned on the auxiliary task and then transferred to another task by fine-tuning on the target dataset. Transfer learning carries the inherent constraint that the same model architecture must be used for both pretraining and fine-tuning, which creates problems when designing and comparing models and auxiliary tasks. For example, one cannot use different architectures for the auxiliary task and the target-domain task because of the limitations of fine-tuning and training settings, and since researchers use architectures of varying capacity for tasks of varying complexity, different approaches become hard to compare. The goal of this work is to design a framework that overcomes these limitations: if knowledge can be transferred from a pretrained model to a target model, then different architectures can be used in the two phases. Toward this goal, we designed a novel framework that separates auxiliary training from target-domain training through an effective clustering-based transfer method. We cluster the features computed by the pretrained model to obtain pseudo-labels and learn a new representation by predicting those pseudo-labels. The intuition behind this approach is that, in a good visual representation space, semantically similar data points must lie closer together than dissimilar ones, and the network should learn this metric inherently during pretraining in order to generate good features. This approach also gives us the flexibility to assess otherwise incompatible models, such as hand-crafted features. The separation enables us to use different model architectures during auxiliary training and target-domain training, to experiment with deeper models to learn better representations, and to boost performance by increasing the complexity of the auxiliary task and then transferring the knowledge from a deeper model to a shallower one. We conducted experiments on several datasets to evaluate the performance of this method. The framework outperformed current state-of-the-art SSL methods on benchmark datasets, achieving 72.5% mAP on the classification task and 57.2% mAP on the object detection task of the PASCAL VOC dataset.
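
The transfer step described above, clustering the pretrained model's features into pseudo-labels and training a new model to predict them, might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: pretrained_model, target_model, dataset, and the hyperparameters (num_clusters, epochs, batch size, learning rate) are hypothetical names and values chosen for the example, and the target model is assumed to end in a classifier with num_clusters outputs.

import torch
import torch.nn as nn
from sklearn.cluster import KMeans
from torch.utils.data import DataLoader


def extract_features(model, loader, device="cpu"):
    """Run the pretrained model over the dataset and collect flattened features."""
    model.eval().to(device)
    feats = []
    with torch.no_grad():
        for images, _ in loader:
            feats.append(model(images.to(device)).flatten(1).cpu())
    return torch.cat(feats)


def transfer_by_clustering(pretrained_model, target_model, dataset,
                           num_clusters=2000, epochs=10, batch_size=256,
                           device="cpu"):
    # shuffle=False keeps the image order aligned with the pseudo-label order below.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

    # 1. Compute features for the whole dataset with the pretrained model
    #    (any architecture; it need not match the target model).
    feats = extract_features(pretrained_model, loader, device).numpy()

    # 2. Cluster the features; the cluster assignments act as pseudo-labels.
    pseudo_labels = torch.as_tensor(
        KMeans(n_clusters=num_clusters, n_init=10).fit_predict(feats)).long()

    # 3. Train the (possibly shallower) target model to predict the pseudo-labels.
    #    Assumption: target_model's final layer has num_clusters outputs.
    target_model.train().to(device)
    optimizer = torch.optim.SGD(target_model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for i, (images, _) in enumerate(loader):
            labels = pseudo_labels[i * batch_size:(i + 1) * batch_size].to(device)
            loss = loss_fn(target_model(images.to(device)), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return target_model

Because the pseudo-labels depend only on the pretrained model's feature space, the target model's architecture is free to differ from the pretrained one, which is the decoupling of auxiliary and target-domain training that the abstract describes.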