Browsing by Author "Sanjabi, Maziar"
Now showing 1 - 3 of 3
Item: Can we train vision and language zero-shot classification models without syntax? (2022-11-01)
Authors: Tejankar, Ajinkya; Sanjabi, Maziar; Wu, Bichen; Khabsa, Madian; Xie, Saining; Pirsiavash, Hamed; Firooz, Hamed
Abstract: Natural language supervision in the form of image captions was recently shown to be an effective way of training zero-shot image classification models. In this work, we focus on teasing out what parts of the language supervision are essential for training zero-shot models. Through extensive and careful experiments, we show that replacing intact captions with Bag-of-Words (BoW) does not significantly degrade zero-shot performance. Surprisingly, we can even slightly improve performance on some datasets by balancing the frequency of words in BoW.

Item: Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning (IEEE, 2023)
Authors: Tejankar, Ajinkya; Sanjabi, Maziar; Wang, Qifan; Wang, Sinong; Firooz, Hamed; Pirsiavash, Hamed; Tan, Liang
Abstract: Recently, self-supervised learning (SSL) was shown to be vulnerable to patch-based data-poisoning backdoor attacks: an adversary can poison a small part of the unlabeled data so that when a victim trains an SSL model on it, the final model will have a backdoor that the adversary can exploit. This work aims to defend self-supervised learning against such attacks. We use a three-step defense pipeline, where we first train a model on the poisoned data. In the second step, our proposed defense algorithm (PatchSearch) uses the trained model to search the training data for poisoned samples and removes them from the training set. In the third step, a final model is trained on the cleaned-up training set. Our results show that PatchSearch is an effective defense. As an example, it improves a model's accuracy on images containing the trigger from 38.2% to 63.7%, which is very close to the clean model's accuracy of 64.6%. Moreover, we show that PatchSearch outperforms baselines and state-of-the-art defense approaches, including those using additional clean, trusted data. Our code is available at https://github.com/UCDvision/PatchSearch

Item: A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision (2022-01-06)
Authors: Tejankar, Ajinkya; Sanjabi, Maziar; Wu, Bichen; Xie, Saining; Khabsa, Madian; Pirsiavash, Hamed; Firooz, Hamed
Abstract: Using natural language as supervision for training visual recognition models holds great promise. Recent works have shown that if such supervision is used in the form of alignment between images and captions in large training datasets, then the resulting aligned models perform well on zero-shot classification as a downstream task. In this paper, we focus on teasing out what parts of the language supervision are essential for training zero-shot image classification models. Through extensive and careful experiments, we show that: 1) A simple Bag-of-Words (BoW) caption could be used as a replacement for most of the image captions in the dataset. Surprisingly, we observe that this approach improves the zero-shot classification performance when combined with word balancing. 2) Using a BoW-pretrained model, we can obtain more training data by generating pseudo-BoW captions on images that do not have a caption. Models trained on images with real and pseudo-BoW captions achieve stronger zero-shot performance. On ImageNet-1k zero-shot evaluation, our best model, which uses only 3M image-caption pairs, performs on par with a CLIP model trained on 15M image-caption pairs (31.5% vs 31.3%).
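Two of the items above replace intact captions with Bag-of-Words captions, optionally balancing word frequencies. As a minimal illustrative sketch only (not the authors' released code), one plausible reading of that preprocessing step is shown below; the names `caption_to_bow`, `word_freq`, and `max_freq` are hypothetical, and the exact balancing rule is an assumption:

```python
import random
import re

def caption_to_bow(caption, word_freq, max_freq=1000, rng=random):
    """Turn an intact caption into a Bag-of-Words 'caption' (hypothetical sketch).

    Tokenizes the caption, drops duplicate words, then downsamples words
    whose corpus frequency exceeds `max_freq` -- a simple stand-in for the
    word balancing the abstracts describe.
    """
    words = re.findall(r"[a-z]+", caption.lower())
    kept = []
    for w in dict.fromkeys(words):  # unique words, original order
        freq = word_freq.get(w, 0)
        # Rare words are always kept; frequent words are kept with
        # probability inversely proportional to their corpus frequency.
        if freq <= max_freq or rng.random() < max_freq / freq:
            kept.append(w)
    rng.shuffle(kept)  # word order carries no syntactic information
    return " ".join(kept)
```

For example, `caption_to_bow("A dog on the grass", word_freq)` yields a shuffled subset of the caption's unique words, with common stop words like "the" more likely to be dropped when their corpus counts are high.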