    • A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision 

      Tejankar, Ajinkya; Sanjabi, Maziar; Wu, Bichen; Xie, Saining; Khabsa, Madian; Pirsiavash, Hamed; Firooz, Hamed (2022-01-06)
      Using natural language as supervision for training visual recognition models holds great promise. Recent work has shown that if such supervision is used in the form of alignment between images and captions in large ...