CLIP revolutionizes the image classification paradigm.
Standard image classification uses semantically meaningless one-hot labels as supervision.
Traditional Supervision: 5
One-hot Label: [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
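For concreteness, a minimal PyTorch sketch of the example above, showing how class index 5 becomes a one-hot vector that carries no semantic information:

```python
import torch
import torch.nn.functional as F

# Class index 5 out of 10 classes -> a one-hot vector with no semantics.
label = torch.tensor(5)
one_hot = F.one_hot(label, num_classes=10)
print(one_hot)  # tensor([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
```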
Supervision from CLIP:
"A cute dog wearing a mask looks like he is worried about the virus."
CLIP (Contrastive Language-Image Pre-Training)
Training: contrastive learning on image-text pairs collected from the Internet (see the loss sketch below).
Testing: check the similarity between the image embedding and the embeddings of candidate text prompts (see the zero-shot sketch below).
Therefore, it obtains impressive "zero-shot" capabilities.
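A minimal PyTorch sketch of the paper's symmetric contrastive training objective, assuming the image and text encoders have already produced a batch of aligned embeddings (the function name and fixed temperature are illustrative; in CLIP the temperature is a learned parameter):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # (N, N) similarity matrix; matching image-text pairs sit on the diagonal.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i + loss_t) / 2
```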
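And a zero-shot classification sketch using the Hugging Face `transformers` port of CLIP (the image path and candidate prompts are illustrative):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("dog_with_mask.jpg")  # illustrative path
texts = ["a photo of a dog", "a photo of a cat", "a photo of a bird"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Scaled cosine similarities between the image and each text prompt,
# softmaxed into per-prompt probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```

The class whose prompt embedding is most similar to the image embedding wins, so new classes can be added just by writing new prompts, with no retraining.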
Paper Title: Learning Transferable Visual Models From Natural Language Supervision
Released on January 5, 2021.