GitHub - lucidrains/CoCa-pytorch: Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

Abstract

1. Introduction

2. Related Work

3. Approach

3.1 Natural Language Supervision

Single Encoder Classification

https://blog.kakaocdn.net/dn/sZNfu/btrD3x6afBd/vWbRwOKAOZ4OFFjUMM3mH1/img.png

Because these annotations are mapped to discrete class vectors, it is effective to compute the loss with cross-entropy.

→ The trained image encoder is then used as a generic visual representation extractor.
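As a rough illustration of the cross-entropy loss mentioned above, the sketch below scores one image against a one-hot class label. It is a minimal pure-Python example, not the paper's or the repository's implementation: the function names `softmax` and `cross_entropy`, the three-class setup, and the logit values are all made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw encoder scores into a probability distribution over classes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Cross-entropy against a one-hot target: -log of the true class probability."""
    probs = softmax(logits)
    return -math.log(probs[label])

# Hypothetical logits produced by an image encoder for a 3-class problem.
logits = [2.0, 0.5, -1.0]

loss_correct = cross_entropy(logits, 0)  # true class is the highest-scoring one
loss_wrong = cross_entropy(logits, 2)    # true class is the lowest-scoring one

print(loss_correct < loss_wrong)
```

The loss is small when the encoder assigns high probability to the annotated class and large otherwise, which is what pushes the encoder toward class-discriminative features.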

Dual-Encoder Contrastive Learning