Abstract

  1. Not necessary
  2. Stop-gradient is important
  3. Proof-of-concept
  4. Competitive results

Method

Distance is defined as negative cosine similarity

$$ \mathcal{D}\left(p_{1}, z_{2}\right)=-\frac{p_{1}}{\left\|p_{1}\right\|{2}} \cdot \frac{z{2}}{\left\|z_{2}\right\|_{2}} $$

Loss is symmetric

$$ \mathcal{L}=\frac{1}{2} \mathcal{D}\left(p_{1}, z_{2}\right)+\frac{1}{2} \mathcal{D}\left(p_{2}, z_{1}\right) $$

Empirical Study

Stop-gradient

Predictor

b). not because of collapsing solution

Batch Size