Summary

The problems

Contrastive learning algorithms usually treat the other samples in the same batch as the negative samples, which naturally introduces labelling noise: those samples are not necessarily true negatives, since some of them may belong to the same (unknown) class as the anchor.
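
A minimal sketch of this in-batch negative sampling, using a standard InfoNCE-style loss in PyTorch (tensor names are hypothetical): the positive for each anchor is its other augmented view, and every remaining sample in the batch is treated as a negative, whether or not it actually is one.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, d) embeddings of two augmented views of the same batch."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature  # (N, N) pairwise similarities
    # Diagonal entries are the positives; every off-diagonal entry is used
    # as a negative, even if it comes from the same underlying class.
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)
```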

The solution

The framework applies K-means to cluster the learned representations, which produces new pseudo labels. The self-supervised model then learns from these pseudo labels to refine its output representations.
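
A rough sketch of that loop, assuming a hypothetical `encoder` and linear `classifier` head and reusing `info_nce` from above: embeddings are clustered with scikit-learn's K-means, the cluster assignments become pseudo labels, and a cross-entropy term on those labels is added to the contrastive loss.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

@torch.no_grad()
def compute_pseudo_labels(encoder, loader, num_clusters, device):
    """Cluster the current embeddings; cluster ids become pseudo labels."""
    encoder.eval()
    feats = torch.cat([encoder(x.to(device)).cpu() for x in loader])
    kmeans = KMeans(n_clusters=num_clusters, n_init=10).fit(feats.numpy())
    return torch.as_tensor(kmeans.labels_, dtype=torch.long)

def refinement_loss(z1, z2, logits, pseudo_labels, lam=1.0):
    """Contrastive (InfoNCE) term plus cross-entropy on the pseudo labels."""
    return info_nce(z1, z2) + lam * F.cross_entropy(logits, pseudo_labels)
```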

Thoughts

It seems that the self-supervised model utilises the pseudo labels by imposing a cross-entropy loss on top of the contrastive loss. What would be the issue with redefining the positive and negative samples based on the pseudo labels and continuing to train the model with the contrastive loss alone?
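
For concreteness, the alternative raised above could look like a supervised-contrastive-style loss driven by the pseudo labels (a hedged sketch, not the paper's method): samples sharing a cluster id are treated as positives, everything else as negatives, with no cross-entropy head.

```python
import torch
import torch.nn.functional as F

def pseudo_label_contrastive(z, pseudo_labels, temperature=0.1):
    """z: (N, d) embeddings; pseudo_labels: (N,) K-means cluster ids."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature  # (N, N) pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    # Positives: other samples with the same pseudo label as the anchor.
    pos_mask = (pseudo_labels[:, None] == pseudo_labels[None, :]) & ~self_mask
    # Log-softmax over all other samples, averaged over the positives.
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float('-inf')), dim=1, keepdim=True)
    pos_count = pos_mask.sum(1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(1) / pos_count
    return loss.mean()
```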