Contrastive learning algorithms usually treat the other samples in the same batch as negative samples, which naturally introduces labelling noise, since those samples are not necessarily true negatives.
The framework uses K-means to cluster the learned representations, which produces new pseudo labels. The self-supervised model then learns from these pseudo labels to refine its output representations.
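For reference, here is a minimal sketch (not the paper's code; the function name and cluster count are my own placeholders) of the pseudo-labelling step as I understand it: cluster the current embeddings with K-means and use the cluster assignments as pseudo labels for the next round of training.

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_pseudo_labels(embeddings: np.ndarray, num_clusters: int = 100) -> np.ndarray:
    """Cluster L2-normalised embeddings and return cluster ids as pseudo labels."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    return kmeans.fit_predict(normed)  # shape: (num_samples,)
```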
It seems that the self-supervised model utilises the pseudo labels by imposing a cross-entropy loss on top of the contrastive loss. What would be the issue with re-defining the positive and negative samples based on the pseudo labels and continuing to train the model with the contrastive loss alone? A sketch of what I mean is below.
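To make the question concrete, this is a sketch (my own, with hypothetical names, assuming a SupCon-style formulation) of the alternative I have in mind: two samples are positives if and only if they share a pseudo label, and training uses only this contrastive loss, with no additional cross-entropy head.

```python
import torch
import torch.nn.functional as F

def pseudo_label_contrastive_loss(features: torch.Tensor,
                                  pseudo_labels: torch.Tensor,
                                  temperature: float = 0.1) -> torch.Tensor:
    """features: (N, D) L2-normalised embeddings; pseudo_labels: (N,) cluster ids."""
    n = features.shape[0]
    sim = features @ features.T / temperature                    # pairwise similarities
    mask_self = torch.eye(n, dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(mask_self, float('-inf'))              # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)   # log-softmax over the batch
    # positives = samples with the same pseudo label (excluding the anchor itself)
    pos_mask = (pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)) & ~mask_self
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                                       # skip anchors with no positive
    pos_log_prob = torch.where(pos_mask, log_prob, torch.zeros_like(log_prob))
    loss_per_anchor = -pos_log_prob.sum(dim=1)[valid] / pos_counts[valid]
    return loss_per_anchor.mean()
```

In other words, the pseudo labels would replace the usual "same instance = positive, rest of batch = negative" assumption directly inside the contrastive objective, rather than being consumed by a separate classification loss.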