Two Concepts

Method

PixContrast

Spatial sensitivity

  1. Sample two augmented views from the same image. Both views are resized to a fixed resolution (e.g., 224 × 224).

  2. Pass the views through a regular encoder network (backbone and projection head) and a momentum encoder network.

  3. Warp each feature map back to the original image space and compute the distances between all pixel pairs across the two views, normalized to compensate for the scaling applied during augmentation. A pair with d ≤ threshold is a positive pair; a pair with d > threshold is a negative pair.

  4. Apply a contrastive loss (negative log of a softmax over temperature-scaled cosine similarities):

    $$ \mathcal{L}_{\text{Pix}}(i) = -\log \frac{\sum_{j \in \Omega_{p}^{i}} e^{\cos\left(\mathbf{x}_{i}, \mathbf{x}_{j}^{\prime}\right) / \tau}}{\sum_{j \in \Omega_{p}^{i}} e^{\cos\left(\mathbf{x}_{i}, \mathbf{x}_{j}^{\prime}\right) / \tau} + \sum_{k \in \Omega_{n}^{i}} e^{\cos\left(\mathbf{x}_{i}, \mathbf{x}_{k}^{\prime}\right) / \tau}} $$

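    Here $i$ is a pixel in the first view, $\Omega_{p}^{i}$ and $\Omega_{n}^{i}$ are the sets of pixels in the second view assigned as positive and negative for $i$, and $\tau$ is a temperature. The loss is averaged over all pixels of the first view that lie in the intersection of the two views. The loss for pixels of the second view is computed and averaged in the same way.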

    The final loss is the average over all image pairs in a mini-batch.
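
Steps 3 and 4 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the function names, the `(x0, y0, x1, y1)` crop-box format, normalizing distances by the larger feature-bin diagonal, and the default values of `tau` and `threshold` are all choices made for this sketch. Flips are ignored, and "pixels in the intersection" is approximated as "pixels with at least one positive pair".

```python
import numpy as np

def pixel_coords(box, size):
    """Map the centers of a size x size feature grid back to original-image
    coordinates. box = (x0, y0, x1, y1) is the view's crop (assumed format)."""
    x0, y0, x1, y1 = box
    bin_w = (x1 - x0) / size
    bin_h = (y1 - y0) / size
    xs = x0 + (np.arange(size) + 0.5) * bin_w
    ys = y0 + (np.arange(size) + 0.5) * bin_h
    gx, gy = np.meshgrid(xs, ys)  # row-major: index = row * size + col
    coords = np.stack([gx.ravel(), gy.ravel()], axis=1)  # (size*size, 2)
    return coords, np.hypot(bin_w, bin_h)  # coordinates and bin diagonal

def pix_contrast_loss(x, x_prime, box, box_prime, size, tau=0.3, threshold=0.7):
    """One direction of the pixel contrast loss.
    x, x_prime: (size*size, C) pixel features of the two views."""
    coords, diag = pixel_coords(box, size)
    coords_p, diag_p = pixel_coords(box_prime, size)

    # Step 3: all-pair distances in original-image space, normalized by the
    # bin diagonal so the threshold is independent of crop scale (assumption).
    dist = np.linalg.norm(coords[:, None, :] - coords_p[None, :, :], axis=-1)
    dist = dist / max(diag, diag_p)
    pos = dist <= threshold  # True: positive pair, False: negative pair

    # Step 4: softmax over temperature-scaled cosine similarities.
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    xpn = x_prime / np.linalg.norm(x_prime, axis=1, keepdims=True)
    expsim = np.exp(xn @ xpn.T / tau)
    pos_sum = (expsim * pos).sum(axis=1)
    total = expsim.sum(axis=1)  # positives plus negatives

    # Average only over pixels that have at least one positive pair.
    valid = pos.any(axis=1)
    if not valid.any():
        return 0.0
    return float(np.mean(-np.log(pos_sum[valid] / total[valid])))
```

The symmetric term for the second view would be obtained by calling the same function with the two views swapped, and the final loss by averaging over all image pairs in the mini-batch.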

PixPro

Spatial sensitivity & Spatial smoothness

  1. Same as PixContrast
  2. Same as PixContrast
  3. Before the feature maps are warped to the original image space, one branch goes through the pixel propagation module (PPM).
  4. Pixel-to-Propagation Consistency Loss

Pixel Propagation Module

For each pixel feature ${\bf x}_i$, the pixel propagation module computes its smoothed transform ${\bf y}_i$ by propagating features from all pixels ${\bf x}_j$ within the same image $Ω$ as