Two Concepts

Spatial sensitivity

the ability to discriminate between spatially close pixels, needed for accurate prediction in boundary areas where labels change.

Spatial smoothness

encourages spatially close pixels to be similar, which can aid prediction in areas that belong to the same label.

Method

PixContrast

Spatial sensitivity

sample two augmented views from the same image. Both views are resized to a fixed resolution (e.g., 224 × 224)
pass each view through a regular encoder network (backbone and projection head) and a momentum encoder network
- projection head consists of two 1x1 conv layers (2048 → 256), acting as per-pixel fc layers
- output size (for PyTorch, channels first) is [B, 256, 7, 7], a feature map.
warp each feature map back to the original image space and compute all pairwise distances between pixels of the two views, normalized to compensate for the scaling in augmentation; d ≤ threshold ⇒ positive pair; d > threshold ⇒ negative pair
contrastive loss (log-softmax over the cosine similarities of positive and negative pairs)
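
The pair assignment in the warping step can be sketched as follows. This is a minimal illustration, not the authors' code: the crop-box format `(x0, y0, x1, y1)`, the function names, and the default `threshold=0.7` are assumptions, and horizontal flips are ignored. Distances are normalized by the larger feature-bin diagonal of the two views, which is one way to compensate for the different scaling of each crop.

```python
import math

def bin_centers(box, size=7):
    """Centers of the size x size feature-map bins in original-image coordinates.

    box = (x0, y0, x1, y1) is the crop rectangle of one view (hypothetical format).
    Returns the list of (x, y) centers in row-major order and the bin diagonal.
    """
    x0, y0, x1, y1 = box
    bw, bh = (x1 - x0) / size, (y1 - y0) / size
    centers = [(x0 + (c + 0.5) * bw, y0 + (r + 0.5) * bh)
               for r in range(size) for c in range(size)]
    return centers, math.hypot(bw, bh)

def positive_mask(box1, box2, size=7, threshold=0.7):
    """mask[i][j] is True when pixel i of view 1 and pixel j of view 2 form a positive pair."""
    c1, diag1 = bin_centers(box1, size)
    c2, diag2 = bin_centers(box2, size)
    norm = max(diag1, diag2)  # assumed normalizer: the larger bin diagonal
    return [[math.hypot(xi - xj, yi - yj) / norm <= threshold
             for (xj, yj) in c2]
            for (xi, yi) in c1]
```

With two identical square crops, each pixel is positive only with itself, because adjacent bin centers are about 0.71 bin diagonals apart, just above the assumed threshold.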

$$ \mathcal{L}_{\text{Pix}}(i) = -\log \frac{\sum_{j \in \Omega_{p}^{i}} e^{\cos\left(\mathbf{x}_{i}, \mathbf{x}_{j}^{\prime}\right) / \tau}}{\sum_{j \in \Omega_{p}^{i}} e^{\cos\left(\mathbf{x}_{i}, \mathbf{x}_{j}^{\prime}\right) / \tau} + \sum_{k \in \Omega_{n}^{i}} e^{\cos\left(\mathbf{x}_{i}, \mathbf{x}_{k}^{\prime}\right) / \tau}} $$

The loss is averaged over all pixels of the first view that lie in the intersection of the two views. The loss for pixels of the second view is computed and averaged in the same way.

The final loss is the average over all image pairs in a mini-batch.
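
The per-pixel loss and its averaging over one view could look roughly like the sketch below. It uses plain Python lists for clarity rather than tensors. The function names, the default `tau=0.3`, and the rule "a pixel counts as inside the intersection if it has at least one positive partner" are assumptions made for illustration, not details confirmed by these notes.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def pix_contrast_loss(x, x_prime, pos_mask, tau=0.3):
    """Average PixContrast loss over the pixels of one view.

    x        : list of pixel features from one view (regular encoder)
    x_prime  : list of pixel features from the other view (momentum encoder)
    pos_mask : pos_mask[i][j] is True for a positive pair, False for a negative pair
    """
    losses = []
    for i, xi in enumerate(x):
        if not any(pos_mask[i]):
            continue  # assumption: no positive partner means outside the intersection
        pos = neg = 0.0
        for j, xj in enumerate(x_prime):
            e = math.exp(cosine(xi, xj) / tau)
            if pos_mask[i][j]:
                pos += e
            else:
                neg += e
        losses.append(-math.log(pos / (pos + neg)))
    return sum(losses) / len(losses) if losses else 0.0
```

The loss for the second view follows by swapping the roles of the two views and transposing `pos_mask`; the two values are then averaged, and the result is averaged again over the image pairs in the mini-batch.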

PixPro

Spatial sensitivity & Spatial smoothness

Same as PixContrast (view sampling)
Same as PixContrast (encoding)
before the feature maps are warped to the original image space, one branch goes through the Pixel Propagation Module (PPM)
Pixel-to-Propagation Consistency Loss

Pixel Propagation Module

For each pixel feature ${\bf x}_i$, the pixel propagation module computes its smoothed transform ${\bf y}_i$ by propagating features from all pixels ${\bf x}_j$ within the same image $Ω$ as