softmax loss
$$ L_1=-\frac{1}{N}\sum^N_{i=1}\log\frac{e^{W^T_{y_i}x_i+b_{y_i}}}{\sum^n_{j=1}e^{W^T_j x_i+b_j}} \tag{1} $$
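As a quick numeric check, Eq. (1) can be sketched in NumPy (function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def softmax_loss(W, b, X, y):
    """Eq. (1): mean cross-entropy over N samples.
    W: (d, n) class weights, b: (n,) biases,
    X: (N, d) features, y: (N,) integer class labels."""
    logits = X @ W + b                           # (N, n) scores W_j^T x_i + b_j
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()
```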
the softmax loss does not explicitly optimise the similarity between intra-class samples or the diversity between inter-class samples
$b_j=0$ → $W^T_j x_i = ||W_j||\,||x_i|| \cos\theta_j$ ($\theta_j$ : the angle between the weight $W_j$ and the feature $x_i$)
fix $||W_j||=1$ and $||x_i||=1$ by $l_2$ normalisation, then re-scale $||x_i||$ to $s$
$$ L_2=-\frac{1}{N}\sum^N_{i=1}\log\frac{e^{s\cos\theta_{y_i}}}{e^{s\cos\theta_{y_i}}+\sum^n_{j=1,j \neq y_i}e^{s\cos\theta_j}} \tag{2} $$
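A minimal NumPy sketch of the normalised softmax in Eq. (2), assuming one weight column per class (names are illustrative):

```python
import numpy as np

def normalised_softmax_loss(W, X, y, s=64.0):
    """Eq. (2): l2-normalise each W_j and x_i, re-scale by s."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)  # (d, n)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # (N, d)
    logits = s * (Xn @ Wn)                             # s * cos(theta_j)
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()
```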
since the embedding features are distributed on a hypersphere, add an additive angular margin penalty $m$ between $x_i$ and $W_{y_i}$ → simultaneously enhances intra-class compactness and inter-class discrepancy
named ArcFace because the additive angular margin penalty is equal to the geodesic distance margin penalty on the normalised hypersphere
$$ L_3=-\frac{1}{N} \sum^N_{i=1} \log \frac{e^{s(\cos(\theta_{y_i}+m))}}{e^{s(\cos(\theta_{y_i}+m))} + \sum^n_{j=1,j \neq y_i}e^{s\cos\theta_j}} \tag{3} $$
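Eq. (3) differs from Eq. (2) only in the target logit, where the margin $m$ is added to the angle. A NumPy sketch (names illustrative, not the authors' implementation):

```python
import numpy as np

def arcface_loss(W, X, y, s=64.0, m=0.5):
    """Eq. (3): add angular margin m to the target angle theta_{y_i}."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos = np.clip(Xn @ Wn, -1.0, 1.0)          # guard arccos domain
    idx = np.arange(len(y))
    theta_y = np.arccos(cos[idx, y])           # target angles
    logits = s * cos
    logits[idx, y] = s * np.cos(theta_y + m)   # penalised target logit
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, y].mean()
```

With $m=0$ this reduces to the normalised softmax of Eq. (2); the margin makes the target logit strictly harder to satisfy for small angles.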
select face images from 8 different identities (~1,500 images/class) and train a 2D feature embedding network
comparison between softmax and ArcFace loss
Intra-Loss: improves intra-class compactness by decreasing the angle/arc between the sample and the ground truth centre
$$ L_5=L_2+\frac{1}{\pi N} \sum^N_{i=1} \theta_{y_i} \tag{5} $$
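The penalty term added to $L_2$ in Eq. (5), sketched in NumPy (function name is an assumption):

```python
import numpy as np

def intra_penalty(W, X, y):
    """Geodesic penalty term of Eq. (5): mean target angle, scaled by 1/pi."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos = np.clip(Xn @ Wn, -1.0, 1.0)
    theta_y = np.arccos(cos[np.arange(len(y)), y])  # angle to ground truth centre
    return theta_y.mean() / np.pi                   # in [0, 1]
```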
Inter-Loss: enhances inter-class discrepancy by increasing the angle/arc between different centres
$$ L_6=L_2-\frac{1}{\pi N (n-1)} \sum^N_{i=1} \sum^n_{j=1,j \neq y_i} \arccos(W^T_{y_i} W_j) \tag{6} $$
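The inter-class arc term of Eq. (6), sketched in NumPy (names are assumptions, not from the paper); since $\arccos(W^T_{y_i} W_{y_i})=0$, the $j=y_i$ entry contributes nothing and can stay in the sum:

```python
import numpy as np

def inter_penalty(W, y):
    """Arc term of Eq. (6): mean arc between the target centre W_{y_i}
    and every other centre, normalised to [0, 1]."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)  # (d, n)
    gram = np.clip(Wn.T @ Wn, -1.0, 1.0)               # cos between centres
    arcs = np.arccos(gram)                             # (n, n), zero diagonal
    n = W.shape[1]
    per_sample = arcs[y].sum(axis=1)                   # sum over j != y_i
    return per_sample.mean() / (np.pi * (n - 1))
```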
Inter-Loss is a special case of Minimum Hyper-spherical Energy (MHE)