softmax loss
$$ L_1=-\frac{1}{N}\sum^N_{i=1}\log\frac{e^{W^T_{y_i}x_i+b_{y_i}}}{\sum^n_{j=1}e^{W^T_j x_i+b_j}} \tag{1} $$
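As a quick numeric check, Eq. (1) can be sketched in NumPy (function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def softmax_loss(W, b, X, y):
    """Eq. (1): mean cross-entropy over N samples.
    W: (d, n) class weights, b: (n,) biases,
    X: (N, d) features, y: (N,) integer class labels."""
    logits = X @ W + b                           # (N, n) scores W_j^T x_i + b_j
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()
```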
the softmax loss does not explicitly optimise the similarity between intra-class samples or the diversity between inter-class samples
$b_j=0$ → $W^T_j x_i = ||W_j||\,||x_i|| \cos\theta_j$ ($\theta_j$ : the angle between the weight $W_j$ and the feature $x_i$)
fix $||W_j||=1$ and $||x_i||=1$ by $l_2$ normalisation, then re-scale $||x_i||$ to $s$
$$ L_2=-\frac{1}{N}\sum^N_{i=1}\log\frac{e^{s\cos\theta_{y_i}}}{e^{s\cos\theta_{y_i}}+\sum^n_{j=1,j \neq y_i}e^{s\cos\theta_j}} \tag{2} $$
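A minimal NumPy sketch of the normalised softmax in Eq. (2), assuming one weight column per class (names are illustrative):

```python
import numpy as np

def normalised_softmax_loss(W, X, y, s=64.0):
    """Eq. (2): l2-normalise each W_j and x_i, re-scale by s."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)  # (d, n)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # (N, d)
    logits = s * (Xn @ Wn)                             # s * cos(theta_j)
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()
```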
since the embedding features are distributed on a hypersphere, add an additive angular margin penalty $m$ between $x_i$ and $W_{y_i}$ → simultaneously enhances intra-class compactness and inter-class discrepancy
named ArcFace because the additive angular margin penalty is equal to the geodesic distance margin penalty on the normalised hypersphere
$$ L_3=-\frac{1}{N} \sum^N_{i=1} \log \frac{e^{s(\cos(\theta_{y_i}+m))}}{e^{s(\cos(\theta_{y_i}+m))} + \sum^n_{j=1,j \neq y_i}e^{s\cos\theta_j}} \tag{3} $$
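Eq. (3) differs from Eq. (2) only in the target logit, where the margin $m$ is added to the angle. A NumPy sketch (names illustrative, not the authors' implementation):

```python
import numpy as np

def arcface_loss(W, X, y, s=64.0, m=0.5):
    """Eq. (3): add angular margin m to the target angle theta_{y_i}."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos = np.clip(Xn @ Wn, -1.0, 1.0)          # guard arccos domain
    idx = np.arange(len(y))
    theta_y = np.arccos(cos[idx, y])           # target angles
    logits = s * cos
    logits[idx, y] = s * np.cos(theta_y + m)   # penalised target logit
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, y].mean()
```

With $m=0$ this reduces to the normalised softmax of Eq. (2); the margin makes the target logit strictly harder to satisfy for small angles.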
select face images from 8 different identities (~1,500 images/class) and train a 2D feature embedding network
comparison between softmax and ArcFace loss
Intra-Loss: improves intra-class compactness by decreasing the angle/arc between the sample and the ground truth centre
$$ L_5=L_2+\frac{1}{\pi N} \sum^N_{i=1} \theta_{y_i} \tag{5} $$
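The penalty term added to $L_2$ in Eq. (5), sketched in NumPy (function name is an assumption):

```python
import numpy as np

def intra_penalty(W, X, y):
    """Geodesic penalty term of Eq. (5): mean target angle, scaled by 1/pi."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos = np.clip(Xn @ Wn, -1.0, 1.0)
    theta_y = np.arccos(cos[np.arange(len(y)), y])  # angle to ground truth centre
    return theta_y.mean() / np.pi                   # in [0, 1]
```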
Inter-Loss: enhances inter-class discrepancy by increasing the angle/arc between different centres
$$ L_6=L_2-\frac{1}{\pi N (n-1)} \sum^N_{i=1} \sum^n_{j=1,j \neq y_i} \arccos(W^T_{y_i} W_j) \tag{6} $$
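The inter-class arc term of Eq. (6), sketched in NumPy (names are assumptions, not from the paper); since $\arccos(W^T_{y_i} W_{y_i})=0$, the $j=y_i$ entry contributes nothing and can stay in the sum:

```python
import numpy as np

def inter_penalty(W, y):
    """Arc term of Eq. (6): mean arc between the target centre W_{y_i}
    and every other centre, normalised to [0, 1]."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)  # (d, n)
    gram = np.clip(Wn.T @ Wn, -1.0, 1.0)               # cos between centres
    arcs = np.arccos(gram)                             # (n, n), zero diagonal
    n = W.shape[1]
    per_sample = arcs[y].sum(axis=1)                   # sum over j != y_i
    return per_sample.mean() / (np.pi * (n - 1))
```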
Inter-Loss is a special case of Minimum Hyper-spherical Energy (MHE)