1. Introduction

2. Related work

Bipartite matching between predicted objects and ground truth objects → optimize object-specific losses

$$ \hat{\sigma}=\argmin_{\sigma\in\mathfrak{S}_N}\sum^N_i\mathcal{L}_{\text{match}}(y_i,\hat{y}_{\sigma(i)}) \tag{1} $$
- $y$ : the ground truth set of objects
- $\hat{y} = \{\hat{y}_i\}^N_{i=1}$ : the set of $N$ predictions
- Search for the permutation of $N$ elements $\sigma \in \mathfrak{S}_N$ with the lowest total cost
- $\mathcal{L}_{\text{match}}(y_i,\hat{y}_{\sigma(i)})$ : a pair-wise matching cost between ground truth $y_i$ and a prediction with index $\sigma(i)$
- $\mathcal{L}_{\text{match}}(y_i,\hat{y}_{\sigma(i)})=-\mathbb{1}_{\{c_i\neq \varnothing\}}\hat{p}_{\sigma(i)}(c_i)+\mathbb{1}_{\{c_i\neq \varnothing\}}\mathcal{L}_\text{box}(b_i,\hat{b}_{\sigma(i)})$
  - $y_i = (c_i, b_i)$
    - $c_i$ : the target class label (which may be $\varnothing$)
    - $b_i \in [0, 1]^4$ : a vector that defines the ground truth box center coordinates and its height and width relative to the image size
  - $\hat{p}_{\sigma(i)}(c_i)$ : probability of class $c_i$
  - $\hat{b}_{\sigma(i)}$ : the predicted box
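
As a rough illustration of Eq. (1), the sketch below evaluates the pair-wise matching cost and picks the cheapest permutation by exhaustive search. The exhaustive search is a stand-in that is only practical for very small $N$; it is not the Hungarian algorithm itself. The names `match_cost`, `best_permutation`, and `l1_box_cost`, the use of `None` for $\varnothing$, and the plain L1 distance used in place of $\mathcal{L}_\text{box}$ are all assumptions made for this example.

```python
from itertools import permutations

NO_OBJECT = None  # assumption: None stands for the "no object" class


def l1_box_cost(b, b_hat):
    """Placeholder for L_box: plain L1 distance between two 4-vectors."""
    return sum(abs(u - v) for u, v in zip(b, b_hat))


def match_cost(target, pred):
    """Pair-wise matching cost for one (ground truth, prediction) pair."""
    c, b = target
    probs, b_hat = pred
    if c is NO_OBJECT:
        return 0.0  # both indicator terms vanish for padded targets
    return -probs[c] + l1_box_cost(b, b_hat)


def best_permutation(targets, preds):
    """Try every permutation sigma and keep the one with the lowest total cost."""
    n = len(targets)
    return min(
        permutations(range(n)),
        key=lambda sigma: sum(
            match_cost(targets[i], preds[sigma[i]]) for i in range(n)
        ),
    )


# Two real objects plus one padded "no object" slot, so N = 3.
targets = [
    (0, (0.2, 0.2, 0.1, 0.1)),
    (1, (0.7, 0.7, 0.2, 0.2)),
    (NO_OBJECT, None),
]
# Each prediction: (class probabilities, box as (cx, cy, w, h)).
preds = [
    ({0: 0.1, 1: 0.8, NO_OBJECT: 0.1}, (0.7, 0.7, 0.2, 0.2)),
    ({0: 0.1, 1: 0.1, NO_OBJECT: 0.8}, (0.5, 0.5, 0.5, 0.5)),
    ({0: 0.9, 1: 0.05, NO_OBJECT: 0.05}, (0.2, 0.2, 0.1, 0.1)),
]

sigma_hat = best_permutation(targets, preds)
print(sigma_hat)  # (2, 0, 1)
```

Here ground truth 0 is matched to prediction 2, ground truth 1 to prediction 0, and the padded slot absorbs the remaining prediction.
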
Hungarian loss

$$ \mathcal{L}_\text{Hungarian}(y,\hat{y})=\sum^N_{i=1}\left[-\log\hat{p}_{\hat{\sigma}(i)}(c_i)+\mathbb{1}_{\{c_i\neq\varnothing\}}\mathcal{L}_\text{box}(b_i,\hat{b}_{\hat{\sigma}(i)})\right] \tag{2} $$
- The matching cost between an object and $\varnothing$ does not depend on the prediction → in that case the cost is a constant
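
The following sketch shows one way Eq. (2) could be evaluated once an assignment $\hat{\sigma}$ is known. It assumes the assignment is given as a tuple of prediction indices, that `None` stands for $\varnothing$, and that a plain L1 distance is used in place of $\mathcal{L}_\text{box}$; `hungarian_loss` and `l1_box_loss` are hypothetical helper names, not functions from any library.

```python
import math

NO_OBJECT = None  # assumption: None stands for the "no object" class


def l1_box_loss(b, b_hat):
    """Placeholder for L_box: plain L1 distance between two 4-vectors."""
    return sum(abs(u - v) for u, v in zip(b, b_hat))


def hungarian_loss(targets, preds, sigma_hat, box_loss=l1_box_loss):
    """Sum the per-slot loss of Eq. (2) under a fixed assignment sigma_hat."""
    total = 0.0
    for i, (c, b) in enumerate(targets):
        probs, b_hat = preds[sigma_hat[i]]
        total += -math.log(probs[c])  # class term, applied to every slot
        if c is not NO_OBJECT:        # box term only for real objects
            total += box_loss(b, b_hat)
    return total


targets = [
    (0, (0.2, 0.2, 0.1, 0.1)),
    (1, (0.7, 0.7, 0.2, 0.2)),
    (NO_OBJECT, None),
]
preds = [
    ({0: 0.1, 1: 0.8, NO_OBJECT: 0.1}, (0.7, 0.7, 0.2, 0.2)),
    ({0: 0.1, 1: 0.1, NO_OBJECT: 0.8}, (0.5, 0.5, 0.5, 0.5)),
    ({0: 0.9, 1: 0.05, NO_OBJECT: 0.05}, (0.2, 0.2, 0.1, 0.1)),
]
sigma_hat = (2, 0, 1)  # assumed to come from the matching step

loss = hungarian_loss(targets, preds, sigma_hat)
```

Unlike the matching cost, the class term here uses log-probabilities and is also applied to the slot matched to $\varnothing$.
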

Box loss $\mathcal{L}_\text{box}$ : a linear combination of the $\ell_1$ loss and the generalized IoU loss $\mathcal{L}_\text{iou}(\cdot,\cdot)$, which is scale-invariant

$$ \mathcal{L}_\text{box}(b_i,\hat{b}_{\sigma(i)})=\lambda_\text{iou}\mathcal{L}_\text{iou}(b_i,\hat{b}_{\sigma(i)})+\lambda_{\text{L}1}||b_i-\hat{b}_{\sigma(i)}||_1 \tag{3} $$
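
A minimal sketch of Eq. (3) for a single pair of boxes, assuming boxes are given as normalized `(cx, cy, w, h)` with positive width and height, and taking $\mathcal{L}_\text{iou} = 1 - \text{GIoU}$. The function names and the default weights `lambda_iou=2.0` and `lambda_l1=5.0` are illustrative choices, not values stated in these notes.

```python
def to_corners(box):
    """Convert (cx, cy, w, h) to (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def generalized_iou(b, b_hat):
    """GIoU = IoU - (hull area - union area) / hull area."""
    ax0, ay0, ax1, ay1 = to_corners(b)
    bx0, by0, bx1, by1 = to_corners(b_hat)
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))
    return inter / union - (hull - union) / hull


def box_loss(b, b_hat, lambda_iou=2.0, lambda_l1=5.0):
    """Weighted sum of the GIoU loss and the L1 distance, as in Eq. (3)."""
    l_iou = 1.0 - generalized_iou(b, b_hat)
    l_1 = sum(abs(u - v) for u, v in zip(b, b_hat))
    return lambda_iou * l_iou + lambda_l1 * l_1


# Two boxes that touch only at a corner: IoU is 0, GIoU is -0.5.
b = (0.25, 0.25, 0.5, 0.5)
b_hat = (0.75, 0.75, 0.5, 0.5)
print(generalized_iou(b, b_hat))  # -0.5
print(box_loss(b, b_hat))         # 8.0
```

Shrinking both boxes by the same factor leaves the GIoU term unchanged while the L1 term shrinks, which is why the two terms are combined.
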

$x_{img}\in\mathbb{R}^{3\times{H_0}\times{W_0}}$ : the initial image
$f\in\mathbb{R}^{C\times H\times W}$ : a lower-resolution activation map
- $C=2048$ and $H,W=\frac{H_0}{32},\frac{W_0}{32}$

A 1x1 convolution reduces the channel dimension of the high-level activation map $f$ from $C$ to $d$ → creates a new feature map $z_0 \in \mathbb{R}^{d \times H \times W}$
- The spatial dimensions of $z_0$ are collapsed into one dimension, resulting in a $d \times HW$ feature map
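
The shape flow above can be traced with a small pure-Python sketch. It treats a 1x1 convolution as a per-pixel linear map across channels (bias omitted) and uses nested lists instead of tensors; the tiny sizes and the helper names `backbone_output_shape`, `conv1x1`, and `flatten_spatial` are assumptions for illustration, and the shape helper assumes $H_0$ and $W_0$ are divisible by 32.

```python
def backbone_output_shape(h0, w0, channels=2048, stride=32):
    """Shape (C, H, W) of the activation map f for an H0 x W0 image."""
    return channels, h0 // stride, w0 // stride


def conv1x1(f, weight):
    """f: C x H x W nested lists, weight: d x C. Returns d x H x W."""
    C, H, W = len(f), len(f[0]), len(f[0][0])
    return [
        [
            [sum(weight[k][c] * f[c][h][w] for c in range(C)) for w in range(W)]
            for h in range(H)
        ]
        for k in range(len(weight))
    ]


def flatten_spatial(z):
    """d x H x W -> d x HW, row-major over the spatial grid."""
    return [[v for row in channel for v in row] for channel in z]


print(backbone_output_shape(640, 480))  # (2048, 20, 15)

# Toy activation map with C=3, H=2, W=2 and entries c*100 + h*10 + w.
f = [[[c * 100 + h * 10 + w for w in range(2)] for h in range(2)] for c in range(3)]
weight = [[1, 0, 0], [0, 1, 1]]  # d=2 output channels

z0 = conv1x1(f, weight)      # 2 x 2 x 2
z0_flat = flatten_spatial(z0)  # 2 x 4
print(z0_flat)  # [[0, 1, 10, 11], [300, 302, 320, 322]]
```

Each column of the flattened map corresponds to one spatial position, giving a sequence of $HW$ vectors of size $d$.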