Spatial sensitivity of classification and localization
Misalignment between the two tasks in the spatial dimension limits how much performance can improve
the classical Faster RCNN
$$ \mathcal{L} = \mathcal{L}_{cls}(\mathcal{H}_1(F_l, P), y) + \mathcal{L}_{loc}(\mathcal{H}_2(F_l, P), \mathcal{B}) \tag{1} $$
Decoupling the heads improves performance, but the two tasks still overlap in the spatial dimension, which can cause problems
$$ \mathcal{L} = \mathcal{L}^D_{cls}(\mathcal{H}^D_1(F_l, \hat{P}_c), y) + \mathcal{L}^D_{loc}(\mathcal{H}^D_2(F_l, \hat{P}_r), \mathcal{B}) \tag{2} $$
TSD takes the RoI feature of $P$ as input and generates the disentangled proposals $\hat{P}_c$ and $\hat{P}_r$, one for each task
Localization
$\mathcal{F}_r$: generates a proposal-wise translation from $P$ to produce the new proposal $\hat{P}_r$
$$ \Delta R = \gamma \mathcal{F}_r(F;\theta_r) \cdot (w,h) \tag{3} $$
$$ \hat{P}_r = P + \Delta R \tag{4} $$
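Eq. (3) and (4) can be illustrated with a small pure-Python sketch. It assumes a proposal is a tuple `(x1, y1, x2, y2)`, that the raw output of $\mathcal{F}_r$ is a single `(dx, dy)` pair per proposal, and that the translation moves the whole box rigidly. The function name `shift_proposal` and the default `gamma=0.1` are illustrative choices, not values taken from these notes.

```python
def shift_proposal(proposal, offset, gamma=0.1):
    """Sketch of Eq. (3)-(4): translate a box by gamma * offset * (w, h).

    proposal: (x1, y1, x2, y2) of the original proposal P.
    offset:   (dx, dy), assumed to be the raw output of F_r for this proposal.
    gamma:    scalar that modulates the offset magnitude (illustrative default).
    """
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    # Eq. (3): Delta R is the predicted offset scaled by gamma and the box size.
    delta_x = gamma * offset[0] * w
    delta_y = gamma * offset[1] * h
    # Eq. (4): P_r = P + Delta R (width and height are unchanged).
    return (x1 + delta_x, y1 + delta_y, x2 + delta_x, y2 + delta_y)
```

Scaling by `(w, h)` makes the shift relative to the proposal size, so the same predicted offset moves a large box further than a small one.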
Classification
To generate the irregularly shaped derived proposal $\hat{P}_c$, a pointwise deformation is applied on the regular $k \times k$ grid
The translation $\Delta C(x,y,*)$ gives the new sample point of $\hat{P}_c$ at the $(x,y)$-th grid cell
$$ \Delta C = \gamma \mathcal{F}_c(F;\theta_c) \cdot (w,h) \tag{5} $$
The first layer of $\mathcal{F}_r$ and $\mathcal{F}_c$ is shared to reduce the number of parameters
To produce the feature map $\hat{F}_c$ from the irregular $\hat{P}_c$, an operation similar to deformable RoI pooling is applied
$$ \hat{F}_c(x,y) = \sum_{p \in G(x,y)} \frac{\mathcal{F}_B(p_0 + \Delta C(x,y,1),\ p_1 + \Delta C(x,y,2))}{|G(x,y)|} \tag{6} $$
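The pooling in Eq. (5) and (6) can be sketched in pure Python as follows. Several details are assumptions made only for illustration: the feature map is a single-channel 2-D list indexed `fmap[row][col]`, the proposal is given in feature-map coordinates, $G(x,y)$ is taken to be `samples * samples` evenly spaced points inside each grid cell, $\mathcal{F}_B$ is bilinear interpolation, and samples falling outside the map are clamped to the border. The names `bilinear` and `deformed_roi_pool` are hypothetical.

```python
import math


def bilinear(fmap, x, y):
    """Bilinearly sample fmap[row][col] at real-valued (x, y), clamped to the border."""
    height, width = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * fmap[y0][x0] + ax * fmap[y0][x1]
    bottom = (1 - ax) * fmap[y1][x0] + ax * fmap[y1][x1]
    return (1 - ay) * top + ay * bottom


def deformed_roi_pool(fmap, proposal, offsets, k=2, samples=2, gamma=0.1):
    """Sketch of Eq. (5)-(6): per-cell average of samples shifted by Delta C.

    offsets[gy][gx] = (ox, oy) is assumed to be the raw F_c output for
    grid cell (gx, gy); gamma is an illustrative scaling value.
    """
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    bin_w, bin_h = w / k, h / k
    pooled = [[0.0] * k for _ in range(k)]
    for gy in range(k):
        for gx in range(k):
            # Eq. (5): pointwise translation for this grid cell.
            dcx = gamma * offsets[gy][gx][0] * w
            dcy = gamma * offsets[gy][gx][1] * h
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    px = x1 + (gx + (sx + 0.5) / samples) * bin_w
                    py = y1 + (gy + (sy + 0.5) / samples) * bin_h
                    # Eq. (6): sample the feature map at the shifted point.
                    total += bilinear(fmap, px + dcx, py + dcy)
            pooled[gy][gx] = total / (samples * samples)
    return pooled
```

Because every grid cell carries its own offset, the set of sampled locations no longer has to form a rectangle, which is what makes $\hat{P}_c$ irregular.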
progressive constraint (PC)
classification branch
$$ \mathcal{M}_{cls} = |\mathcal{H}_1(y|F_l,P) - \mathcal{H}^D_1(y|F_l, \tau_c(P,\Delta C)) + m_c|_+ \tag{7} $$
localization branch
$$ \mathcal{M}_{loc} = |IoU(\hat{\mathcal{B}}, \mathcal{B}) - IoU(\hat{\mathcal{B}}_D, \mathcal{B}) + m_r|_+ \tag{8} $$
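Both margins in Eq. (7) and (8) are hinge terms: they are zero once the TSD branch beats the shared-proposal branch by at least the margin. A minimal sketch, assuming $|\cdot|_+$ means `max(0, ·)`, that scores are the predicted confidences for the ground-truth class $y$, and that boxes are `(x1, y1, x2, y2)` tuples. The function names and the default margin `0.2` are illustrative choices, not values taken from these notes.

```python
def hinge(value):
    """|v|_+ : keep the positive part, as in a ReLU."""
    return max(0.0, value)


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pc_cls(score_shared, score_tsd, margin=0.2):
    """Sketch of Eq. (7): penalize when the TSD score does not exceed
    the shared-head score by at least `margin`."""
    return hinge(score_shared - score_tsd + margin)


def pc_loc(box_shared, box_tsd, gt_box, margin=0.2):
    """Sketch of Eq. (8): the same hinge, applied to IoU with the ground truth."""
    return hinge(iou(box_shared, gt_box) - iou(box_tsd, gt_box) + margin)
```

For example, `pc_cls(0.6, 0.9)` is zero because the TSD score already leads by more than the margin, while `pc_cls(0.6, 0.7)` stays positive and keeps pushing the TSD branch to improve.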
whole loss function of TSD with Faster RCNN
$$ \mathcal{L} = \underbrace{\mathcal{L}_{rpn} + \mathcal{L}_{cls} + \mathcal{L}_{loc}}_{classical\ loss} + \underbrace{\mathcal{L}^D_{cls} + \mathcal{L}^D_{loc} + \mathcal{M}_{cls} + \mathcal{M}_{loc}}_{TSD\ loss} \tag{9} $$