1. Introduction

Fig 1

Optical flow
- FlowNet, FlowNet 2.0: regress pixel-level correspondences
Spatial Transformer Network (STN)
- spatially transforms feature maps
- applies a warping operation and outputs the warped image
- back-propagation is possible because off-grid points are handled with multi-linear interpolation
Volume Tweening Network (VTN)
- unsupervised training of end-to-end CNNs that perform voxel-level 3D medical image registration
  
  Fig 2
1. cascade the registration subnetworks
2. integrate affine registration
3. incorporate an additional invertibility loss

2. Related Work

3. Method

A. Problem Formulation

Goal: find a displacement field (or flow field) $f_{12}:\Omega\rightarrow\mathbb{R}^3$ such that

$$ I_1(x)\approx I_2(x+f_{12}(x)) \tag{1} $$
- $f_{12}$ : the flow from $I_1$ to $I_2$
- $\text{warp}(I_2,f)(x)=I_2(x+f(x))$ : the image $I_2$ warped by a flow $f$
- the objective is to maximize the similarity between $I_1$ and $\text{warp}(I_2,f_{12})$
  
  Warping twice, first with $g_1$ and then with $g_2$:
  
  $$ \begin{aligned} &\text{warp}(\text{warp}(I,g_1),g_2)(x) \\ =&\text{warp}(I,g_1)(x+g_2(x)) \\ =&I(x+g_2(x)+g_1(x+g_2(x))) \\ =&\text{warp}(I,g_2+\text{warp}(g_1,g_2))(x) \end{aligned} \tag{2} $$
- the composition of two flows
  
  $$ g_1 \star g_2=g_2+\text{warp}(g_1,g_2) \tag{3} $$
  
  $$ \text{warp}(\text{warp}(I,g_1),g_2)=\text{warp}(I,g_1\star g_2) \tag{4} $$
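- as in STN, values are obtained by trilinear interpolation over the enclosing cuboid of each sample point
- indices outside the image are handled by nearest-point interpolation
→ a query point $x$ that falls outside the image is first moved to the nearest point of the enclosing cuboid, then interpolated over its 8 nearest lattice points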
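
The warping and composition rules in (2)-(4) can be checked numerically. The block below is a minimal NumPy sketch, not the paper's implementation: the function names (`trilinear_sample`, `identity_grid`, `warp`, `compose`) and the array layout (volume of shape `(D, H, W)`, flow of shape `(D, H, W, 3)` in voxel units) are assumptions made for illustration.

```python
import itertools
import numpy as np

def trilinear_sample(vol, pts):
    """Sample vol (D,H,W) or (D,H,W,C) at float points pts (..., 3)."""
    size = np.array(vol.shape[:3])
    # Nearest-point handling: clamp out-of-bound queries into the image cuboid.
    p = np.clip(pts, 0.0, size - 1.0)
    # Lower corner of the enclosing lattice cell (kept one step inside the volume).
    lo = np.clip(np.floor(p).astype(int), 0, size - 2)
    frac = p - lo
    out = 0.0
    for corner in itertools.product((0, 1), repeat=3):
        c = np.array(corner)
        idx = lo + c
        # Weight of this corner: product of per-axis linear weights.
        weight = np.prod(np.where(c == 1, frac, 1.0 - frac), axis=-1)
        vals = vol[idx[..., 0], idx[..., 1], idx[..., 2]]
        if np.ndim(vals) > np.ndim(weight):  # vector-valued volume (e.g. a flow)
            weight = weight[..., None]
        out = out + weight * vals
    return out

def identity_grid(shape):
    """Voxel coordinates x for every lattice point, shape (D,H,W,3)."""
    axes = [np.arange(s, dtype=float) for s in shape]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)

def warp(vol, flow):
    """warp(I, f)(x) = I(x + f(x))."""
    return trilinear_sample(vol, identity_grid(flow.shape[:3]) + flow)

def compose(g1, g2):
    """Eq. (3): g1 * g2 = g2 + warp(g1, g2)."""
    return g2 + warp(g1, g2)
```

Because interpolation is only approximate for general images, `warp(warp(I, g1), g2)` and `warp(I, compose(g1, g2))` agree up to interpolation error; for an image that is linear in the coordinates and for flows that stay inside the volume, the two coincide to floating-point precision.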

B. Unsupervised End-to-End Registration Network

Volume Tweening Network (VTN)
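- consists of several cascaded registration subnetworks
- the dissimilarity between the fixed image and each warped image is computed as a loss term, together with the regularization loss
The warping operation is differentiable with respect to both the input image and the input flow field, thanks to trilinear interpolation.
Deformable image registration usually applies an initial rigid transformation as a global alignment before predicting the dense flow field.
- instead of spending time on a preprocessing stage as ANTs and VoxelMorph do, this step is integrated into the top-level subnetwork
- the affine registration subnetwork predicts affine parameters and generates a flow field from them
- the running time of the integrated affine registration subnetwork is negligible, and it performs better than the conventional affine stage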
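
Generating a flow field from affine parameters can be sketched as follows. This is an illustrative NumPy example under stated assumptions: the flow is taken to be $f(x)=Ax+b$ (consistent with the notes' use of $I+A$ as the full transform matrix), coordinates are measured from the volume center, and the name `affine_flow` is hypothetical.

```python
import numpy as np

def affine_flow(A, b, shape):
    """Dense flow f(x) = A x + b on a lattice of the given shape.

    A: (3, 3) matrix, b: (3,) translation, both in voxel units.
    Returns an array of shape (D, H, W, 3).
    """
    axes = [np.arange(s, dtype=float) for s in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    # Assumption: the origin is placed at the volume center, so that
    # rotation and scaling act around the middle of the image.
    center = (np.array(shape) - 1) / 2.0
    x = grid - center
    # Row-vector form of A x for every lattice point, plus the translation.
    return x @ A.T + b
```

With `A = 0` the flow is the constant translation `b`; the full map $x \mapsto x + f(x)$ then has linear part $I+A$, which is the matrix the affine regularization terms act on.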

C. Loss Functions

To train in an unsupervised manner, the (dis)similarity between the fixed image and the moving image warped by the spatial transformer is computed.
- Regularization loss
  
  : prevents the flow field from becoming unrealistic or overfitting
  - a) Correlation Coefficient as the similarity measurement
    
    $$ \text{Cov}[I_1,I_2]=\frac{1}{|\Omega|}\sum_{x\in\Omega}I_1(x)I_2(x)-\frac{1}{ |\Omega|^2}\sum_{x\in\Omega}I_1(x)\sum_{y\in\Omega}I_2(y) \tag{5} $$
    - $\Omega$ : the cuboid (or grid) on which the input images are defined
    $$ \text{CorrCoef}[I_1,I_2]=\frac{\text{Cov}[I_1,I_2]}{\sqrt{\text{Cov}[I_1,I_1]\text{Cov}[I_2,I_2]}} \tag{6} $$
    - the range of correlation coefficient : [-1, 1]
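      - measures how linearly related the two images are
    - applying a non-degenerate linear function to an image's intensities leaves the correlation coefficient unchanged (up to sign, for a negative slope), so it is more robust than an $L_2$ loss
    - for real-world images the correlation coefficient is expected to be non-negative unless one image is a negative film of the other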
    - The correlation coefficient loss
      
      $$ L_\text{Corr}(I_1,I_2)=1-\text{CorrCoef}[I_1,I_2] \tag{7} $$
  - b) Total Variation Loss (Smooth Term) as the regularization term for dense flow predictions
    
    $$ L_\text{TV}=\frac{1}{3|\Omega|}\sum_x\sum^3_{i=1}(f(x+e_i)-f(x))^2 \tag{8} $$
    - $e_{1,2,3}$ : the natural basis of $\mathbb{R}^3$
      - this differs from the original definition of total variation loss, but the squared form used here (L2 regularization) is more natural as a loss term
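
Equations (5)-(8) translate almost line by line into array code. The block below is a minimal NumPy sketch: the function names, the `eps` guard against constant images, and the choice to take forward differences only where $x+e_i$ lies inside the grid are assumptions, not details taken from the notes.

```python
import numpy as np

def covariance(i1, i2):
    # Eq. (5): mean of the product minus the product of the means over Omega.
    return np.mean(i1 * i2) - np.mean(i1) * np.mean(i2)

def corr_coef_loss(i1, i2, eps=1e-12):
    # Eqs. (6)-(7). eps is an added assumption that avoids dividing by zero
    # when one image is constant.
    denom = np.sqrt(covariance(i1, i1) * covariance(i2, i2) + eps)
    return 1.0 - covariance(i1, i2) / denom

def tv_loss(flow):
    # Eq. (8): squared forward differences of the flow along each axis,
    # normalized by 3 |Omega|. flow has shape (D, H, W, 3).
    total = 0.0
    for axis in range(3):
        diff = np.diff(flow, axis=axis)  # f(x + e_i) - f(x), interior only
        total += np.sum(diff ** 2)
    return total / (3.0 * np.prod(flow.shape[:3]))
```

For example, `corr_coef_loss(img, 2 * img + 5)` is close to 0 because the two images are linearly related with positive slope, while `corr_coef_loss(img, -img)` is close to 2, the negative-film case.
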
- Orthogonality loss and Determinant loss as the regularization terms for the affine registration subnetwork
  
  : keeps gradients from becoming too large and stabilizes training
  - c) Orthogonality Loss
    - the input images are assumed to need only a small scaling and rotation to become affinely aligned
    - penalize the network for producing overly non-rigid transforms
      
      → a loss on the non-orthogonality of $I+A$
      - $I$ : the identity matrix
      - $A$ : the transform matrix produced by the affine registration network
      $$ L_\text{ortho}=-6+\sum^3_{i=1}(\lambda^2_i+\lambda^{-2}_i) \tag{9} $$
      - $\lambda_{1,2,3}$ : the singular values of $I+A$
      - a matrix is orthogonal if and only if all of its singular values are 1
      - the further $I+A$ deviates from an orthogonal matrix, the larger the orthogonality loss
      - the loss is 0 when $I+A$ is orthogonal
    - computing the orthogonality loss involves the singular values of $I+A$
      - the squared singular values are exactly the eigenvalues of $(I+A)^\text{T}(I+A)$
      - the orthogonality loss is a symmetric function of these eigenvalues, so by Vieta's formulas it can be expressed through the coefficients of the characteristic polynomial of $(I+A)^\text{T}(I+A)$
      - this expression is directly differentiable
  - d) Determinant Loss
    - the images are assumed to have the same chirality (handedness)
      
      → affine transforms that involve a reflection are not allowed
      - $\text{det}(I+A)>0$ must hold
      $$ L_\text{det}=(-1+\text{det}(A+I))^2 \tag{10} $$
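
The two affine regularizers in (9) and (10) can be sketched in NumPy. Writing $M=(I+A)^\text{T}(I+A)$ with eigenvalues $\mu_i=\lambda_i^2$, Vieta's formulas give $\sum_i\mu_i=\text{tr}(M)$ and $\sum_i\mu_i^{-1}=e_2/\det(M)$ with $e_2=\tfrac{1}{2}(\text{tr}(M)^2-\text{tr}(M^2))$, so the loss needs no explicit decomposition. The function names below are hypothetical, and the SVD version is included only as a cross-check.

```python
import numpy as np

def ortho_loss_svd(A):
    # Eq. (9) computed directly from the singular values of I + A.
    s = np.linalg.svd(np.eye(3) + A, compute_uv=False)
    return -6.0 + np.sum(s ** 2 + s ** -2.0)

def ortho_loss_poly(A):
    # Same quantity from the characteristic-polynomial coefficients of
    # M = (I + A)^T (I + A), whose eigenvalues are the squared singular values.
    T = np.eye(3) + A
    M = T.T @ T
    e1 = np.trace(M)                                  # sum of eigenvalues
    e2 = 0.5 * (np.trace(M) ** 2 - np.trace(M @ M))   # sum of pairwise products
    e3 = np.linalg.det(M)                             # product of eigenvalues
    return -6.0 + e1 + e2 / e3

def det_loss(A):
    # Eq. (10): penalize deviation of det(I + A) from 1.
    return (np.linalg.det(np.eye(3) + A) - 1.0) ** 2
```

A pure rotation gives zero for both losses. A reflection such as $I+A=\text{diag}(-1,1,1)$ is still orthogonal, so the orthogonality loss is 0, but its determinant is $-1$ and the determinant loss is 4, which is why the two terms are used together.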

4. Network Architecture

A. Cascading

Fig 2