1. Introduction
- Earlier convnet approaches to semantic segmentation label each pixel with the class of its enclosing object or region, but this approach has clear shortcomings (low accuracy, long runtime, etc.)
- Patchwise training: inefficient
Fully Convolutional Network (FCN)
- Learns end-to-end, pixels-to-pixels
- Pixelwise prediction with supervised pre-training
- Takes input of arbitrary size (see the sketch after this list)
- Learning and inference run on the whole image at a time
- Uses VGG and GoogLeNet (Inception) pre-trained on ImageNet
- "skip" architecture that combines deep, coarse semantic information with shallow, fine appearance information

2. Related work
Fully convolutional networks
- Sliding window detection
- Semantic segmentation
- Image restoration
Dense prediction with convnets
- Small models restricting capacity and receptive fields
- Patchwise training
- Post-processing by superpixel projection, random field regularization, filtering, or local classification
- Input shifting and output interlacing for dense output as introduced by OverFeat (see the sketch after this list)
- Multi-scale pyramid processing
- Saturating $\text{tanh}$ nonlinearities
- Ensembles
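
For reference, a toy illustration (my own sketch, not OverFeat's code) of the shift-and-stitch idea named above: a net whose output is f times coarser than its input is run on f×f shifted copies, and the coarse outputs are interlaced into one dense output. Here the "net" is just a 2×2 max pool with stride f = 2, so the stitched result can be checked against a dense (stride-1) pooling.

```python
import torch
import torch.nn.functional as F

def coarse_net(x, f=2):
    # Stand-in for a convnet with total stride f: output is f-times smaller.
    return F.max_pool2d(x, kernel_size=f, stride=f)

def shift_and_stitch(x, f=2):
    n, c, h, w = x.shape  # assume h and w are divisible by f
    dense = torch.zeros(n, c, h, w)
    for dy in range(f):
        for dx in range(f):
            # Shift the input content up/left by (dy, dx), zero-padding the far edge.
            shifted = F.pad(x, (0, dx, 0, dy))[:, :, dy:dy + h, dx:dx + w]
            out = coarse_net(shifted, f)
            # Interlace: this shift fills output positions congruent to (dy, dx) mod f.
            dense[:, :, dy::f, dx::f] = out
    return dense

x = torch.randn(1, 1, 8, 8)
# Reference: the same pooling applied densely (stride 1) on the padded input.
reference = F.max_pool2d(F.pad(x, (0, 1, 0, 1)), kernel_size=2, stride=1)
print(torch.allclose(shift_and_stitch(x), reference))  # True
```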
3. Fully convolutional networks
- Fully convolutional network (FCN)
- Unlike a general deep net, which computes a general nonlinear function, a net built only of convolution-type layers computes a nonlinear filter
- The loss function is a sum over the spatial dimensions ($l(\mathbf{x};\theta)=\sum_{ij}l'(\mathbf{x}_{ij};\theta)$)
→ its gradient is the sum of the gradients of each spatial component, so training on whole images is equivalent to treating every final-layer receptive field as a minibatch element (see the sketch below)
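
A quick numerical check of this point (my own example: cross-entropy stands in for the per-pixel loss $l'$, and the shapes are arbitrary): the gradient of the whole-image loss equals the accumulated gradients of the individual per-pixel losses.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, h, w = 4, 3, 5
scores = torch.randn(1, num_classes, h, w, requires_grad=True)  # x_{ij}: per-pixel class scores
target = torch.randint(num_classes, (1, h, w))                  # ground-truth label per pixel

# Whole-image loss: l(x) = sum_{ij} l'(x_{ij})
loss = F.cross_entropy(scores, target, reduction="sum")
grad_whole, = torch.autograd.grad(loss, scores)

# Sum of per-pixel gradients: accumulate the gradient of each l'(x_{ij}) separately.
grad_sum = torch.zeros_like(scores)
for i in range(h):
    for j in range(w):
        pixel_loss = F.cross_entropy(scores[:, :, i, j], target[:, i, j])
        grad_sum += torch.autograd.grad(pixel_loss, scores, retain_graph=True)[0]

print(torch.allclose(grad_whole, grad_sum, atol=1e-6))  # True
```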
3.1. Adapting classifiers for dense prediction