1. Introduction
- Earlier convnet approaches to semantic segmentation label each pixel with the class of its enclosing object or region, but this approach has clear shortcomings (low accuracy, long runtime, etc.)
- Patchwise training: inefficient
Fully Convolutional Network (FCN)
- Learns end-to-end, pixels-to-pixels
- Pixelwise prediction with supervised pre-training
- Takes input of arbitrary size (see the sketch after this list)
- Learning and inference run on the whole image at a time
- Uses VGG and GoogLeNet (Inception) pre-trained on ImageNet
- "skip" architecture that combines deep, coarse semantic information with shallow, fine appearance information

2. Related work
Fully convolutional networks
- Sliding window detection
- Semantic segmentation
- Image restoration
Dense prediction with convnets
- Small models restricting capacity and receptive fields
- Patchwise training
- Post-processing by superpixel projection, random field regularization, filtering, or local classification
- Input shifting and output interlacing for dense output as introduced by OverFeat (see the sketch after this list)
- Multi-scale pyramid processing
- Saturating $\text{tanh}$ nonlinearities
- Ensembles
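
For reference, a toy illustration (my own sketch, not OverFeat's code) of the shift-and-stitch idea named above: a net whose output is f times coarser than its input is run on f×f shifted copies, and the coarse outputs are interlaced into one dense output. Here the "net" is just a 2×2 max pool with stride f = 2, so the stitched result can be checked against a dense (stride-1) pooling.

```python
import torch
import torch.nn.functional as F

def coarse_net(x, f=2):
    # Stand-in for a convnet with total stride f: output is f-times smaller.
    return F.max_pool2d(x, kernel_size=f, stride=f)

def shift_and_stitch(x, f=2):
    n, c, h, w = x.shape  # assume h and w are divisible by f
    dense = torch.zeros(n, c, h, w)
    for dy in range(f):
        for dx in range(f):
            # Shift the input content up/left by (dy, dx), zero-padding the far edge.
            shifted = F.pad(x, (0, dx, 0, dy))[:, :, dy:dy + h, dx:dx + w]
            out = coarse_net(shifted, f)
            # Interlace: this shift fills output positions congruent to (dy, dx) mod f.
            dense[:, :, dy::f, dx::f] = out
    return dense

x = torch.randn(1, 1, 8, 8)
# Reference: the same pooling applied densely (stride 1) on the padded input.
reference = F.max_pool2d(F.pad(x, (0, 1, 0, 1)), kernel_size=2, stride=1)
print(torch.allclose(shift_and_stitch(x), reference))  # True
```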
3. Fully convolutional networks
- Fully convolutional network (FCN)
- Unlike a general deep net, which computes a general nonlinear function, a net built only of convolution-type layers computes a nonlinear filter
- The loss function is a sum over the spatial dimensions ($l(\mathbf{x};\theta)=\sum_{ij}l'(\mathbf{x}_{ij};\theta)$)
→ its gradient is the sum of the gradients of each spatial component, so training on whole images is equivalent to treating every final-layer receptive field as a minibatch element (see the sketch below)
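
A quick numerical check of this point (my own example: cross-entropy stands in for the per-pixel loss $l'$, and the shapes are arbitrary): the gradient of the whole-image loss equals the accumulated gradients of the individual per-pixel losses.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, h, w = 4, 3, 5
scores = torch.randn(1, num_classes, h, w, requires_grad=True)  # x_{ij}: per-pixel class scores
target = torch.randint(num_classes, (1, h, w))                  # ground-truth label per pixel

# Whole-image loss: l(x) = sum_{ij} l'(x_{ij})
loss = F.cross_entropy(scores, target, reduction="sum")
grad_whole, = torch.autograd.grad(loss, scores)

# Sum of per-pixel gradients: accumulate the gradient of each l'(x_{ij}) separately.
grad_sum = torch.zeros_like(scores)
for i in range(h):
    for j in range(w):
        pixel_loss = F.cross_entropy(scores[:, :, i, j], target[:, i, j])
        grad_sum += torch.autograd.grad(pixel_loss, scores, retain_graph=True)[0]

print(torch.allclose(grad_whole, grad_sum, atol=1e-6))  # True
```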
3.1. Adapting classifiers for dense prediction