1. Introduction

depth / width / cardinality
- depth : VGG, ResNet
- width : Wide Residual Networks
- cardinality : Xception, ResNeXt
attention
- CBAM (Convolutional Block Attention Module)
- applies channel and spatial attention sequentially
  
  → the network can learn ‘what’ and ‘where’ to attend to
Main contribution
- effective attention module (CBAM)
- validates the attention module through extensive ablation studies
- verified on ImageNet-1K, MS COCO, and VOC 2007

2. Related Work

Network engineering

Simply making the model deeper already gives good performance
- Inception network
However, the gains saturate because gradient propagation becomes difficult as depth grows
- ResNet : resolves the optimization issue with identity skip-connections
- WideResNet, Inception-ResNet, ResNeXt, ...
  - PyramidNet : a generalization of WideResNet
  - ResNeXt : grouped convolution and increasing the cardinality
  - DenseNet

Attention mechanism

Residual Attention Network : encoder-decoder style attention module
- CBAM does not compute the 3D attention map in one step; it computes channel attention and spatial attention separately
  - more efficient
Squeeze-and-Excitation module
- global average-pooled features
  - applying GAP discards the spatial information
  - the features that indicate ‘where’ are lost
- CBAM applies both spatial & channel attention
  - using max-pooled features as well works better
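
The pooling points above can be illustrated with a small NumPy sketch. It is only a toy example: the feature maps and the helper names `global_avg_pool` and `global_max_pool` are made up for this note and are not taken from the paper.

```python
import numpy as np

def global_avg_pool(f):
    """Average over the spatial axes of a (C, H, W) feature map -> shape (C,)."""
    return f.mean(axis=(1, 2))

def global_max_pool(f):
    """Maximum over the spatial axes of a (C, H, W) feature map -> shape (C,)."""
    return f.max(axis=(1, 2))

# Toy single-channel 4x4 maps (hypothetical data).
top_left = np.zeros((1, 4, 4))
top_left[0, 0, 0] = 8.0          # one strong response in the top-left corner
bottom_right = np.zeros((1, 4, 4))
bottom_right[0, 3, 3] = 8.0      # same response, opposite corner
uniform = np.full((1, 4, 4), 0.5)  # flat map with the same mean

# GAP gives the same descriptor for all three maps: 'where' is gone.
gap_values = [global_avg_pool(m)[0] for m in (top_left, bottom_right, uniform)]

# Max pooling still cannot say 'where', but it separates the peaked maps
# from the flat one, so it carries information that GAP alone misses.
max_values = [global_max_pool(m)[0] for m in (top_left, bottom_right, uniform)]
```

Here all three maps have a GAP value of 0.5, while the max-pooled values are 8.0, 8.0, and 0.5. Neither pooled descriptor keeps the location, which is why a separate spatial attention step is needed for ‘where’.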

3. Convolutional Block Attention Module

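Given an intermediate feature map F (C × H × W), CBAM sequentially infers a 1D channel attention map M_c (C × 1 × 1) and a 2D spatial attention map M_s (1 × H × W)
- ⊗ : element-wise multiplication
  - channel attention values are broadcast along the spatial dimensions, and spatial attention values are broadcast along the channel dimension
  - F’’ : the final refined output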

$$ \mathbf{F}^{\prime} = \mathbf{M}_{\mathbf{c}}(\mathbf{F}) \otimes \mathbf{F}, \\ \mathbf{F}^{\prime \prime} = \mathbf{M}_{\mathbf{s}}(\mathbf{F}^{\prime}) \otimes \mathbf{F}^{\prime} $$
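
A minimal NumPy sketch of the two-step refinement, mainly to show the shapes and the broadcasting. The functions `channel_attention_placeholder` and `spatial_attention_placeholder` are hypothetical stand-ins invented for this note (a sigmoid over a simple mean); they are not the paper's actual sub-modules, which these notes do not describe.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_placeholder(f):
    """Hypothetical stand-in for M_c: (C, H, W) -> (C, 1, 1), values in (0, 1)."""
    return sigmoid(f.mean(axis=(1, 2), keepdims=True))

def spatial_attention_placeholder(f):
    """Hypothetical stand-in for M_s: (C, H, W) -> (1, H, W), values in (0, 1)."""
    return sigmoid(f.mean(axis=0, keepdims=True))

def cbam_refine(f):
    """Apply channel attention first, then spatial attention."""
    m_c = channel_attention_placeholder(f)        # (C, 1, 1)
    f_prime = m_c * f                             # broadcast over H and W
    m_s = spatial_attention_placeholder(f_prime)  # (1, H, W), computed from F'
    f_double_prime = m_s * f_prime                # broadcast over C
    return m_c, m_s, f_prime, f_double_prime

rng = np.random.default_rng(0)
feature = rng.standard_normal((8, 5, 5))          # assumed C=8, H=W=5
m_c, m_s, f_prime, f_out = cbam_refine(feature)
```

Note that the spatial map is computed from F’, not from F, which matches the order of the two equations; the output F’’ has the same shape as the input.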


4. Experiments

5. Conclusions