Brief Introduction
- Identify the "essential difference" between anchor-based and anchor-free detectors
- Show that it lies in how positive and negative training samples are defined
- Propose "adaptive training sample selection" (ATSS) to automatically select positive and negative training samples based on statistical characteristics of each object
- Demonstrate that, with ATSS, tiling multiple anchors per location is unnecessary
- Achieve SOTA performance on MS COCO without introducing extra overhead
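A minimal sketch of the adaptive selection step for a single ground-truth (GT) box, following the procedure as described in the ATSS paper: on each pyramid level, take the k anchors whose centers are closest to the GT center; set the IoU threshold to the mean plus the standard deviation of those candidates' IoUs; keep candidates whose IoU reaches the threshold and whose centers lie inside the GT box. The function names, the (x1, y1, x2, y2) box format, and the use of the population standard deviation are assumptions. Resolving an anchor matched to several GTs (the paper keeps the GT with the highest IoU) is omitted.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def atss_select(anchors_per_level: Sequence[Sequence[Box]], gt: Box,
                k: int = 9) -> List[Tuple[int, int]]:
    """Hypothetical helper: (level, anchor_index) pairs chosen as positives for one GT."""
    gcx, gcy = center(gt)
    candidates: List[Tuple[int, int]] = []
    for lvl, anchors in enumerate(anchors_per_level):
        # Rank this level's anchors by L2 distance between anchor and GT centers.
        ranked = sorted(
            (math.hypot(center(a)[0] - gcx, center(a)[1] - gcy), i)
            for i, a in enumerate(anchors)
        )
        candidates += [(lvl, i) for _, i in ranked[:k]]

    ious = [iou(anchors_per_level[l][i], gt) for l, i in candidates]
    mean = sum(ious) / len(ious)
    # Population standard deviation (a simplifying assumption).
    std = math.sqrt(sum((v - mean) ** 2 for v in ious) / len(ious))
    threshold = mean + std

    positives = []
    for (l, i), v in zip(candidates, ious):
        cx, cy = center(anchors_per_level[l][i])
        inside = gt[0] < cx < gt[2] and gt[1] < cy < gt[3]
        if v >= threshold and inside:
            positives.append((l, i))
    return positives


# Toy usage: one level with four anchors and k = 3 candidates.
gt_box = (0.0, 0.0, 10.0, 10.0)
level0 = [(0.0, 0.0, 10.0, 10.0), (2.0, 0.0, 12.0, 10.0),
          (5.0, 5.0, 15.0, 15.0), (30.0, 30.0, 40.0, 40.0)]
print(atss_select([level0], gt_box, k=3))  # → [(0, 0)]
```

Because the threshold adapts to the candidates' IoU statistics, no fixed IoU hyperparameter is needed; only k remains.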
Recent vision models
Anchor-based & anchor-free
- Two main families of object detectors
- Anchor-based methods: single-stage & two-stage
- Anchor-free methods: keypoint-based & center-based
Anchor-based
- Two-stage: Faster R-CNN
- Consists of a Region Proposal Network (RPN) and a region-wise prediction network (R-CNN)
- High accuracy, since anchors are refined more than once
- Single-stage: Single Shot MultiBox Detector (SSD)
- High computational efficiency: directly classifies and regresses the pre-defined anchors
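To make "anchors tiled per location" concrete, here is a hypothetical helper that places #A anchors at every feature-map location. The RetinaNet-style setting assumed in the usage (3 scales × 3 aspect ratios, so #A = 9, with 32 × 32 base anchors on the stride-8 level) and the (x + 0.5) · stride center convention are illustrative choices; implementations differ in these details.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def tile_anchors(feat_h: int, feat_w: int, stride: int, base_size: float,
                 scales: Sequence[float], ratios: Sequence[float]) -> List[Box]:
    """Place len(scales) * len(ratios) anchors at each feature-map location."""
    anchors: List[Box] = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Map the feature-map cell to image coordinates (cell center).
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:  # r = height / width
                    area = (base_size * s) ** 2
                    w = math.sqrt(area / r)  # keeps w * h == area
                    h = w * r
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# #A = 9 (RetinaNet-style) versus #A = 1 (one square anchor per location).
retina = tile_anchors(2, 2, stride=8, base_size=32,
                      scales=[2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)],
                      ratios=[0.5, 1.0, 2.0])
single = tile_anchors(2, 2, stride=8, base_size=32, scales=[1.0], ratios=[1.0])
print(len(retina), len(single))  # → 36 4
```

The number of anchors grows linearly with #A, so every extra scale or ratio multiplies the samples the detector must classify and regress.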
Anchor-free
- Keypoint-based method: CornerNet
- First locates several pre-defined or self-learned keypoints
- Then generates bounding boxes from these keypoints
- Center-based method: e.g., YOLO, FCOS
- Regards the center (a point or region) of an object as foreground to define positives
- Then predicts the distances from each positive to the four sides of the object's bounding box
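A small sketch of center-based regression targets in the style of FCOS: a positive location (px, py) inside a GT box is encoded by its distances l, t, r, b to the left, top, right, and bottom sides, and decoding simply inverts this. The function names are hypothetical, and the "strictly inside the box" test is a simplification; FCOS additionally restricts which pyramid level a location may regress to.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)
LTRB = Tuple[float, float, float, float]


def ltrb_targets(px: float, py: float, box: Box) -> Optional[LTRB]:
    """Distances from a location to the four box sides, or None if outside."""
    x1, y1, x2, y2 = box
    l, t, r, b = px - x1, py - y1, x2 - px, y2 - py
    if min(l, t, r, b) <= 0:
        return None  # not inside the box: not a positive under this simple rule
    return (l, t, r, b)


def decode(px: float, py: float, ltrb: LTRB) -> Box:
    """Recover the box from a location and its predicted distances."""
    l, t, r, b = ltrb
    return (px - l, py - t, px + r, py + b)


targets = ltrb_targets(50, 40, (10, 20, 110, 100))
print(targets)                  # → (40, 20, 60, 60)
print(decode(50, 40, targets))  # → (10, 20, 110, 100)
```

Here the "preset" is a point rather than an anchor box, so no anchor shape or IoU computation is involved in defining the regression target.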
Difference Analysis between the Two
- RetinaNet (one-stage anchor-based) vs. FCOS (center-based anchor-free)
- Key differences to examine
- The positive/negative sample definition
- The number of anchors tiled per location
- Dataset: MS COCO (80 object classes)
- RetinaNet (#A=1) → one square anchor box per location, comparable to FCOS's one point per location
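A toy comparison of the two positive definitions on one stride-8 level, assuming RetinaNet's IoU rule (IoU ≥ 0.5 positive, IoU < 0.4 negative, otherwise ignored) with #A = 1 square 32 × 32 anchor per location, and an FCOS-style rule (location strictly inside the GT box and max(l, t, r, b) within the level's size range, taken here as (0, 64]). The 64 × 64 image, the single 32 × 32 GT box, and the helper names are made-up inputs chosen only to show that the same locations can be labeled differently; the counts are not results from the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def retinanet_label(anchor: Box, gt: Box, pos_thr: float = 0.5, neg_thr: float = 0.4) -> str:
    """IoU-based assignment (spatial and scale dimensions handled together)."""
    v = iou(anchor, gt)
    if v >= pos_thr:
        return "pos"
    if v < neg_thr:
        return "neg"
    return "ignore"


def fcos_label(px: float, py: float, gt: Box, size_range: Tuple[float, float] = (0, 64)) -> str:
    """Spatial constraint first (inside the box), then a per-level scale constraint."""
    l, t, r, b = px - gt[0], py - gt[1], gt[2] - px, gt[3] - py
    if min(l, t, r, b) <= 0:
        return "neg"
    m = max(l, t, r, b)
    return "pos" if size_range[0] < m <= size_range[1] else "neg"


stride, size = 8, 32
centers = [(x + 0.5) * stride for x in range(8)]  # 8 x 8 feature map, 64 x 64 image
gt = (8.0, 8.0, 40.0, 40.0)

retina_pos = sum(
    1 for cy in centers for cx in centers
    if retinanet_label((cx - size / 2, cy - size / 2, cx + size / 2, cy + size / 2), gt) == "pos"
)
fcos_pos = sum(1 for cy in centers for cx in centers if fcos_label(cx, cy, gt) == "pos")
print(retina_pos, fcos_pos)  # → 4 16
```

In this toy case the IoU rule keeps only the 4 anchors that overlap the box most tightly, while the spatial rule marks all 16 locations inside the box as positive, even though both detectors use one preset per location.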
Inconsistency removal
